Serverless Environments

Serverless GPU inference for machine learning models: a pay-per-millisecond API for running ML in production. Upload your model and instantly get an inference API endpoint, with scalable hosting on serverless GPUs.


You can use Replicate to run machine learning models in the cloud from your own code, without having to set up any servers. Our community has published hundreds of open-source models that you can run, or you can run your own models.
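As a sketch of what calling such a hosted model from your own code might look like, the snippet below builds a request against Replicate's HTTP predictions endpoint using only the standard library. The model version ID and input fields are hypothetical placeholders, and the exact API parameters may differ from this sketch:

```python
# Sketch: requesting one inference run from a serverless model endpoint.
# The version ID and "prompt" input below are illustrative, not real values.
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(version: str, model_input: dict, token: str):
    """Build an HTTP request asking the API to start one prediction."""
    body = json.dumps({"version": version, "input": model_input}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_prediction_request(
    "hypothetical-model-version-id",          # placeholder, not a real version
    {"prompt": "a watercolor painting of a fern"},
    os.environ.get("REPLICATE_API_TOKEN", "test-token"),
)
# Sending the request (urllib.request.urlopen(req)) requires a valid API token.
```

In practice you would typically use Replicate's official client library rather than raw HTTP, but the shape is the same: name a model version, pass its inputs, and get back a prediction you can poll for output.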