HuggingFace integration
Deploy public or private AI models from HuggingFace.
The HuggingFace integration enables Blaxel users to connect to serverless endpoints from HuggingFace—whether public, gated, or private—directly through their agents on Blaxel. The integration is bidirectional, letting you create new deployments on HuggingFace from the Blaxel console to use as model APIs.
The integration must be set up by an admin in the Integrations section in the workspace settings.
Set up the integration
In order to use this integration, you must register a HuggingFace access token in your Blaxel workspace settings. The scope of this access token (i.e. the HuggingFace resources it is allowed to access) defines exactly what Blaxel can access on HuggingFace.
First, generate a HuggingFace access token from your HuggingFace settings, giving it the scope you want Blaxel to have on HuggingFace (e.g. access to specific repositories).
Then, in your Blaxel workspace settings, open the HuggingFace integration and paste this token into the “API key” field.
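Optionally, before pasting the token, you can sanity-check that it is valid using the official huggingface_hub Python library. This step is not required by Blaxel; the token value below is a placeholder.

```python
# Optional sanity check: verify the access token works before registering it.
from huggingface_hub import whoami

HF_TOKEN = "hf_..."  # the access token you just generated (placeholder)

# whoami() raises an error if the token is invalid; otherwise it returns
# the user/organization identity the token authenticates as.
identity = whoami(token=HF_TOKEN)
print(identity["name"])
```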
Connect to a HuggingFace model
Once you’ve set up the integration in the workspace, any workspace member can use it to reference a HuggingFace model as an external model API.
Public and private models
When creating a model API on Blaxel, select “HuggingFace”. You can search for:
- any public model from Inference API (serverless)
- any private model from Inference Endpoints (dedicated) in the organizations & repositories allowed by the integration’s token.
After the model API is created, you will receive a dedicated global Blaxel endpoint to call the model. Blaxel will forward inference requests to HuggingFace, using your HuggingFace credentials for authentication and authorization.
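For illustration, calling such an endpoint might look like the sketch below. The exact URL and payload schema are shown in your Blaxel console once the model API is created; the URL shape, workspace, model name, and API key here are all placeholders.

```python
# Minimal sketch of calling a Blaxel model API endpoint over HTTP.
# The URL shape and payload schema below are assumptions; use the values
# shown in your Blaxel console.
import requests

BLAXEL_API_KEY = "bl_..."  # your Blaxel workspace API key (placeholder)
ENDPOINT = "https://run.blaxel.ai/my-workspace/models/my-hf-model"  # assumed URL shape

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {BLAXEL_API_KEY}"},
    json={"inputs": "What is the capital of France?"},  # HF Inference-style payload
)
response.raise_for_status()
print(response.json())
```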
Gated models
If the model you’re trying to connect to is gated, you’ll first need to request access on HuggingFace and accept its terms of use (if applicable). Access to some HuggingFace models is granted immediately after the request, while others require manual approval.
When the model gets deployed, Blaxel checks whether the integration token is allowed to access the model on HuggingFace. If access has not been granted, the model deployment will fail with an error.
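If you want to confirm ahead of time that the integration token has been granted access, one way is to query the model’s metadata with the huggingface_hub library: a gated model you have not been granted access to raises a GatedRepoError. The model ID and token below are placeholders.

```python
# Check whether a token has been granted access to a gated model.
from huggingface_hub import model_info
from huggingface_hub.utils import GatedRepoError

HF_TOKEN = "hf_..."                             # the integration token (placeholder)
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"   # example gated model (placeholder)

try:
    info = model_info(MODEL_ID, token=HF_TOKEN)
    print(f"Access granted: {info.id}")
except GatedRepoError:
    print("Access not granted yet: request it on the model's HuggingFace page.")
```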
Create a HuggingFace Inference Endpoint
You can deploy a model in HuggingFace’s Inference Endpoints directly from the Blaxel console when creating a new external model API.
- Organization: select the HuggingFace namespace in which the endpoint will be deployed
- Model: select the model to deploy
- Instance: choose the type (GPU) and size of the instance to use for the deployment. Blaxel will trigger a deployment on Google Cloud Platform with default auto-scaling parameters.
- Endpoint: enter the name for your endpoint on HuggingFace
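These fields map directly onto HuggingFace’s own Inference Endpoints API. For reference, a roughly equivalent call with the huggingface_hub SDK is sketched below; Blaxel performs this step for you from the console, and all values here are placeholders.

```python
# Roughly what the Blaxel console sets up on your behalf, expressed with
# the huggingface_hub SDK for reference. All values are placeholders.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-endpoint",                  # Endpoint: name on HuggingFace
    repository="my-org/my-model",   # Model: the model to deploy
    namespace="my-org",             # Organization: deployment namespace
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",              # Instance: GPU type...
    instance_type="nvidia-l4",      # ...and size (placeholders)
    instance_size="x1",
    region="us-east4",
    vendor="gcp",                   # Blaxel deploys on Google Cloud Platform
    token="hf_...",                 # the integration token (placeholder)
)
print(endpoint.url)  # populated once the endpoint is up
```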
Once you launch a deployment, it will be available in both your HuggingFace console and your Blaxel console. You will receive a dedicated global Blaxel endpoint that proxies inference requests to the HuggingFace endpoint while enforcing token usage control and observability.