HuggingFace integration
Deploy public or private AI models from HuggingFace.
The HuggingFace integration enables Blaxel users to connect to serverless endpoints from HuggingFace—whether public, gated, or private—directly through their agents on Blaxel. The integration is bidirectional, letting you create new deployments on HuggingFace from the Blaxel console to use as model APIs.
The integration must be set up by an admin in the Integrations section in the workspace settings.
Set up the integration
In order to use this integration, you must register a HuggingFace access token in your Blaxel workspace settings. The scope of this access token (i.e. the HuggingFace resources it is allowed to access) defines exactly what Blaxel can access on HuggingFace.
First, generate a HuggingFace access token from your HuggingFace settings, giving it the scope you want Blaxel to have on HuggingFace (e.g. access to specific repositories).
Then, in your Blaxel workspace settings, open the HuggingFace integration and paste this token into the “API key” field.
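Optionally, before pasting the token, you can sanity-check that it is valid using the official huggingface_hub Python library. This step is not required by Blaxel; the token value below is a placeholder.

```python
# Optional sanity check: verify the access token works before registering it.
from huggingface_hub import whoami

HF_TOKEN = "hf_..."  # the access token you just generated (placeholder)

# whoami() raises an error if the token is invalid; otherwise it returns
# the user/organization identity the token authenticates as.
identity = whoami(token=HF_TOKEN)
print(identity["name"])
```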
Connect to a HuggingFace model
Once you’ve set up the integration in the workspace, any workspace member can use it to reference a HuggingFace model as an external model API.
Public and private models
When creating a model API on Blaxel, select “HuggingFace”. You can search for:
- any public model from Inference API (serverless)
- any private model from Inference Endpoints (dedicated) in the organizations & repositories allowed by the integration’s token.
After the model API is created, you will receive a dedicated global Blaxel endpoint to call the model. Blaxel will forward inference requests to HuggingFace, using your HuggingFace credentials for authentication and authorization.
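For illustration, calling such an endpoint might look like the sketch below. The exact URL and payload schema are shown in your Blaxel console once the model API is created; the URL shape, workspace, model name, and API key here are all placeholders.

```python
# Minimal sketch of calling a Blaxel model API endpoint over HTTP.
# The URL shape and payload schema below are assumptions; use the values
# shown in your Blaxel console.
import requests

BLAXEL_API_KEY = "bl_..."  # your Blaxel workspace API key (placeholder)
ENDPOINT = "https://run.blaxel.ai/my-workspace/models/my-hf-model"  # assumed URL shape

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {BLAXEL_API_KEY}"},
    json={"inputs": "What is the capital of France?"},  # HF Inference-style payload
)
response.raise_for_status()
print(response.json())
```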
Gated models
If the model you’re trying to connect to is gated, you’ll first need to request access on HuggingFace and accept its terms of use (if applicable). Access to some HuggingFace models is granted immediately after the request, while others require manual approval.
When the model gets deployed, Blaxel checks whether the integration token is allowed to access the model on HuggingFace. If access has not been granted, the model deployment will fail with an error.
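If you want to confirm ahead of time that the integration token has been granted access, one way is to query the model’s metadata with the huggingface_hub library: a gated model you have not been granted access to raises a GatedRepoError. The model ID and token below are placeholders.

```python
# Check whether a token has been granted access to a gated model.
from huggingface_hub import model_info
from huggingface_hub.utils import GatedRepoError

HF_TOKEN = "hf_..."                             # the integration token (placeholder)
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"   # example gated model (placeholder)

try:
    info = model_info(MODEL_ID, token=HF_TOKEN)
    print(f"Access granted: {info.id}")
except GatedRepoError:
    print("Access not granted yet: request it on the model's HuggingFace page.")
```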
Create a HuggingFace Inference Endpoint
You can deploy a model in HuggingFace’s Inference Endpoints directly from the Blaxel console when creating a new external model API.
- Organization: select the HuggingFace namespace in which the endpoint will be deployed
- Model: select the model to deploy
- Instance: choose the type (GPU) and size of the instance to use for the deployment. Blaxel will trigger a deployment on Google Cloud Platform with default auto-scaling parameters.
- Endpoint: enter the name for your endpoint on HuggingFace
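These fields map directly onto HuggingFace’s own Inference Endpoints API. For reference, a roughly equivalent call with the huggingface_hub SDK is sketched below; Blaxel performs this step for you from the console, and all values here are placeholders.

```python
# Roughly what the Blaxel console sets up on your behalf, expressed with
# the huggingface_hub SDK for reference. All values are placeholders.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-endpoint",                  # Endpoint: name on HuggingFace
    repository="my-org/my-model",   # Model: the model to deploy
    namespace="my-org",             # Organization: deployment namespace
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",              # Instance: GPU type...
    instance_type="nvidia-l4",      # ...and size (placeholders)
    instance_size="x1",
    region="us-east4",
    vendor="gcp",                   # Blaxel deploys on Google Cloud Platform
    token="hf_...",                 # the integration token (placeholder)
)
print(endpoint.url)  # populated once the endpoint is up
```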
Once you launch a deployment, it will be available in both your HuggingFace console and your Blaxel console. You will receive a dedicated global Blaxel endpoint that proxies inference requests to the HuggingFace endpoint while enforcing token usage control and observability.