Agent deployments on Blaxel have a default inference endpoint that external consumers can use to request an inference execution. This endpoint is synchronous: the connection remains open until the agent has fully processed your request. You can also query an asynchronous endpoint for agents, which lets you send longer-running requests without keeping a connection open.

All inference requests are routed on the Global Agentics Network based on the deployment policies associated with your agent deployment.

Inference endpoints

Default synchronous endpoint

When you deploy an agent on Blaxel, an inference endpoint is automatically generated on the Global Agentics Network. This endpoint operates synchronously, keeping the connection open until your agent sends its complete response. It supports both batch and streaming responses, which you can implement in your agent’s code.

The inference URL looks like this:

Query agent
POST https://run.blaxel.ai/{YOUR-WORKSPACE}/agents/{YOUR-AGENT}
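
On the client side, the difference between batch and streaming responses comes down to how the response body is read. The sketch below is a minimal illustration, not Blaxel's own client: `read_batch` and `read_stream` are hypothetical helper names, and the code assumes the body is available as a binary file-like object carrying UTF-8 text (the object returned by Python's `urllib.request.urlopen` is one such object).

```python
import codecs
import io
from typing import BinaryIO, Iterator


def read_batch(body: BinaryIO) -> str:
    """Batch style: read the whole body, then decode it once."""
    return body.read().decode("utf-8")


def read_stream(body: BinaryIO, chunk_size: int = 1024) -> Iterator[str]:
    """Streaming style: yield decoded text piece by piece."""
    # An incremental decoder keeps multi-byte characters intact when a
    # chunk boundary falls in the middle of one.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; this raises if the body ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


if __name__ == "__main__":
    # Stand-in for a response body (assumption: no network call is made here).
    fake_body = io.BytesIO("partial answer, then the rest".encode("utf-8"))
    for piece in read_stream(fake_body, chunk_size=8):
        print(piece, end="", flush=True)
    print()
```

Whether chunks actually arrive incrementally depends on how your agent's code produces its response.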

Async endpoint

In addition to the default synchronous endpoint, Blaxel provides the ability to create asynchronous endpoints for handling longer-running agent requests.

This endpoint allows you to initiate requests without maintaining an open connection for the entire processing duration, which makes it particularly useful for complex or time-intensive operations that might exceed typical connection timeouts. Blaxel handles queuing and execution behind the scenes. You are responsible for implementing your own method for retrieving the agent’s results in your code, for example by sending them to a webhook, a database, or an S3 bucket.

The timeout duration for this endpoint is 15 minutes. If your request processing is expected to take longer than this, you should use batch jobs instead.

The async endpoint looks like this:

Query agent (async)
POST https://run.blaxel.ai/{YOUR-WORKSPACE}/agents/{YOUR-AGENT}/async

You can create async endpoints either from the Blaxel Console or from your code, in the blaxel.toml file.
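
Because the async endpoint does not hold the connection open, the agent itself has to deliver its result somewhere. The following is a minimal sketch of the webhook option, meant to run inside your agent's code once processing finishes. The webhook URL, the `request_id` field, the payload shape, and both helper names are hypothetical and not part of any Blaxel API.

```python
import json
import urllib.request


def build_webhook_delivery(webhook_url: str, request_id: str, result: str) -> urllib.request.Request:
    """Build the POST request that carries an agent result to a webhook."""
    # Hypothetical payload: the receiver uses request_id to match the result
    # to the request that triggered it.
    payload = {"request_id": request_id, "result": result}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deliver(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `deliver(build_webhook_delivery(...))` at the end of the agent's handler sends the result. Writing to a database or an S3 bucket follows the same pattern with a different client.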

Endpoint authentication

By default, agents deployed on Blaxel aren’t public. All inference requests must be authenticated with a bearer token.

The Global Agentics Network evaluates authentication and authorization for inference requests based on the access granted in your workspace.

To remove authentication from a deployed agent, see the section on making an agent public below.

Manage sessions

To simulate multi-turn conversations, you can pass a thread ID in the request headers. Your client needs to generate this ID and send it in any header that your agent’s code can read (e.g. Thread-Id). Without a thread ID, the agent won’t maintain or use any conversation memory when processing the request.
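
A minimal client-side sketch of this pattern: generate one ID per conversation and resend it on every turn. The `Thread-Id` header name follows the example above, `session_headers` is a hypothetical helper, and the agent's code is assumed to read the header and key its memory on it.

```python
import uuid
from typing import Dict, Optional


def session_headers(
    base_headers: Dict[str, str],
    thread_id: Optional[str] = None,
    header_name: str = "Thread-Id",
) -> Dict[str, str]:
    """Return a copy of base_headers with a thread ID attached."""
    # Generate a fresh ID for a new conversation; reuse it for later turns.
    thread_id = thread_id or str(uuid.uuid4())
    return {**base_headers, header_name: thread_id}


base = {"X-Blaxel-Authorization": "Bearer YOUR-TOKEN"}
first_turn = session_headers(base)  # starts a new conversation
second_turn = session_headers(base, thread_id=first_turn["Thread-Id"])  # same conversation
```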

Make an agent public

To make an agent publicly accessible, add the following to the blaxel.toml configuration file:

blaxel.toml
…
[[triggers]]
id = "http"
type = "http"

[triggers.configuration]
path = "/<PATH>" # This will be translated to https://run.blaxel.ai/<YOUR_WORKSPACE>/<PATH>
authenticationType = "public"
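
Once the trigger is deployed, a client can call the public path without an authorization header. The sketch below assumes the URL translation described in the configuration comment and the same `{"inputs": ...}` payload used in the request examples on this page. `build_public_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://run.blaxel.ai"


def build_public_request(workspace: str, path: str, inputs: str) -> urllib.request.Request:
    """Build an unauthenticated POST to a public HTTP trigger."""
    # The trigger path is configured with a leading slash ("/<PATH>").
    url = f"{BASE_URL}/{workspace}/{path.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=json.dumps({"inputs": inputs}).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # no bearer token
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request.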

Make an inference request

Blaxel API

Make a POST request to the default inference endpoint for the agent deployment you are requesting, making sure to fill in the authentication token:

curl 'https://run.blaxel.ai/YOUR-WORKSPACE/agents/YOUR-AGENT?environment=YOUR-ENVIRONMENT' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'content-type: application/json' \
  -H 'X-Blaxel-Authorization: Bearer YOUR-TOKEN' \
  -H 'X-Blaxel-Workspace: YOUR-WORKSPACE' \
  --data-raw '{"inputs":"Enter your input here."}'

Read about the API parameters in the reference.
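
The same request can be sketched in Python with only the standard library. The URL, query parameter, header names, and payload mirror the curl command above; `build_inference_request` and `run_inference` are hypothetical helper names, and the `Content-Type: application/json` header is set explicitly because the body is JSON.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://run.blaxel.ai"


def build_inference_request(
    workspace: str,
    agent: str,
    token: str,
    inputs: str,
    environment: Optional[str] = None,
) -> urllib.request.Request:
    """Build an authenticated POST to the agent's default inference endpoint."""
    url = f"{BASE_URL}/{workspace}/agents/{agent}"
    if environment:
        url += "?" + urlencode({"environment": environment})
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Content-Type": "application/json",
        "X-Blaxel-Authorization": f"Bearer {token}",
        "X-Blaxel-Workspace": workspace,
    }
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_inference(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the full response body as text."""
    # The endpoint is synchronous, so this call blocks until the agent answers.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call such as `run_inference(build_inference_request("YOUR-WORKSPACE", "YOUR-AGENT", "YOUR-TOKEN", "Enter your input here."))` returns the agent's response once processing completes.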

Blaxel CLI

The following command will make a default POST request to the agent.

bl run agent your-agent --data '{"inputs":"Enter your input here."}'

Read about the CLI parameters in the reference.

Blaxel console

You can make inference requests from the agent deployment’s Playground page in the Blaxel console.