Inference endpoints
Default synchronous endpoint
When you deploy an agent on Blaxel, an inference endpoint is automatically generated on the Global Agentics Network. This endpoint is synchronous: it keeps the connection open until your agent sends its complete response. It supports both batch and streaming responses, which you implement in your agent's code. The inference URL looks like this:
- The synchronous endpoint times out after 100 seconds without any data flowing through the connection. If your agent streams its response, this 100-second timer resets with each chunk it sends. For example, an agent that processes a request for 5 minutes while streaming data keeps the connection open. However, if it goes 100 seconds without sending any data, even while it is waiting on external API calls, the connection times out.
- If processing a request is expected to take longer than 100 seconds without streaming data, use the asynchronous endpoint or batch jobs instead.
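The idle-timeout rule above can be modeled in a few lines. This is a minimal Python sketch, not Blaxel code: times_out and recommended_endpoint are hypothetical helpers, and treating a silent gap of exactly 100 seconds as a timeout is an assumption about boundary behavior.

```python
# Minimal model of the synchronous endpoint's idle timeout described above.
# Illustration only: this is not Blaxel code.

IDLE_TIMEOUT_S = 100.0  # idle limit stated on this page


def times_out(chunk_times: list[float], end_time: float,
              idle_timeout: float = IDLE_TIMEOUT_S) -> bool:
    """Return True if a request would hit the idle timeout.

    chunk_times: seconds after the request starts at which chunks are streamed.
    end_time: seconds after the request starts at which the response completes.
    """
    last_activity = 0.0  # the idle timer starts when the request arrives
    for t in sorted(chunk_times) + [end_time]:
        # Assumption: a silent gap of exactly the limit already times out.
        if t - last_activity >= idle_timeout:
            return True
        last_activity = t  # each chunk (and the final response) resets the timer
    return False


def recommended_endpoint(longest_silent_gap_s: float) -> str:
    """Follow the guidance above: long silent processing belongs off the sync endpoint."""
    if longest_silent_gap_s >= IDLE_TIMEOUT_S:
        return "async-or-batch"
    return "sync"


# 5 minutes of work with a chunk every 30 s: the connection stays open.
print(times_out(list(range(30, 301, 30)), 300))  # False
# 150 s of silent work before a single batch response: times out.
print(times_out([], 150))  # True
```

The model makes the key point explicit: total duration does not matter, only the longest silent gap does.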
Async endpoint
In addition to the default synchronous endpoint, Blaxel provides the ability to create asynchronous endpoints for handling longer-running agent requests.
Asynchronous endpoints are created as triggers in the blaxel.toml file.
blaxel.toml reference
This file configures how the agent is deployed on Blaxel. The only mandatory parameter is type, which tells Blaxel which kind of entity to deploy. The other parameters are optional and let you customize the deployment:
- The name, workspace, and type fields also serve as default values: any bl command run in the folder uses them instead of prompting you for input.
- The agents, functions, and models fields specify which resources to deploy with the agent. These resources are preloaded during the build, which removes runtime dependencies on the Blaxel control plane and dramatically improves performance.
- The [env] section defines environment variables that the agent can access via the SDK. Note that these are NOT secrets.
- The [runtime] section lets you override agent deployment parameters: the timeout (in seconds) and the memory to allocate (in MB).
- The [[triggers]] and [triggers.configuration] sections define ways to send requests to the agent. You can create both synchronous (type = "http") and asynchronous (type = "http-async") trigger endpoints, and make each one either private (the default) or public (authenticationType = "public"). A private synchronous HTTP endpoint is always created by default, even if you don't define any trigger here.
Endpoint authentication
By default, agents deployed on Blaxel aren't public: every inference request must be authenticated with a bearer token. The Global Agentics Network evaluates authentication and authorization for inference requests based on the access granted in your workspace. See how to remove authentication on a deployed agent below.
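A client call can be sketched with Python's standard urllib.request. The URL, token, and JSON body below are hypothetical placeholders (the body your agent expects depends on its own code); the only part taken from this page is the bearer token sent in the Authorization header. read_chunks is a hypothetical helper showing one way to consume a streamed response incrementally.

```python
import io
import json
import urllib.request
from typing import Iterator


def build_inference_request(url: str, token: str, body: dict) -> urllib.request.Request:
    """Build an authenticated POST request to an agent's inference URL."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # bearer token required for private agents
            "Content-Type": "application/json",
        },
    )


def read_chunks(stream: io.BufferedIOBase, size: int = 1024) -> Iterator[bytes]:
    """Yield a streamed response body piece by piece instead of waiting for all of it."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        yield chunk


if __name__ == "__main__":
    req = build_inference_request(
        "https://example.invalid/my-agent",  # hypothetical; use your agent's inference URL
        "MY_TOKEN",                           # hypothetical token
        {"inputs": "Hello"},                  # hypothetical body shape
    )
    # Sending it would look like this (not executed here):
    # with urllib.request.urlopen(req) as resp:
    #     for chunk in read_chunks(resp):
    #         print(chunk)
    print(req.get_method(), req.full_url)
```

Without a valid token, a private endpoint should reject the request.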
Manage sessions
To simulate multi-turn conversations, pass a thread ID in a request header. Your client must generate this ID and send it in any header that your agent's code reads (e.g. Thread-Id). Without a thread ID, the agent neither maintains nor uses any conversation memory when processing the request.
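Here is a minimal sketch of both sides, assuming the Thread-Id header name from the example above. ConversationSession and thread_id_from are hypothetical helpers; how the agent stores or loads memory for a given thread depends on your agent's code and is not shown.

```python
import uuid
from typing import Dict, Optional

THREAD_HEADER = "Thread-Id"  # example name from this page; any header works if the agent reads it


class ConversationSession:
    """Client side: generate one thread ID and send it on every turn."""

    def __init__(self) -> None:
        self.thread_id = str(uuid.uuid4())  # the client generates the ID

    def headers(self, token: str) -> Dict[str, str]:
        return {
            "Authorization": f"Bearer {token}",
            THREAD_HEADER: self.thread_id,  # same value across turns = same conversation
        }


def thread_id_from(headers: Dict[str, str], name: str = THREAD_HEADER) -> Optional[str]:
    """Agent side: read the thread ID, ignoring header-name case.

    Returns None when the header is absent, in which case the agent
    should not use any conversation memory.
    """
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None
```

Reusing one session object across turns keeps the conversation; creating a new one starts a fresh thread.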
Make an agent public
To make an agent publicly accessible, set authenticationType = "public" in the [triggers.configuration] section of an HTTP trigger in the blaxel.toml configuration file, as explained above.