Unlike traditional sandbox providers, Blaxel Sandboxes automatically scale up and down at near-instant speeds, and persist forever in standby. As such, here are some recommended best practices so you can best make use of the platform features.
Use the sandbox lifecycle strategically
Treat sandboxes as persistent computers, rather than thinking of them as ephemeral runtimes that must be wiped clean after every interaction. Just as your laptop isn’t reformatted every time you close the lid, you shouldn’t destroy a sandbox’s state simply because a session ended. Blaxel Sandboxes are designed to maintain persistent storage and system state, allowing agents to retain context, shell history, and installed dependencies indefinitely. The most efficient way to manage lifecycle is to let the sandbox suspend when idle, rather than explicitly destroying it. Auto-suspend triggers after 15 seconds of inactivity. Resuming a standby sandbox is orders of magnitude faster than cold-booting a new box and re-running setup scripts (like `npm install` or `apt-get`). By relying on suspension, you ensure that when an agent returns (whether in ten minutes or ten weeks) the environment is restored exactly as it was left, enabling a seamless instant-resume experience that saves both time and compute costs.
The definition of a session is at your discretion. It’s a tradeoff between instant resume times from standby mode (~25ms) and paying for the standby snapshot storage cost. As a rule of thumb, most customers keep sandboxes in standby for 7 to 60 days; the section on automating cleanup with TTLs below covers how to enforce such a limit.
Choose the right storage
Choosing the right storage for your sandbox data is important, as it can impact your agent’s performance, reliability, and overall consistency. Blaxel offers three options:
- Sandbox filesystem: The sandbox itself has a stateful filesystem optimized for speed. Data stored here persists for as long as the sandbox exists in standby, but is erased when the sandbox is terminated.
- Volumes: Volumes provide longer-term, durable block storage that is decoupled from compute. Data stored on a volume survives even if the associated sandbox is terminated, and newly-created sandboxes can access the data by attaching to the same volume. However, relying solely on volumes for state management comes with a trade-off: you lose the “instant resume” benefit of the sandbox filesystem and will incur a cold-boot penalty when creating a new sandbox and attaching an existing volume to it.
- Agent Drive: Agent Drive behaves like a shared, distributed filesystem that can be mounted to multiple sandboxes or agents at any time. Drives are highly optimized for small and medium-sized file operations. Agent Drive uses a fine-grained replication strategy and it is also separately backed up. Drives scale automatically with no fixed capacity limits. Pre-provisioning or run-time resizing is not required.
| | Sandbox Filesystem | Volume | Agent Drive |
|---|---|---|---|
| Persistence | Until sandbox deletion | Survives sandbox deletion | Survives sandbox deletion |
| Durability | Low (lost on sandbox deletion or crash) | High (block storage, replicated) | High (distributed storage, replicated) |
| Access type | Single | Single | Multiple |
| Hot-attach | N/A | No | Yes |
| Interface | POSIX | POSIX | POSIX + S3 |
| Size | ~50% of sandbox memory | User-defined at creation, expandable | No size limit |
| Optimized for | Data that doesn’t require high durability | Data that requires durability with high performance | Shared data |
Automate cleanup with TTLs
While we advocate for treating sandboxes as perpetual computers, not every machine needs to last forever. In production, you will inevitably accumulate a long tail of sandboxes that will never be called upon again (e.g., completed tasks, abandoned user sessions). To prevent digital clutter and unnecessary storage usage, you need automated garbage collection. Instead of writing scripts to manually filter and delete sandboxes, use expiration parameters and lifecycle policies to automate when a sandbox should be retired. You can configure this based on idle duration (e.g., delete if it hasn’t been active for 7 days) or absolute maximum age (e.g., delete after 30 days).
In quota tiers 0 and 1, a maximum TTL is enforced on all sandboxes. On quota tier 2 and above, no expiration policy is set by default.
`expiresIn` reports the number of seconds until a sandbox is terminated due to its TTL or lifecycle policy.
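To make the two policy types concrete, here is a minimal sketch of how an idle-based TTL and a maximum age could combine into a single time-to-expiry value. It is not Blaxel’s implementation: the function name, its parameters, and the rule that the earliest deadline wins are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_expiry(
    created_at: datetime,
    last_active_at: datetime,
    now: datetime,
    idle_ttl: Optional[timedelta] = None,  # hypothetical: delete after this much inactivity
    max_age: Optional[timedelta] = None,   # hypothetical: delete this long after creation
) -> Optional[int]:
    """Return seconds until the earliest deadline, or None if no policy is set."""
    deadlines = []
    if idle_ttl is not None:
        deadlines.append(last_active_at + idle_ttl)
    if max_age is not None:
        deadlines.append(created_at + max_age)
    if not deadlines:
        return None  # no expiration policy configured
    remaining = (min(deadlines) - now).total_seconds()
    return max(0, int(remaining))  # clamp: an overdue sandbox has 0 seconds left


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
last_active = datetime(2025, 1, 10, tzinfo=timezone.utc)
now = datetime(2025, 1, 12, tzinfo=timezone.utc)

# Idle deadline (Jan 17) comes before the max-age deadline (Jan 31).
remaining = seconds_until_expiry(
    created, last_active, now,
    idle_ttl=timedelta(days=7), max_age=timedelta(days=30),
)
```

In this example the idle policy is the binding one, so any new activity would push the deadline out until the 30-day maximum age takes over.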
Reduce cold start times with a pre-warmed pool
For workloads with heavy images, you can maintain a pool of pre-warmed sandboxes to serve requests with near-zero latency. The pattern works as follows:
- Keep a pool of sandboxes ready, with their data recorded in a database or cache.
- When a task arrives, assign one sandbox from the pool to that task and mark it as in-use.
- After assigning a sandbox, create a replacement in the background to keep the pool at your target size.
- When the task finishes, either release the sandbox back to the pool or delete it and let the background replenishment fill the gap.
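The steps above can be sketched as a small in-memory pool. This is an illustrative outline, not a Blaxel SDK feature: `WarmPool`, its methods, and the `create_sandbox` callable are hypothetical names, and a real deployment would keep the pool state in a database or cache and run replenishment in a background worker.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict


@dataclass
class WarmPool:
    """In-memory sketch of a pre-warmed sandbox pool (hypothetical helper)."""

    create_sandbox: Callable[[], str]  # assumed: boots a sandbox and returns its name
    target_size: int
    ready: Deque[str] = field(default_factory=deque)
    in_use: Dict[str, str] = field(default_factory=dict)  # task id -> sandbox name

    def replenish(self) -> None:
        # Top the pool back up to the target size. In production this would
        # run in the background so callers never wait on a cold boot.
        while len(self.ready) < self.target_size:
            self.ready.append(self.create_sandbox())

    def acquire(self, task_id: str) -> str:
        # Hand out a warm sandbox if one is available; otherwise fall back
        # to creating one on demand (the slow path the pool tries to avoid).
        name = self.ready.popleft() if self.ready else self.create_sandbox()
        self.in_use[task_id] = name
        return name

    def release(self, task_id: str, reuse: bool = True) -> None:
        # Return the sandbox to the pool, or just forget it. When reuse is
        # False the caller is expected to delete the sandbox on the platform.
        name = self.in_use.pop(task_id)
        if reuse:
            self.ready.append(name)


# Usage with a stand-in factory that only generates names.
counter = iter(range(100))
pool = WarmPool(create_sandbox=lambda: f"sbx-{next(counter)}", target_size=2)
pool.replenish()                  # warm up the pool
name = pool.acquire("task-a")     # assign a warm sandbox to a task
pool.replenish()                  # create the replacement
pool.release("task-a", reuse=False)
```

Releasing with `reuse=True` after a replacement has already been created leaves the pool one above its target size; whether to allow that or delete the surplus is a policy choice this sketch leaves open.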
Use idempotent creation for sandboxes and preview URLs
When creating sandboxes and preview URLs, use `createIfNotExists()` (TypeScript) / `create_if_not_exists()` (Python) instead of `create()`.
These methods are idempotent and handle conflicts gracefully when the requested resource already exists. If it exists, they return it; if not, they create it.
You should use `create()` only if you explicitly want an error to be raised when a sandbox or preview URL with the specified name already exists.
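The difference between the two behaviours can be illustrated with a toy in-memory registry. This is not the Blaxel SDK: `Registry` and `AlreadyExists` are hypothetical stand-ins, and the choice to return the existing resource unchanged (ignoring the new arguments) is an assumption of this sketch.

```python
from typing import Any, Dict


class AlreadyExists(Exception):
    """Raised by create() when the name is taken (hypothetical error type)."""


class Registry:
    """Toy stand-in for a resource API with strict and idempotent creation."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def create(self, name: str, **spec: Any) -> Dict[str, Any]:
        # Strict creation: a name conflict is an error.
        if name in self._items:
            raise AlreadyExists(name)
        self._items[name] = {"name": name, **spec}
        return self._items[name]

    def create_if_not_exists(self, name: str, **spec: Any) -> Dict[str, Any]:
        # Idempotent creation: on conflict, return the existing resource.
        try:
            return self.create(name, **spec)
        except AlreadyExists:
            return self._items[name]


reg = Registry()
first = reg.create_if_not_exists("my-sandbox", memory=4096)
second = reg.create_if_not_exists("my-sandbox", memory=8192)  # no error, same resource
```

Because repeated calls converge on the same resource, the idempotent form is safe to run on every request or after a crash and restart, which is why it is the better default.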
Retry strategically in case of errors
Blaxel returns structured error responses in case of sandbox creation or connection failures. To handle these errors, we recommend the following strategy:
- First, check the origin of the error. Look at `error.origin` (or the `X-Blaxel-Source` header). If it is not `platform`, the 404 came from the application itself (e.g. a missing file), not from Blaxel. You should not re-create in that case.
- Then, branch on `error.code`:
  - `WORKLOAD_UNAVAILABLE`: The sandbox record exists but no healthy replica is currently serving it. This is the “sandbox is down” signal. It’s marked `retryable: true`, so retry with exponential backoff (500ms to 30s, give up after ~60s). If retries keep failing, treat it as gone and re-create.
  - `WORKLOAD_NOT_FOUND`: The sandbox record itself doesn’t exist in the platform (deleted, never created, or wrong name). Definitely re-create. Not retryable.
  - `ROUTE_NOT_FOUND`: The URL is wrong. This is a URL construction bug, not a sandbox availability issue. Don’t re-create, fix the URL. The SDK’s `sandbox.metadata.url` is the canonical source.
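The decision logic above can be sketched as follows. The error codes, the `origin` and `code` fields, and the backoff bounds come from the text; the `SandboxError` class, the function names, and the action labels are hypothetical and do not reflect how the SDK actually surfaces these errors.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


class SandboxError(Exception):
    """Hypothetical wrapper around a structured error response."""

    def __init__(self, origin: str, code: Optional[str], retryable: bool = False):
        super().__init__(code)
        self.origin, self.code, self.retryable = origin, code, retryable


def next_action(origin: str, code: Optional[str]) -> str:
    """Map an error to 'app_error', 'retry', 'recreate', 'fix_url', or 'raise'."""
    if origin != "platform":
        return "app_error"  # the application answered, not Blaxel: do not re-create
    return {
        "WORKLOAD_UNAVAILABLE": "retry",
        "WORKLOAD_NOT_FOUND": "recreate",
        "ROUTE_NOT_FOUND": "fix_url",
    }.get(code or "", "raise")


def backoff_delays(base: float = 0.5, cap: float = 30.0, budget: float = 60.0) -> Iterator[float]:
    """Yield doubling delays in seconds, capped per step, until the budget is spent."""
    elapsed, delay = 0.0, base
    while elapsed < budget:
        step = min(delay, cap, budget - elapsed)
        yield step
        elapsed += step
        delay *= 2


def call_with_retry(op: Callable[[], T], sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op, retrying only errors classified as 'retry'; re-raise anything else."""
    for delay in backoff_delays():
        try:
            return op()
        except SandboxError as err:
            if next_action(err.origin, err.code) != "retry":
                raise
            sleep(delay)
    # Final attempt after the budget: if this raises, treat the sandbox as
    # gone and re-create it in the caller.
    return op()
```

The `sleep` parameter is injectable so the schedule can be exercised in tests without waiting. Production code would typically also add random jitter to each delay, which this sketch omits to stay deterministic.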
