Bearer YOUR_API_KEY
Submit a job
Poll GET /v0/jobs/{id} to check progress and retrieve results.
Request body
The ID of the benchmark to run. Retrieve available benchmark IDs from GET /v0/benchmarks.
The URL of your model’s chat completions endpoint (e.g. https://your-model.example.com/v1/chat/completions).
The API key used to authenticate requests to your model endpoint.
The API shape of your model endpoint. Accepted values: openai, openai_responses, gemini, raw. Defaults to openai.
The model name or identifier (e.g. "gpt-4o", "claude-3-5-sonnet"). Optional; used for logging.
Additional HTTP headers to include with every request to your model endpoint. Provide as key-value pairs.
Number of benchmark units to run. Must be between 1 and 10. Each unit costs one credit.
Optional random seed for reproducible unit sampling.
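The unit-count rule above (between 1 and 10, one credit per unit) can be checked client-side before submitting. The following is a minimal sketch, not part of the API: the function names are hypothetical, and `balance` is assumed to be a credit count you have already read from GET /v0/me (the field that carries it is not documented in this section).

```python
def credits_required(benchmark_count: int) -> int:
    """Return the credits one job would deduct: one credit per unit."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(benchmark_count, bool) or not isinstance(benchmark_count, int):
        raise TypeError("benchmark_count must be an integer")
    if not 1 <= benchmark_count <= 10:
        raise ValueError("benchmark_count must be between 1 and 10")
    return benchmark_count


def can_afford(balance: int, benchmark_count: int) -> bool:
    """Hypothetical helper: True if `balance` covers the job's cost."""
    return balance >= credits_required(benchmark_count)
```

Running a check like this first avoids a round trip for a request that would be rejected or that would overdraw the account.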
Example request
Response
Returns 202 Accepted.
UUID of the created job. Use this to poll for results.
Initial status of the job. Always "pending" on creation.
A human-readable message with the polling URL.
Submitting a job immediately deducts credits equal to benchmark_count from your account balance. Ensure you have sufficient credits before submitting; check your balance with GET /v0/me.
Submit a batch of jobs
Request body
The URL of your model’s chat completions endpoint.
The API key used to authenticate requests to your model endpoint.
The API shape of your model endpoint. Accepted values: openai, openai_responses, gemini, raw. Defaults to openai.
The model name or identifier.
Additional HTTP headers to include with every request to your model endpoint.
Optional random seed for reproducible unit sampling across all jobs in the batch.
Array of job definitions. Each item specifies which benchmark to run and how many units.
Example request
Response
Returns 202 Accepted.
UUID identifying the batch.
Array of UUIDs for each created job. Poll GET /v0/jobs/{id} for each to retrieve results.
Initial status of all jobs in the batch. Always "pending" on creation.
A human-readable message describing how to poll for results.
List jobs
Query parameters
Filter jobs by status. Accepted values: pending, running, completed, failed.
Maximum number of jobs to return.
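As a rough illustration of the two query parameters above, the sketch below builds a request path for listing jobs. Several things here are assumptions rather than documented facts: the parameter names `status` and `limit`, the list path `/v0/jobs`, and the helper name `list_jobs_path` do not appear in this section, so check them against the real API before relying on them.

```python
from typing import Optional
from urllib.parse import urlencode

# Accepted status values as listed in the query parameter description.
ACCEPTED_STATUSES = {"pending", "running", "completed", "failed"}


def list_jobs_path(status: Optional[str] = None, limit: Optional[int] = None) -> str:
    """Build a path plus query string for listing jobs (names are assumed)."""
    params = {}
    if status is not None:
        if status not in ACCEPTED_STATUSES:
            raise ValueError(f"unsupported status: {status!r}")
        params["status"] = status  # assumed parameter name
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        params["limit"] = limit  # assumed parameter name
    query = urlencode(params)
    return "/v0/jobs" + ("?" + query if query else "")
```

Validating the status locally keeps typos such as "complete" from turning into an empty result set or an error response.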
Example request
Response
Returns 200 OK with a jobs array.
Array of job objects.
Get a job
Path parameters
The UUID of the job, returned from POST /v0/jobs or POST /v0/jobs/batch.
Example request
Response
Returns 200 OK. Fields are the same as in the list response, with additional real-time progress fields available while the job is running.
UUID of the job.
Current job status: pending, running, completed, or failed.
The evaluation module used.
ID of the benchmark that was run.
Name of the benchmark that was run.
The model endpoint URL that was evaluated.
Number of annotated needle prompts used.
Number of non-annotated hay prompts used.
Total number of prompts sent to your model endpoint.
Credits deducted for this job.
ISO 8601 timestamp of when the job was created.
ISO 8601 timestamp of when the job completed. null if still in progress.
Number of evaluation units your model passed. Present when job is completed.
Number of evaluation units your model failed. Present when job is completed.
Your model’s score as a percentage (0–100). Present when job is completed.
Total number of units scored. Present when job is completed.
Number of prompts processed so far. Present while the job is running.
Total number of prompts to process. Present while the job is running.
A progress message from the runner. Present while the job is running.
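Since a job moves from pending through running to either completed or failed, a client would typically poll GET /v0/jobs/{id} until it sees one of the two final states. The sketch below shows one way to structure that loop. It assumes the status is exposed under a `status` key in the JSON response (the field name is not shown in this section), and `fetch_job` is a hypothetical callable you supply that performs the authenticated GET and returns the parsed body.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which the job will not change again.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_job until the job is completed or failed, then return it."""
    for attempt in range(max_polls):
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:  # assumed field name
            return job
        # Skip the pause after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"job did not finish within {max_polls} polls")
```

Passing the fetcher and the sleep function as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.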