Use this page as the customer-facing reference for harmstack flags.
This page intentionally shows canonical kebab-case flags only (for example --job-id, --benchmark-id, --unit-count).

Global and shared flags

These flags apply to the harmstack root command and harmstack init.

Authentication and endpoint targeting

--harmstack-api-key
string
Your Harmstack API key for account and job APIs.
Env fallback: HARMSTACK_API_KEY
Example:
harmstack init --harmstack-api-key "$HARMSTACK_API_KEY"
--target-model-endpoint
string
URL of the model API endpoint to benchmark.
Example:
--target-model-endpoint https://api.openai.com/v1/responses
--target-model-api-key
string
Bearer token for your target model endpoint. Required when using --consentandskip.
Env fallback: TARGET_MODEL_API_KEY
Example:
--target-model-api-key "$TARGET_MODEL_API_KEY"
--provider
string
default:"openai"
API shape for your target endpoint.
Accepted values:
  • openai
  • openai_responses
  • gemini
  • raw
Example:
--provider openai_responses
--model
string
Model name used in requests. Ignored when --provider=raw.
Example:
--model gpt-4o-mini
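For scripted runs, the targeting flags above can be assembled programmatically. The following is a minimal Python sketch, not part of harmstack: the helper name target_args is hypothetical, and it relies only on the accepted --provider values and the rule that --model is ignored when --provider=raw.

```python
# Accepted --provider values, as listed in this reference.
PROVIDERS = ("openai", "openai_responses", "gemini", "raw")


def target_args(endpoint, provider="openai", model=None):
    """Build endpoint-targeting flags for a harmstack invocation (hypothetical helper)."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    args = ["--target-model-endpoint", endpoint, "--provider", provider]
    # --model is ignored when --provider=raw, so leave it out in that case.
    if model is not None and provider != "raw":
        args += ["--model", model]
    return args


print(target_args("https://api.openai.com/v1/responses", "openai_responses", "gpt-4o-mini"))
```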

Benchmark selection and run behavior

--benchmark-id
integer[]
Benchmark IDs to run. Repeat the flag or pass a comma-separated list.
Examples:
--benchmark-id 2 --benchmark-id 3
--benchmark-id=2,3
--unit-count
integer[]
Number of human-annotated unit tests per benchmark job (1 to 10). Pass one value per --benchmark-id, in the same order.
Example:
--benchmark-id 2 --unit-count 5 --benchmark-id 3 --unit-count 3
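Because --unit-count values must line up with --benchmark-id values, it can help to generate both from a single list of pairs. This is an illustrative Python sketch; the helper name benchmark_args is hypothetical, and the 1 to 10 range check mirrors the limit stated above.

```python
def benchmark_args(pairs):
    """Turn [(benchmark_id, unit_count), ...] into paired CLI flags (hypothetical helper)."""
    args = []
    for benchmark_id, unit_count in pairs:
        # The reference states a range of 1 to 10 unit tests per benchmark job.
        if not 1 <= unit_count <= 10:
            raise ValueError(f"unit count {unit_count} is outside the 1 to 10 range")
        # Emitting the two flags together keeps order and length aligned.
        args += ["--benchmark-id", str(benchmark_id), "--unit-count", str(unit_count)]
    return args


print(" ".join(benchmark_args([(2, 5), (3, 3)])))
# prints: --benchmark-id 2 --unit-count 5 --benchmark-id 3 --unit-count 3
```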
--consentandskip
boolean
default:"false"
Skip interactive prompts and run non-interactively. Recommended for CI and scripting.
Example:
harmstack --haystack --consentandskip --provider openai_responses --benchmark-id 2 --unit-count 1
--header
string[]
Optional HTTP headers added to every request to your model endpoint. Repeat as needed.
Example:
--header "X-Custom-Header: value" --header "X-Trace-Id: run-123"
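When headers come from configuration rather than being typed by hand, the repeated --header flags can be generated from a mapping. A small Python sketch under that assumption; header_args is a hypothetical helper, and the line-break check is a defensive choice of this example rather than documented harmstack behavior.

```python
import shlex


def header_args(headers):
    """Return one --header flag per entry in a name-to-value mapping (hypothetical helper)."""
    args = []
    for name, value in headers.items():
        # Reject line breaks so one entry cannot smuggle in a second header.
        if any(ch in name + value for ch in "\r\n"):
            raise ValueError("header names and values must not contain line breaks")
        args += ["--header", f"{name}: {value}"]
    return args


args = header_args({"X-Custom-Header": "value", "X-Trace-Id": "run-123"})
# shlex.join quotes each argument so the result can be pasted into a POSIX shell.
print(shlex.join(args))
```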
--haystack
boolean
default:"false"
Run the Haystack benchmarking flow directly from the root harmstack command.
Example:
harmstack --haystack --consentandskip --provider openai --benchmark-id 2 --unit-count 1
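In CI, a non-interactive run like the one above is often wrapped in a script that fails early when credentials are missing. The sketch below is one way to do that in Python, assuming the target key is supplied through the TARGET_MODEL_API_KEY environment fallback rather than the --target-model-api-key flag. The function name haystack_command is hypothetical, and the command is only built here, not executed.

```python
import os


def haystack_command(provider, benchmark_id, unit_count, env=None):
    """Build a non-interactive Haystack invocation as an argv list (hypothetical helper)."""
    env = os.environ if env is None else env
    # --consentandskip requires a target model API key; this sketch expects it
    # to arrive through the TARGET_MODEL_API_KEY environment fallback.
    if not env.get("TARGET_MODEL_API_KEY"):
        raise RuntimeError("TARGET_MODEL_API_KEY must be set for --consentandskip runs")
    return [
        "harmstack", "--haystack", "--consentandskip",
        "--provider", provider,
        "--benchmark-id", str(benchmark_id),
        "--unit-count", str(unit_count),
    ]


# To execute the result: subprocess.run(haystack_command("openai", 2, 1), check=True)
```

Passing the arguments as a list avoids shell quoting problems and keeps the key out of the command line.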

harmstack compare-jobs flags

--job-a
string
UUID of the first job.
Example:
harmstack compare-jobs --job-a=550e8400-e29b-41d4-a716-446655440000 --job-b=6ba7b810-9dad-11d1-80b4-00c04fd430c8
--job-b
string
UUID of the second job.
Example:
harmstack compare-jobs --job-a=550e8400-e29b-41d4-a716-446655440000 --job-b=6ba7b810-9dad-11d1-80b4-00c04fd430c8

harmstack list-jobs flags

--format
string
default:"table"
Output format.
Accepted values: table, csv
Example:
harmstack list-jobs --format csv --limit 20 --status completed
--limit
integer
default:"10"
Maximum jobs to return.
Example:
harmstack list-jobs --limit 25
--status
string
default:"completed"
Status filter.
Accepted values: completed, failed, all
Example:
harmstack list-jobs --status all
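The csv format is convenient for post-processing in scripts. The following Python sketch assumes the CSV is written to standard output. The column layout is not documented on this page, so rows are returned as plain lists without interpreting a header. Both function names are hypothetical.

```python
import csv
import io
import subprocess


def parse_csv_output(text):
    """Split CSV text into rows, dropping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def list_jobs_csv(limit=10, status="completed"):
    """Run harmstack list-jobs in CSV mode and return its rows (assumes stdout output)."""
    cmd = ["harmstack", "list-jobs", "--format", "csv",
           "--limit", str(limit), "--status", status]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_csv_output(result.stdout)
```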

harmstack show-job flags

--job-id
string
UUID of the job to inspect.
Examples:
harmstack show-job --job-id=550e8400-e29b-41d4-a716-446655440000
harmstack show-job 550e8400-e29b-41d4-a716-446655440000
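Since the job identifier must be a UUID, a wrapper script can reject malformed IDs before calling the CLI. A minimal Python sketch using the standard uuid module; show_job_command is a hypothetical helper and only builds the argument list.

```python
import uuid


def show_job_command(job_id):
    """Validate a job ID and build the show-job argv list (hypothetical helper)."""
    # uuid.UUID raises ValueError for strings that are not well-formed UUIDs.
    parsed = uuid.UUID(job_id)
    # str(parsed) yields the canonical lowercase, hyphenated form.
    return ["harmstack", "show-job", f"--job-id={parsed}"]


print(show_job_command("550e8400-e29b-41d4-a716-446655440000"))
```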

harmstack stats flags

--limit
integer
default:"30"
Number of recent completed jobs to include in aggregate calculations.
Example:
harmstack stats --limit 50
--since
string
Date filter in YYYY-MM-DD format.
Example:
harmstack stats --since 2025-01-01
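For rolling reports, the --since value can be computed rather than hard-coded. The Python sketch below derives a YYYY-MM-DD date a given number of days back. The helper name stats_since_command and its days_back parameter are hypothetical and exist only for illustration.

```python
from datetime import date, timedelta


def stats_since_command(days_back, today=None):
    """Build a stats argv list covering the last `days_back` days (hypothetical helper)."""
    today = date.today() if today is None else today
    since = today - timedelta(days=days_back)
    # date.isoformat() produces the YYYY-MM-DD form that --since expects.
    return ["harmstack", "stats", "--since", since.isoformat()]


print(stats_since_command(30, today=date(2025, 1, 31)))
# prints: ['harmstack', 'stats', '--since', '2025-01-01']
```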