harmstack flags.
Global and shared flags
These flags apply to the harmstack root command and harmstack init.
Authentication and endpoint targeting
Your Harmstack API key for account and job APIs.
Env fallback: HARMSTACK_API_KEY
URL of the model API endpoint to benchmark.
Bearer token for your target model endpoint. Required when using --consentandskip.
Env fallback: TARGET_MODEL_API_KEY
API shape for your target endpoint.
Accepted values: openai, openai_responses, gemini, raw
Model name used in requests. Ignored when --provider=raw.
Benchmark selection and run behavior
Benchmark IDs to run. Repeat the flag or pass a comma-separated list.
Number of human-annotated unit tests per benchmark job (1 to 10). Must align with --benchmark-id order and length.
Skip interactive prompts and run non-interactively. Recommended for CI and scripting.
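The alignment rule between benchmark IDs and per-benchmark unit-test counts can be checked before launching a run. The following is a minimal Python sketch and not part of the harmstack CLI: the helper names and the sample benchmark IDs are hypothetical, and it assumes the counts are supplied as a comma-separated list in the same way as the IDs.

```python
def split_csv(value):
    """Split a comma-separated string into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def pair_benchmarks_with_counts(benchmark_ids, unit_test_counts):
    """Pair each benchmark ID with its unit-test count.

    Hypothetical helper: raises ValueError if the two lists differ in
    length or if any count falls outside the documented 1 to 10 range.
    """
    ids = split_csv(benchmark_ids)
    counts = [int(c) for c in split_csv(unit_test_counts)]
    if len(ids) != len(counts):
        raise ValueError(f"{len(ids)} benchmark IDs but {len(counts)} counts")
    for count in counts:
        if not 1 <= count <= 10:
            raise ValueError(f"count {count} is outside 1 to 10")
    # Order is preserved, so the first count belongs to the first ID.
    return list(zip(ids, counts))
```

Calling `pair_benchmarks_with_counts("bench-a, bench-b", "3,10")` pairs the first count with the first ID and the second with the second; a length mismatch or an out-of-range count raises `ValueError` before any job is submitted.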
Optional HTTP headers added to every request to your model endpoint. Repeat as needed.
Run the Harmstack benchmarking flow directly from the root harmstack command.
harmstack compare-jobs flags
UUID of the first job.
UUID of the second job.
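Because job identifiers are UUIDs, a script can reject malformed values before invoking the command. This is a minimal Python sketch using the standard library's `uuid` module; the function name `is_uuid` is hypothetical, and whether the CLI accepts non-canonical spellings (for example, without hyphens) is not stated here, so the check is deliberately no stricter than `uuid.UUID` itself.

```python
import uuid


def is_uuid(value):
    """Return True if value parses as a UUID string, else False."""
    try:
        uuid.UUID(value)
    except ValueError:
        # uuid.UUID raises ValueError for badly formed strings.
        return False
    return True
```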
harmstack list-jobs flags
Output format.
Accepted values: table, csv
Maximum jobs to return.
Status filter.
Accepted values: completed, failed, all
harmstack show-job flags
UUID of the job to inspect.
harmstack stats flags
Number of recent completed jobs to include in aggregate calculations.
Date filter in YYYY-MM-DD format.
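A date filter in this format can be validated in a wrapper script before it is passed to the command. The sketch below is an illustration in Python, not harmstack's own parsing: the function name `parse_date_filter` is hypothetical, and it assumes zero-padded month and day are required.

```python
import re
from datetime import datetime


def parse_date_filter(value):
    """Parse a strict YYYY-MM-DD string into a datetime.date.

    Raises ValueError for a wrong shape or an impossible calendar date.
    """
    # strptime alone tolerates unpadded fields such as "2024-2-9",
    # so the shape is checked first.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return datetime.strptime(value, "%Y-%m-%d").date()
```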