harmstack
The root command. Without flags it prints help. Pass --haystack to run the Haystack benchmarking module directly without invoking a subcommand.
--haystack, --target-model-endpoint, --provider, --benchmark-id, --unit-count, --consentandskip
harmstack init
Launch the interactive wizard to create a new benchmarking job. The wizard prompts for your Harmstack API key (or reads HARMSTACK_API_KEY), verifies your account, and guides you through selecting a benchmark and configuring your model endpoint.
Pass --consentandskip with the required flags to skip all prompts and run non-interactively.
--target-model-endpoint, --target-model-api-key, --provider, --model, --benchmark-id, --unit-count, --consentandskip, --header
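For scripted runs, the non-interactive mode can be driven from Python. The sketch below is an illustration, not part of Harmstack. The helpers build_init_argv and run_init are hypothetical. The set of flags treated as required is an assumption, because the reference only says "the required flags". The sketch also assumes each flag takes its value as the next argument. All example values are placeholders.

```python
import subprocess
from typing import List, Optional


def build_init_argv(
    endpoint: str,
    provider: str,
    model: str,
    benchmark_id: str,
    unit_count: int,
    target_api_key: Optional[str] = None,
) -> List[str]:
    """Build a non-interactive `harmstack init` command line (hypothetical helper).

    Assumption: these flags are the "required flags" for a non-interactive
    run, and each flag takes its value as the following argument.
    """
    argv = [
        "harmstack", "init",
        "--consentandskip",                  # skip all prompts
        "--target-model-endpoint", endpoint,
        "--provider", provider,
        "--model", model,
        "--benchmark-id", benchmark_id,
        "--unit-count", str(unit_count),
    ]
    if target_api_key is not None:
        # Only pass the target model's key when one is supplied.
        argv += ["--target-model-api-key", target_api_key]
    return argv


def run_init(argv: List[str]) -> int:
    """Run the command and return its exit code (requires harmstack on PATH)."""
    return subprocess.run(argv, check=False).returncode
```

Keeping argument construction separate from execution lets you inspect or log the command before calling `run_init(build_init_argv(...))`.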
harmstack credits
Show the available credit balance on your account. Requires HARMSTACK_API_KEY to be set (or pass --harmstack-api-key).
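The credential lookup can be sketched as follows. This is an illustration under stated assumptions, not Harmstack's code. resolve_api_key is a hypothetical helper. The precedence shown here is assumed rather than documented: an explicit --harmstack-api-key value wins over the HARMSTACK_API_KEY environment variable.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    flag_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the Harmstack API key (hypothetical helper).

    Assumed precedence: an explicit --harmstack-api-key value first,
    then the HARMSTACK_API_KEY environment variable.
    """
    if flag_value:
        return flag_value
    # Fall back to the real environment unless a mapping is injected (handy for tests).
    source = os.environ if env is None else env
    key = source.get("HARMSTACK_API_KEY")
    if not key:
        raise RuntimeError("Set HARMSTACK_API_KEY or pass --harmstack-api-key")
    return key
```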
harmstack list-jobs
List your most recent benchmarking jobs with scores. Output includes the job ID, passed and failed counts, score percentage, and benchmark count.
| Flag | Default | Description |
|---|---|---|
| --limit | 10 | Maximum number of jobs to return |
| --status | completed | Filter by status: completed, failed, or all |
| --format | table | Output format: table or csv |
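The following minimal Python sketch shows how --limit, --status, and --format csv could shape the job list. It is not Harmstack's implementation. The Job record, the helper names, and the CSV column names are hypothetical. It also rests on three assumptions: the input is already ordered newest first, benchmark count equals passed plus failed, and score percentage is passed divided by that total.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative."""
    job_id: str
    status: str  # assumed values: "completed" or "failed"
    passed: int
    failed: int

    @property
    def benchmark_count(self) -> int:
        # Assumption: every benchmark either passed or failed.
        return self.passed + self.failed

    @property
    def score(self) -> float:
        # Assumption: score percentage = passed / benchmark_count * 100.
        total = self.benchmark_count
        return 100.0 * self.passed / total if total else 0.0


def select_jobs(jobs: Sequence[Job], limit: int = 10, status: str = "completed") -> List[Job]:
    """Mimic --limit and --status; `jobs` is assumed to be newest first."""
    if status not in ("completed", "failed", "all"):
        raise ValueError(f"unknown status: {status!r}")
    matching = [j for j in jobs if status == "all" or j.status == status]
    return matching[:limit]


def to_csv(jobs: Sequence[Job]) -> str:
    """Render jobs roughly the way --format csv might (column names assumed)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["job_id", "passed", "failed", "score", "benchmarks"])
    for j in jobs:
        writer.writerow([j.job_id, j.passed, j.failed, f"{j.score:.1f}", j.benchmark_count])
    return buf.getvalue()
```

With the defaults, `select_jobs(jobs)` keeps at most ten completed jobs, matching the table above.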
harmstack show-job
Show metadata and scoring stats for a single job by its UUID.
--job-id
harmstack compare-jobs
Compare two jobs side by side. Displays passed, failed, score, and total benchmark counts for each job in a single table.
--job-a, --job-b
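The side-by-side comparison can be sketched in Python as shown below. Only the layout is approximated here. JobStats and compare_table are illustrative names, not part of the CLI. The sketch assumes total equals passed plus failed and score is passed over total, as a percentage. Short labels stand in for real job UUIDs to keep the output readable.

```python
from dataclasses import dataclass


@dataclass
class JobStats:
    """Hypothetical per-job counts used for the comparison."""
    job_id: str
    passed: int
    failed: int

    @property
    def total(self) -> int:
        # Assumption: total benchmarks = passed + failed.
        return self.passed + self.failed

    @property
    def score(self) -> float:
        # Assumption: score is the pass rate as a percentage.
        return 100.0 * self.passed / self.total if self.total else 0.0


def compare_table(a: JobStats, b: JobStats) -> str:
    """Return a plain-text table with one column per job."""
    rows = [
        ("metric", a.job_id, b.job_id),
        ("passed", str(a.passed), str(b.passed)),
        ("failed", str(a.failed), str(b.failed)),
        ("score", f"{a.score:.1f}%", f"{b.score:.1f}%"),
        ("total", str(a.total), str(b.total)),
    ]
    # Pad each column to its widest cell so the values line up.
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip() for row in rows]
    return "\n".join(lines)
```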
harmstack stats
Show aggregate statistics across your most recent completed jobs: total job count, average score, and the best- and worst-performing jobs.
| Flag | Default | Description |
|---|---|---|
| --limit | 30 | Number of recent completed jobs to include |
| --since | none | Restrict to jobs on or after this date (YYYY-MM-DD) |
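The aggregation can be sketched in Python as follows. This is a hypothetical illustration, not Harmstack's code. CompletedJob and summarize are made-up names. The sketch ranks jobs by completion date and applies --since before --limit, though the reference does not specify that order. An empty selection yields a count of zero and no best or worst job.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass
class CompletedJob:
    """Hypothetical completed-job record."""
    job_id: str
    finished: date
    score: float  # percentage, 0-100


@dataclass
class Summary:
    count: int
    average: float
    best: Optional[CompletedJob]
    worst: Optional[CompletedJob]


def summarize(jobs: Sequence[CompletedJob], limit: int = 30, since: Optional[date] = None) -> Summary:
    """Aggregate the most recent completed jobs, mimicking --limit and --since."""
    # Newest first, so slicing keeps the most recent jobs.
    recent = sorted(jobs, key=lambda j: j.finished, reverse=True)
    if since is not None:
        # Assumption: the date filter is applied before the limit.
        recent = [j for j in recent if j.finished >= since]
    recent = recent[:limit]
    if not recent:
        return Summary(0, 0.0, None, None)
    average = sum(j.score for j in recent) / len(recent)
    return Summary(
        count=len(recent),
        average=average,
        best=max(recent, key=lambda j: j.score),
        worst=min(recent, key=lambda j: j.score),
    )
```

A --since value in YYYY-MM-DD form can be parsed with `date.fromisoformat("2024-02-01")` before passing it to `summarize`.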