Harmstack helps teams evaluate medical AI quality with repeatable, structured benchmarks. Point Harmstack at your model endpoint, run a benchmark, and review results through the harmstack CLI or API.

Quick Start

Download the CLI binary and run your first benchmark in minutes.

CLI Reference

Explore every command and flag in the harmstack CLI.

API Authentication

Learn how to authenticate API requests with your API key.

API Endpoints

Browse the full public REST API reference.

How it works

1. Get your API key

Obtain your HARMSTACK_API_KEY from Vetted Medical. This key authenticates all CLI commands and API requests.
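If you script Harmstack (for example in a CI job), it helps to stop early when a credential is missing. The helper below is a hypothetical sketch and is not part of the harmstack CLI. It assumes the CLI reads HARMSTACK_API_KEY from the environment, so confirm that in the CLI Reference before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the harmstack CLI).
# Assumption: the CLI reads HARMSTACK_API_KEY from the environment.
# Prints each unset or empty variable to stderr and returns 1 if any are missing.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_env HARMSTACK_API_KEY || exit 1` at the top of a script stops it before any harmstack command runs without a key.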

2. Install the CLI

Download a prebuilt binary from our public releases repo: vettedmedical/harmstack-install-v0 (latest release). Releases include binaries for macOS (Apple Silicon), Windows, and Linux.
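The exact release asset file names are listed on the releases page and are not reproduced here. As a hypothetical sketch, this helper maps `uname` output to one of the three documented platforms so a setup script can tell which download to pick. The labels it prints (macos-apple-silicon, linux, windows) are illustrative only and are not asset names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map `uname -s` / `uname -m` to a documented Harmstack platform.
# Printed labels are illustrative; check the releases page for the real asset names.
# Optional arguments let you test the mapping without calling uname.
detect_platform() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  case "$os" in
    Darwin)
      # Only Apple Silicon builds are listed for macOS.
      if [ "$arch" = "arm64" ]; then
        echo "macos-apple-silicon"
      else
        echo "unsupported: macOS on $arch" >&2
        return 1
      fi
      ;;
    Linux)
      # No architecture is stated for Linux; check the release for which CPUs are built.
      echo "linux"
      ;;
    MINGW* | MSYS* | CYGWIN*)
      # Git Bash, MSYS2, and Cygwin report these prefixes on Windows.
      echo "windows"
      ;;
    *)
      echo "unsupported: $os" >&2
      return 1
      ;;
  esac
}
```

After placing the downloaded binary on your PATH, `command -v harmstack` confirms that your shell can find it.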

3. Configure your model endpoint

Enter your model endpoint and credentials in the CLI's interactive wizard, or export them as environment variables so you do not need to re-enter them on each run:
```shell
export TARGET_MODEL_ENDPOINT_URL="https://api.openai.com/v1/responses"
export TARGET_MODEL_API_KEY="sk-proj-1234567"
```
Harmstack supports multiple provider shapes, including openai, openai_responses, gemini, and raw.
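The example URL points at OpenAI's Responses API. The hypothetical helper below illustrates how common endpoint paths line up with the named provider shapes. It assumes that openai refers to the Chat Completions format and uses raw as a catch-all; Harmstack may not infer shapes this way, so treat it as a rule of thumb for choosing a shape rather than as Harmstack's own logic.

```shell
#!/usr/bin/env bash
# Hypothetical rule of thumb (not Harmstack's own logic) for picking a provider
# shape from an endpoint URL. Assumptions: "openai" means the Chat Completions
# format, and "raw" is a reasonable fallback for anything unrecognized.
guess_provider_shape() {
  local url="$1"
  case "$url" in
    */v1/responses) echo "openai_responses" ;;
    */v1/chat/completions) echo "openai" ;;
    *generativelanguage.googleapis.com/*) echo "gemini" ;;
    *) echo "raw" ;;
  esac
}
```

For the endpoint exported above, `guess_provider_shape "$TARGET_MODEL_ENDPOINT_URL"` prints `openai_responses`.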

4. Run and review benchmarks

Choose a benchmark, submit a run, and inspect job-level scores and aggregate stats from the CLI or API.
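Harmstack reports aggregate stats for you; this sketch only illustrates what a simple aggregate over job-level scores looks like if you post-process results yourself. It assumes a hypothetical export format of one `job_id score` pair per line. The actual output format of the CLI and API is documented in the CLI Reference and API Endpoints pages.

```shell
#!/usr/bin/env bash
# Hypothetical post-processing helper (not part of the harmstack CLI).
# Reads "job_id score" lines on stdin (an assumed format) and prints
# the job count, mean, min, and max score. Returns 1 on empty input.
aggregate_scores() {
  awk '
    NF >= 2 {
      s = $2 + 0
      if (n == 0 || s < min) min = s
      if (n == 0 || s > max) max = s
      sum += s
      n++
    }
    END {
      if (n == 0) { print "no scores" > "/dev/stderr"; exit 1 }
      printf "jobs=%d mean=%.3f min=%.3f max=%.3f\n", n, sum / n, min, max
    }
  '
}
```

For example, `printf 'job-1 0.80\njob-2 0.60\njob-3 1.00\n' | aggregate_scores` prints `jobs=3 mean=0.800 min=0.600 max=1.000`.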

What you can do

  • Submit benchmarks: Run one-off or repeated evaluations against your model endpoint
  • Inspect results: Review scores, metadata, and run details for each job
  • Compare runs: Measure quality changes across model or prompt revisions
  • Track trends: Monitor benchmark performance over time
  • Automate in CI/CD: Run non-interactively with --consentandskip
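When comparing runs in CI/CD, a pipeline usually needs an explicit pass/fail rule. The hypothetical helper below compares a candidate run's mean score against a baseline and returns non-zero when the drop exceeds a tolerance (0.02 by default, an arbitrary choice). It assumes higher scores are better. How you obtain the two means, and the non-interactive harmstack command that produces them, are outside this sketch; see Quick Start for the command.

```shell
#!/usr/bin/env bash
# Hypothetical CI gate (not part of the harmstack CLI).
# Usage: check_regression BASELINE_MEAN CANDIDATE_MEAN [TOLERANCE]
# Assumes higher scores are better. Returns 1 when the candidate's
# mean drops below the baseline by more than TOLERANCE (default 0.02).
check_regression() {
  local baseline="$1" candidate="$2" tolerance="${3:-0.02}"
  awk -v b="$baseline" -v c="$candidate" -v t="$tolerance" 'BEGIN {
    drop = b - c
    printf "baseline=%.3f candidate=%.3f drop=%.3f\n", b, c, drop
    if (drop > t) {
      print "regression exceeds tolerance " t > "/dev/stderr"
      exit 1
    }
    exit 0
  }'
}
```

Calling `check_regression "$BASELINE_MEAN" "$CANDIDATE_MEAN" || exit 1` in a CI step (both variable names are placeholders) fails the job on a regression larger than the tolerance.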
Admin-only API endpoints are intentionally excluded from this documentation. If you need admin access, contact Vetted Medical support.
Start with Quick Start for binary installation and a complete non-interactive command example.