Datasets are versioned collections of test cases that you use to run evaluations and track improvements over time. You can build datasets from production logs, user feedback, and manual curation, or generate them with Loop. Key advantages:
  • Versioned: Every change is tracked, so experiments can pin to specific versions.
  • Integrated: Use directly in evaluations and populate from production.
  • Scalable: Stored in a modern data warehouse without storage limits.
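
Versioning means an experiment can keep reading the dataset exactly as it was at a given version, even after records change. The sketch below illustrates that idea with a hypothetical in-memory, append-only change log. `VersionedDataset`, `upsert`, `delete`, and `as_of` are illustrative names, not part of the Braintrust SDK, and Braintrust's actual storage works differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class VersionedDataset:
    """Hypothetical model (not the Braintrust SDK): every change bumps the version."""

    # Append-only log of (version, record_id, record); None marks a deletion.
    _log: list[tuple[int, str, dict[str, Any] | None]] = field(default_factory=list)
    version: int = 0

    def upsert(self, record_id: str, record: dict[str, Any]) -> int:
        """Insert or update a record and return the new dataset version."""
        self.version += 1
        self._log.append((self.version, record_id, dict(record)))
        return self.version

    def delete(self, record_id: str) -> int:
        """Delete a record and return the new dataset version."""
        self.version += 1
        self._log.append((self.version, record_id, None))
        return self.version

    def as_of(self, version: int) -> dict[str, dict[str, Any]]:
        """Replay the log up to `version` to reconstruct that snapshot."""
        snapshot: dict[str, dict[str, Any]] = {}
        for v, record_id, record in self._log:
            if v > version:
                break
            if record is None:
                snapshot.pop(record_id, None)
            else:
                snapshot[record_id] = record
        return snapshot


ds = VersionedDataset()
v1 = ds.upsert("r1", {"input": "2+2", "expected": "4"})
v2 = ds.upsert("r1", {"input": "2+2", "expected": "four"})
# An experiment pinned to v1 keeps seeing the original expected value.
print(ds.as_of(v1)["r1"]["expected"])  # prints 4
```

Because the log is never rewritten, pinning to an older version only requires remembering its number.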

Dataset structure

Each record has four top-level fields:
  • input: Data to recreate the example in your application (required).
  • expected: Ideal output or ground truth (optional but recommended for evaluation).
  • metadata: Key-value pairs for filtering and grouping (optional).
  • tags: Labels for organizing and filtering records (optional).
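
The four-field record shape, and how metadata and tags support filtering, can be modeled in plain Python. This is a minimal sketch, assuming records are simple in-memory objects. `DatasetRecord` and `filter_records` are hypothetical names, not Braintrust SDK types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class DatasetRecord:
    """Hypothetical record with the four top-level fields: input, expected, metadata, tags."""

    input: Any                                          # required
    expected: Any = None                                # optional, recommended for evals
    metadata: dict[str, Any] = field(default_factory=dict)
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # `input` is the only required field, so reject a missing value.
        if self.input is None:
            raise ValueError("input is required")


def filter_records(
    records: list[DatasetRecord], *, tag: str | None = None, **metadata_match: Any
) -> list[DatasetRecord]:
    """Keep records that carry `tag` (if given) and match every metadata key/value."""
    return [
        r
        for r in records
        if (tag is None or tag in r.tags)
        and all(r.metadata.get(k) == v for k, v in metadata_match.items())
    ]


records = [
    DatasetRecord(input={"question": "What is 2+2?"}, expected="4",
                  metadata={"category": "math"}, tags=["smoke"]),
    DatasetRecord(input={"question": "Capital of France?"}, expected="Paris",
                  metadata={"category": "geography"}),
]
math_smoke = filter_records(records, tag="smoke", category="math")
```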

Where to go from here

  • Create datasets from uploads, the SDK, production logs, user feedback, traces, or Loop.
  • Manage datasets: tag and star, save snapshots, define schemas, customize table views, and edit records.
  • Use in evaluations by passing datasets to Eval(), assigning them to environments, or converting experiment results.
  • Track performance to see which experiments used a dataset and how each row performs.
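
Conceptually, using a dataset in an evaluation means running your task on each record's `input` and scoring the result against `expected`, which yields the per-row results that performance tracking relies on. The sketch below models only that data flow in plain Python. `run_eval` and `exact_match` are hypothetical helpers, not Braintrust's `Eval()`, whose actual parameters may differ.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical stand-in for an evaluation run (not Braintrust's Eval()):
# task(input) -> output, then scorer(output, expected) -> score per row.
Record = dict[str, Any]


def exact_match(output: Any, expected: Any) -> float:
    """Toy scorer: 1.0 when the output equals the expected value, else 0.0."""
    return 1.0 if output == expected else 0.0


def run_eval(
    records: list[Record],
    task: Callable[[Any], Any],
    scorer: Callable[[Any, Any], float] = exact_match,
) -> list[dict[str, Any]]:
    """Run `task` on each record's input and score it against `expected`."""
    results = []
    for record in records:
        output = task(record["input"])
        results.append({
            "input": record["input"],
            "output": output,
            "expected": record.get("expected"),
            "score": scorer(output, record.get("expected")),
        })
    return results


dataset = [
    {"input": "hello", "expected": "HELLO"},
    {"input": "world", "expected": "World"},
]
results = run_eval(dataset, task=str.upper)
average = sum(r["score"] for r in results) / len(results)  # 0.5
```

Keeping the per-row results, rather than only the average, is what makes it possible to see which individual records regress between experiments.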
For human review workflows on dataset records, see Human review and Custom views.