Datasets are versioned collections of test cases that you use to run evaluations and track improvements over time. Build datasets from production logs, user feedback, manual curation, or generate them with Loop. Key advantages:
- Versioned: Every change is tracked, so experiments can pin to specific versions.
- Integrated: Use directly in evaluations and populate from production.
- Scalable: Stored in a modern data warehouse without storage limits.
Dataset structure
Each record has four top-level fields:
- input: Data to recreate the example in your application (required).
- expected: Ideal output or ground truth (optional but recommended for evaluation).
- metadata: Key-value pairs for filtering and grouping (optional).
- tags: Labels for organizing and filtering records (optional).
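To make the record shape concrete, here is a minimal sketch in plain Python. It does not use the Braintrust SDK: `DatasetRecord` and `filter_by_tag` are hypothetical names introduced only for illustration, and the example values are made up. It shows the one required field (`input`) alongside the three optional ones, and how tags might be used for filtering.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DatasetRecord:
    """Hypothetical stand-in for one dataset record (not an SDK class)."""

    input: Any                                              # required
    expected: Any = None                                    # optional ground truth
    metadata: dict[str, Any] = field(default_factory=dict)  # optional key-value pairs
    tags: list[str] = field(default_factory=list)           # optional labels


def filter_by_tag(records: list[DatasetRecord], tag: str) -> list[DatasetRecord]:
    """Return only the records that carry the given tag."""
    return [r for r in records if tag in r.tags]


# Illustrative record; the values are invented for this sketch.
record = DatasetRecord(
    input={"question": "What is the capital of France?"},
    expected="Paris",
    metadata={"source": "manual", "difficulty": "easy"},
    tags=["geography"],
)

# A record with only the required field is still valid.
minimal = DatasetRecord(input="ping")

geography_records = filter_by_tag([record, minimal], "geography")
```

Only `input` has no default here, which mirrors the "required" note above; everything else falls back to an empty value.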
Where to go from here
- Create datasets from uploads, the SDK, production logs, user feedback, traces, or Loop.
- Manage datasets — tag and star, save snapshots, define schemas, customize table views, and edit records.
- Use in evaluations by passing datasets to Eval(), assigning them to environments, or converting experiment results.
- Track performance to see which experiments used a dataset and how each row performs.
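As a rough illustration of what using a dataset in an evaluation involves, the sketch below runs a task over each record's `input` and scores the output against `expected`. It is plain Python and deliberately does not call `Eval()` or any Braintrust API: `run_eval`, `exact_match`, and the sample records are assumptions made for this example only.

```python
from typing import Any, Callable


def exact_match(output: Any, expected: Any) -> float:
    """Hypothetical scorer: 1.0 when the output equals the expected value."""
    return 1.0 if output == expected else 0.0


def run_eval(
    records: list[dict[str, Any]],
    task: Callable[[Any], Any],
    scorer: Callable[[Any, Any], float] = exact_match,
) -> list[dict[str, Any]]:
    """Run `task` on each record's input and score it against `expected`."""
    results = []
    for record in records:
        output = task(record["input"])
        expected = record.get("expected")
        results.append(
            {
                "input": record["input"],
                "output": output,
                "expected": expected,
                "score": scorer(output, expected),
            }
        )
    return results


# Invented records using the input/expected fields described above.
records = [
    {"input": "hello", "expected": "HELLO", "tags": ["smoke"]},
    {"input": "World", "expected": "WORLD"},
    {"input": "abc", "expected": "Abc"},  # the task below will miss this one
]

results = run_eval(records, task=str.upper)
mean_score = sum(r["score"] for r in results) / len(results)
```

Keeping per-row results, rather than only the mean, is what makes it possible to see how each row performs across experiments.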