Datasets

In Basalt, a dataset is a structured collection of examples you use to evaluate and improve your AI systems. Instead of testing prompts or models with ad‑hoc inputs, you store representative examples—inputs, expected outputs, and metadata—in datasets and reuse them across experiments, environments, and teams. Typical uses include:
  • Building evaluation suites for prompts, RAG pipelines, and agents
  • Capturing high‑quality production examples for regression testing
  • Sharing canonical test cases between teams and environments
This page focuses on the concept of datasets: what they are, when to use them, and how they fit into your evaluation workflow.
For concrete SDK usage, see the Python pages in this section. TypeScript v1 docs are not available yet (use the v0 archive).

What datasets contain

A dataset usually describes a specific evaluation surface—for example “customer support Q&A” or “RAG search quality”. Each dataset is made of:
  • Columns – the schema describing what each row contains (e.g. question, context, category)
  • Rows – individual test cases with:
    • Values: the actual inputs or fields (e.g. a user question, retrieved context)
    • Ideal output: the expected or “golden” answer, when available
    • Metadata: additional context like difficulty, source, tags, or ratings
This structure lets you evaluate different prompts, models, or workflows against the same underlying data, so you can compare results fairly.
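To make that structure concrete, here is a minimal sketch of a dataset represented as plain Python data. The field names mirror the concepts above (columns, rows, values, ideal output, metadata); they are illustrative only and are not the exact objects returned by the SDK, so check the Python pages in this section for the real shapes.

```python
# Illustrative only: a conceptual view of a dataset, not the exact
# objects returned by the Basalt SDK.
dataset = {
    "slug": "customer-support-qa",  # hypothetical identifier
    "columns": ["question", "context", "category"],
    "rows": [
        {
            # Values: the actual inputs or fields for one test case
            "values": {
                "question": "How do I reset my password?",
                "context": "Passwords can be reset from the account settings page.",
                "category": "account",
            },
            # Ideal output: the expected or "golden" answer, when available
            "ideal_output": "Go to Settings > Account and click 'Reset password'.",
            # Metadata: extra context such as difficulty, source, or tags
            "metadata": {"difficulty": "easy", "source": "production", "tags": ["auth"]},
        },
    ],
}
```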

Why use datasets

Datasets help you:
  • Ensure consistency – run the same tests across versions, models, and environments
  • Improve reproducibility – reproduce issues and fixes using stable, versioned examples
  • Streamline experimentation – plug datasets into evaluators and experiments instead of hand‑crafting inputs
  • Track progress over time – measure quality improvements on a fixed benchmark
  • Organize real‑world examples – turn production traffic into structured, reusable test cases

How datasets fit with other Basalt features

Datasets work best alongside:
  • Prompts – use dataset rows as inputs to prompts and compare actual vs. ideal outputs
  • Evaluators – automate scoring of model behavior on dataset rows
  • Experiments – run A/B tests or model comparisons over a given dataset
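As a rough illustration of that workflow, the sketch below iterates over dataset rows, calls a model through a placeholder `generate_answer` function, and compares the actual output against the ideal output with a simple exact-match check. Both helpers are hypothetical stand-ins: in practice `generate_answer` would run a Basalt prompt, RAG pipeline, or agent, and the scorer would be a proper evaluator feeding an experiment.

```python
from typing import Callable

def exact_match(actual: str, ideal: str) -> bool:
    """Toy scorer: a real evaluator would be far more nuanced."""
    return actual.strip().lower() == ideal.strip().lower()

def evaluate(rows: list[dict], generate_answer: Callable[[dict], str]) -> float:
    """Run a model or prompt over dataset rows and return accuracy."""
    scores = []
    for row in rows:
        actual = generate_answer(row["values"])  # e.g. call a prompt, pipeline, or agent
        ideal = row.get("ideal_output")
        if ideal is not None:                    # only score rows with a golden answer
            scores.append(exact_match(actual, ideal))
    return sum(scores) / len(scores) if scores else 0.0
```

Because the dataset stays fixed, running this loop with two different prompts or models gives scores you can compare directly.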
The SDK pages in this section show how to:
  • List and inspect datasets for your workspace
  • Retrieve full datasets (columns and rows) for testing
  • Add new rows programmatically from scripts, experiments, or production systems
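The snippet below sketches how those three operations might look from a script. The client and method names (`Basalt`, `datasets.list`, `datasets.get`, `datasets.add_row`) are placeholders chosen for illustration, not the confirmed API; refer to the Python SDK pages in this section for the actual calls and parameters.

```python
# Placeholder sketch; the real client and method names live in the SDK pages.
from basalt import Basalt  # assumed import, verify against the Python SDK docs

client = Basalt(api_key="YOUR_API_KEY")

# 1. List and inspect datasets for the workspace (hypothetical method)
datasets = client.datasets.list()

# 2. Retrieve a full dataset, columns and rows included (hypothetical method)
dataset = client.datasets.get("customer-support-qa")

# 3. Add a new row, e.g. captured from production traffic (hypothetical method)
client.datasets.add_row(
    "customer-support-qa",
    values={"question": "Can I change my email?", "context": "", "category": "account"},
    ideal_output="Yes, from Settings > Account > Email.",
    metadata={"source": "production"},
)
```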