Experiments
Experiments enable you to track A/B tests, model comparisons, and feature variations in your AI applications. They provide a structured way to compare different approaches and analyze their performance through observability traces.

This section covers the concept of experiments and how they are used for evaluation and comparison. For SDK usage, see the Python pages in this section. TypeScript v1 docs are not available yet (use the v0 archive).
What are Experiments?
Experiments in Basalt are a mechanism for:
- Tracking different variants of prompts, models, or approaches
- Comparing performance across versions
- Associating traces with specific experiments
- Attaching experiment metadata to observability spans
- Running systematic tests and comparisons
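As a conceptual illustration of the trace-related points above, the sketch below tags an observability span's metadata with the experiment it belongs to. The span structure and the `attach_experiment` helper are assumptions made for illustration, not the SDK's actual interface; the Python SDK pages document the real API.

```python
# Conceptual sketch only: the span shape and helper below are illustrative
# assumptions, not the Basalt SDK's actual interface.
def attach_experiment(span: dict, experiment_id: str, experiment_name: str) -> dict:
    """Tag a span's metadata with the experiment it belongs to."""
    span.setdefault("metadata", {})["experiment"] = {
        "id": experiment_id,
        "name": experiment_name,
    }
    return span

span = {"name": "llm.generate", "input": "Summarize this article..."}
span = attach_experiment(span, "exp-456", "Model Comparison A/B Test")
# Traces carrying this metadata can then be grouped and compared per experiment.
```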
Core Concepts
Each experiment is defined by:
- ID: A unique identifier (e.g., "exp-456").
- Name: A human-readable description (e.g., "Model Comparison A/B Test").
- Feature Slug: An optional identifier for grouping experiments by feature.
Why use Experiments?
Experiments are essential for:
- A/B Testing: Serve different prompts or models to different users and track which one performs better (see the sketch after this list).
- Model Comparison: Evaluate how a new model version compares to the current production model.
- Feature Flagging: Roll out new features to a subset of users and monitor the impact.
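For example, a simple A/B test might bucket each user into a variant, run the corresponding model, and record which experiment produced the response. The routing logic and model names below are a hand-rolled sketch under assumed variant names, not a prescribed pattern.

```python
import hashlib

# Assumed model names for illustration only.
VARIANTS = {"A": "model-current", "B": "model-candidate"}

def assign_variant(user_id: str) -> str:
    """Deterministically bucket a user into variant A or B."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 2
    return "A" if bucket == 0 else "B"

def run_ab_test(user_id: str, prompt: str) -> dict:
    variant = assign_variant(user_id)
    model = VARIANTS[variant]
    # response = call_model(model, prompt)  # your existing generation call
    return {
        "experiment": {"id": "exp-456", "name": "Model Comparison A/B Test"},
        "variant": variant,
        "model": model,
    }
```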