Experiments
Experiments enable you to track A/B tests, model comparisons, and feature variations in your AI applications. They provide a structured way to compare different approaches and analyze their performance through observability traces.

This section covers the concept of experiments and how they are used for evaluation and comparison. For SDK usage, see the Python pages in this section. TypeScript v1 docs are not available yet (use the v0 archive).
What are Experiments?
Experiments in Basalt are a mechanism for:
- Tracking different variants of prompts, models, or approaches
- Comparing performance across versions
- Associating traces with specific experiments
- Attaching experiment metadata to observability spans
- Running systematic tests and comparisons
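As a conceptual illustration of the trace-related points above, the sketch below tags an observability span's metadata with the experiment it belongs to. The span structure and the `attach_experiment` helper are assumptions made for illustration, not the SDK's actual interface; the Python SDK pages document the real API.

```python
# Conceptual sketch only: the span shape and helper below are illustrative
# assumptions, not the Basalt SDK's actual interface.
def attach_experiment(span: dict, experiment_id: str, experiment_name: str) -> dict:
    """Tag a span's metadata with the experiment it belongs to."""
    span.setdefault("metadata", {})["experiment"] = {
        "id": experiment_id,
        "name": experiment_name,
    }
    return span

span = {"name": "llm.generate", "input": "Summarize this article..."}
span = attach_experiment(span, "exp-456", "Model Comparison A/B Test")
# Traces carrying this metadata can then be grouped and compared per experiment.
```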
Core Concepts
Each experiment is defined by:
- ID: A unique identifier (e.g., "exp-456").
- Name: A human-readable description (e.g., "Model Comparison A/B Test").
- Feature Slug: An optional identifier for grouping experiments by feature.
Why use Experiments?
Experiments are essential for:
- A/B Testing: Serve different prompts or models to different users and track which one performs better (see the sketch after this list).
- Model Comparison: Evaluate how a new model version compares to the current production model.
- Feature Flagging: Roll out new features to a subset of users and monitor the impact.
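For example, a simple A/B test might bucket each user into a variant, run the corresponding model, and record which experiment produced the response. The routing logic and model names below are a hand-rolled sketch under assumed variant names, not a prescribed pattern.

```python
import hashlib

# Assumed model names for illustration only.
VARIANTS = {"A": "model-current", "B": "model-candidate"}

def assign_variant(user_id: str) -> str:
    """Deterministically bucket a user into variant A or B."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 2
    return "A" if bucket == 0 else "B"

def run_ab_test(user_id: str, prompt: str) -> dict:
    variant = assign_variant(user_id)
    model = VARIANTS[variant]
    # response = call_model(model, prompt)  # your existing generation call
    return {
        "experiment": {"id": "exp-456", "name": "Model Comparison A/B Test"},
        "variant": variant,
        "model": model,
    }
```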