Creating an Experiment
To create an experiment, use the createExperiment method:
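For example, a minimal sketch in which client stands in for an initialized SDK client and the literal values are illustrative:

```typescript
// Assumes `client` is an initialized SDK client (initialization is not shown in this guide).
const experiment = await client.createExperiment({
  "feature-slug": "chat-summarizer", // illustrative feature slug
  name: "Prompt v2 comparison",      // descriptive name shown in the UI
});

console.log(experiment.id);          // unique identifier for the experiment
console.log(experiment.featureSlug); // the feature slug you associated it with
console.log(experiment.createdAt);   // timestamp when the experiment was created
```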
The createExperiment method takes the following parameters:
- feature-slug: The slug of the feature this experiment is associated with
- name: A descriptive name for the experiment (visible in the UI)
The createExperiment method returns an experiment object with the following properties:
- id: Unique identifier for the experiment
- name: The name you provided
- featureSlug: The feature slug this experiment is associated with
- createdAt: Timestamp when the experiment was created
Attaching Traces to an Experiment
Once you have created an experiment, you can attach traces to it by passing the experiment object to the createTrace method:
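Continuing from the experiment created above, a sketch; the feature slug option is assumed to mirror the createExperiment parameter, and the exact way the experiment object is passed may differ in your SDK version:

```typescript
// Assumes `client` and `experiment` from the previous example.
const trace = await client.createTrace({
  "feature-slug": "chat-summarizer", // must match the experiment's feature slug
  experiment,                        // attaches this trace to the experiment
});

// ... run your AI workflow and record its output on the trace ...
```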
The feature slug of the experiment must match the feature slug of the trace. If they don’t match, the experiment will be ignored and the trace will go to regular monitoring instead.
Evaluation in Experiments
When you attach evaluators to traces in an experiment, they run with special behavior: evaluators run on 100% of the traces, regardless of the sampleRate setting (see the sketch after the list below). This ensures you get complete evaluation data for all experimental runs, which is essential for meaningful comparisons. This means you can:
- Get evaluation scores for every single experimental run
- Compare evaluation metrics between different variants
- Ensure statistical significance in your evaluation results
- Draw reliable conclusions from your experiments
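To illustrate the sampling behavior described above, here is a sketch that assumes evaluators are attached with a sampleRate option when the trace is created; the evaluator name and the shape of the evaluators option are assumptions, not a confirmed API:

```typescript
// Assumes `client` and `experiment` from the earlier examples; the evaluator
// configuration shown here is illustrative.
const trace = await client.createTrace({
  "feature-slug": "chat-summarizer",
  experiment,                   // the trace is attached to an experiment...
  evaluators: [
    {
      name: "answer-relevance", // hypothetical evaluator
      sampleRate: 0.1,          // ...so this 10% sampling is ignored: the
                                // evaluator runs on 100% of experiment traces
    },
  ],
});
```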
Benefits of Using Experiments for Testing
Using experiments for testing provides several advantages:
- Centralized Results: All test runs are grouped together in a single experiment, making it easy to analyze the results.
- Workflow Validation: You can validate that your AI workflows produce the expected outputs for a variety of inputs.
- Quality Regression Detection: By running experiments regularly (e.g., on every PR or nightly), you can detect regressions in your AI workflows.
- Evaluation Integration: Automatic evaluators provide objective metrics about the quality of your AI outputs.
- Mock Data Support: You can run experiments with mock data or synthetic inputs to test specific scenarios.
- Historical Comparisons: Compare current performance against previous runs to ensure your changes improve quality.
Experiment Best Practices
To get the most out of experiments:
- Run enough samples: Aim for at least 50-100 runs per variant to get statistically significant results.
- Control your variables: Change only one thing at a time between variants to isolate the impact of that change.
- Add detailed metadata: Include information that will help you analyze the results later, such as variant identifiers and relevant parameters.
- Include evaluators: Add evaluators to automatically assess the quality of outputs from different variants.
- Use the same feature slug: Make sure the experiment and all traces use the same feature slug.
- Add error handling: Ensure all traces are ended properly, even if errors occur (see the sketch after this list).
- Name your variants: Use consistent variant names so they are easy to identify in the results.
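A sketch that combines several of these practices; apart from createTrace, the helpers shown here (runVariant, trace.setMetadata, trace.end) and the metadata fields are assumptions, not confirmed SDK methods:

```typescript
// Assumes `client` and `experiment` from the earlier examples, plus a
// runVariant(input) helper that executes the workflow variant under test.
const VARIANT = "prompt-v2";                 // consistent variant naming
const testInputs = [/* your test inputs, ideally 50-100 per variant */];

for (const input of testInputs) {
  const trace = await client.createTrace({
    "feature-slug": "chat-summarizer",       // same feature slug as the experiment
    experiment,                              // group this run under the experiment
  });

  // Detailed metadata (variant identifier, relevant parameters) makes later analysis easier.
  trace.setMetadata({ variant: VARIANT, temperature: 0.2 });

  try {
    await runVariant(input);                 // the workflow variant under test
  } finally {
    await trace.end();                       // end the trace even if the run throws
  }
}
```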