Test Runs Overview
Test Runs provide structured testing for your AI systems, enabling you to validate performance across datasets and catch regressions before they reach production. Freeplay supports two complementary testing approaches designed for different stages of your development workflow.

Testing Approaches
End-to-End Test Runs

Component-Level Test Runs

Core Concepts
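Conceptually, a Test Run is a loop over a dataset: send each row through the pipeline, score the output with one or more evaluators, and keep both row-level scores and an aggregate. The Python sketch below illustrates that loop under simplifying assumptions; it is not the Freeplay SDK API. `Example`, `RowResult`, `run_test`, `toy_pipeline`, and both evaluators are hypothetical names, the LLM call is replaced by a canned lookup, and the evaluators are simple code-based checks standing in for model-graded evals or human review.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical illustration of a test-run loop; not part of any Freeplay SDK.


@dataclass
class Example:
    """One dataset row: the pipeline inputs and a reference answer."""
    inputs: dict[str, str]
    expected: str


@dataclass
class RowResult:
    """Row-level result: the pipeline output plus one pass/fail score per evaluator."""
    example: Example
    output: str
    scores: dict[str, bool]


Pipeline = Callable[[dict[str, str]], str]
Evaluator = Callable[[Example, str], bool]


def run_test(dataset: list[Example], pipeline: Pipeline,
             evaluators: dict[str, Evaluator]) -> tuple[list[RowResult], dict[str, float]]:
    """Run each example through the pipeline, score it, and aggregate pass rates."""
    rows = []
    for example in dataset:
        output = pipeline(example.inputs)
        scores = {name: check(example, output) for name, check in evaluators.items()}
        rows.append(RowResult(example, output, scores))
    # Aggregate view: fraction of rows that passed each evaluator (0.0 for an empty dataset).
    summary = {
        name: (sum(row.scores[name] for row in rows) / len(rows) if rows else 0.0)
        for name in evaluators
    }
    return rows, summary


def toy_pipeline(inputs: dict[str, str]) -> str:
    # Hypothetical stand-in for an LLM call: returns canned answers, one of them wrong.
    canned = {"capital of france?": "Paris", "2 + 2?": "5"}
    return canned.get(inputs["question"], "")


dataset = [
    Example({"question": "capital of france?"}, "Paris"),
    Example({"question": "2 + 2?"}, "4"),
]
evaluators = {
    "contains_expected": lambda ex, out: ex.expected in out,  # code-based check
    "non_empty": lambda ex, out: bool(out.strip()),
}

rows, summary = run_test(dataset, toy_pipeline, evaluators)
print(summary)  # {'contains_expected': 0.5, 'non_empty': 1.0}
for row in rows:  # row-level view: which checks failed for which input
    failed = [name for name, ok in row.scores.items() if not ok]
    if failed:
        print(row.example.inputs["question"], "->", failed)  # 2 + 2? -> ['contains_expected']
```

The aggregate `summary` is what you would compare across versions or models, while the row-level results point to the specific inputs that failed.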
A Test Run evaluates your LLM pipeline against a dataset to measure performance. Each run processes your test cases through the pipeline, applies evaluations, and provides both aggregate and row-level insights. The foundation of any test is your dataset: a curated collection of scenarios that represent important use cases, edge conditions, and known failure modes. Test Runs integrate with Freeplay’s evaluation system, applying model-graded evals, code-based checks, and human review to assess quality. You can compare results across different versions, models, or time periods to track improvements and catch regressions.

When to Use Each Approach
The choice between end-to-end and component testing depends on what you’re trying to validate. Use end-to-end testing when deploying to production, making system architecture changes, or validating agent behavior with tool usage. Because these tests exercise the whole system, they help keep system-wide regressions from slipping through. Component testing shines during prompt development, model selection, and quick validation checks. Since component tests can be run from the Freeplay UI, product managers and domain experts can use them to review outputs without writing code.

Getting Started
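Automated validation in a CI/CD pipeline typically comes down to a gate: compare the aggregate scores of a candidate run with a baseline and fail the job if any score drops. The following is a generic Python sketch of such a gate, not Freeplay tooling. It assumes each run has already been reduced to a dictionary mapping evaluator names to pass rates between 0 and 1; `find_regressions`, `ci_gate`, the 0.02 tolerance, and the sample scores are all hypothetical.

```python
# Hypothetical CI regression gate; not part of any Freeplay SDK.


def find_regressions(baseline: dict[str, float], candidate: dict[str, float],
                     tolerance: float = 0.02) -> list[str]:
    """Describe each evaluator whose pass rate fell by more than `tolerance`."""
    regressions = []
    for name, base_rate in baseline.items():
        # An evaluator missing from the candidate run is treated as a 0% pass rate.
        new_rate = candidate.get(name, 0.0)
        if base_rate - new_rate > tolerance:
            regressions.append(f"{name}: {base_rate:.0%} -> {new_rate:.0%}")
    return regressions


def ci_gate(baseline: dict[str, float], candidate: dict[str, float],
            tolerance: float = 0.02) -> int:
    """Print a report and return a process exit code: 0 = pass, 1 = regression."""
    problems = find_regressions(baseline, candidate, tolerance)
    for line in problems:
        print("REGRESSION", line)
    return 1 if problems else 0


# Hypothetical aggregate pass rates from the last accepted run and from this branch.
baseline = {"contains_expected": 0.90, "non_empty": 1.00}
candidate = {"contains_expected": 0.84, "non_empty": 1.00}
exit_code = ci_gate(baseline, candidate)  # prints "REGRESSION contains_expected: 90% -> 84%"
```

In an actual CI job, the script would finish with `sys.exit(exit_code)` so that a non-zero code fails the build. The tolerance absorbs small run-to-run fluctuations in LLM outputs; set it according to how noisy your evaluators are.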
Developers should begin by installing the Freeplay SDK and creating datasets from production data. From there, set up both end-to-end and component tests as part of the development workflow and integrate them into your CI/CD pipeline for automated validation.

Product teams can go straight to the Freeplay UI to create test datasets from important use cases and run component tests on prompt changes. The visual interface makes it easy to review results and provide feedback without technical expertise.

What’s Next
Now that you’re armed with the ability to test your models, let’s move on to Datasets.

