When you’re iterating on an agent — updating prompts and tools, running benchmarks and regression tests — you eventually hit a wall. You’ve fixed the obvious issues, your scores have plateaued, and you’re not sure what to try next. Test Run Insights analyze every test case in a batch test run, compare results to ground truth and across multiple versions when relevant, and identify patterns across failures to recommend where to focus next.

How they work

After a test run completes, the Insights agent reviews the full set of results and looks for patterns across individual test cases. Rather than reviewing each failure one by one, the agent groups failures into themes and highlights the most impactful areas for improvement. The inputs to Test Run Insights include:
  • Test case results — pass/fail outcomes and scores across all samples in the run
  • Evaluation reasoning — the rationale behind each score from LLM judges or other evaluators
  • Version comparisons — when multiple versions are tested, differences in performance across versions
For each insight, you get a description of the failure pattern, links to the specific test cases that match, and a count of affected samples to help you prioritize.
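To make the shape of this concrete, here is a minimal sketch of the grouping step described above. All names and data shapes are hypothetical illustrations, not Freeplay's actual API: it buckets failed test cases by a crude theme key (the first sentence of the evaluator's reasoning) and emits one insight per theme with a description, the matching test case IDs, and an affected-sample count.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical input shape: per-test-case results plus evaluator reasoning.
@dataclass
class TestCaseResult:
    case_id: str
    passed: bool
    score: float
    eval_reasoning: str  # rationale from an LLM judge or other evaluator

# Hypothetical output shape: one insight per failure theme.
@dataclass
class Insight:
    description: str      # the failure pattern
    case_ids: list[str]   # the specific test cases that match
    affected_count: int   # samples affected, to help prioritize

def group_failures(results: list[TestCaseResult]) -> list[Insight]:
    """Toy stand-in for the Insights agent: group failures into themes
    rather than reviewing each one individually."""
    themes: dict[str, list[str]] = defaultdict(list)
    for r in results:
        if not r.passed:
            # Crude theme key; the real agent does pattern analysis.
            theme = r.eval_reasoning.split(".")[0]
            themes[theme].append(r.case_id)
    insights = [
        Insight(description=t, case_ids=ids, affected_count=len(ids))
        for t, ids in themes.items()
    ]
    # Surface the most impactful themes first.
    insights.sort(key=lambda i: i.affected_count, reverse=True)
    return insights

results = [
    TestCaseResult("tc-1", False, 0.2, "Missed required citation. No sources given."),
    TestCaseResult("tc-2", False, 0.3, "Missed required citation. Links absent."),
    TestCaseResult("tc-3", True, 0.9, "Correct and well sourced."),
    TestCaseResult("tc-4", False, 0.1, "Hallucinated a tool result. Invented data."),
]
for insight in group_failures(results):
    print(insight.affected_count, insight.description)
```

The sort by affected count mirrors the prioritization idea: the theme touching the most samples is the most promising place to focus next.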
Test Run Insights are currently available to select design partners. Reach out to your Freeplay contact to learn more.