AI Insights can be toggled off in Project Settings > AI Features.
Freeplay’s AI Insights is an intelligence layer that sits at each transition point in your agent development workflow, helping you move from raw data to decisions. A background AI agent analyzes your labeled data (evaluation results, human annotations, and test runs) to generate findings that explain the why behind performance changes and what to do about them. The goal is simple: every time you open Freeplay, you should quickly be able to spot the next most impactful thing you can do to improve your AI system, and after every labeling session or experiment you run, you should quickly know what to do next.

The problem Insights solve

You can quickly log hundreds of thousands or even millions of traces, score them with LLM judges or other metrics, and visualize the results in dashboards showing pass/fail rates. Those dashboards might tell you that you have a problem, but scores rarely tell you how to fix it. Freeplay helps you track evaluation performance over time, yet raw metrics alone stop short of explaining why performance changed or what to do about it. When you try to decide what to actually improve, you’re stuck asking the same questions: Why is that metric failing? What should I do to fix it? Where should I start? AI Insights closes that gap by proactively generating findings that add a layer of interpretation on top of your raw metrics, turning scores into direction. See our blog post for more information.

How Insights work

Insights come from Freeplay’s own AI agent that analyzes your data from multiple sources and generates actionable findings.

Where Insights run

Insights run in three places across the Freeplay platform, each representing a decision point in the AI quality workflow:
| Location | Helps you answer |
| --- | --- |
| Production logs | Where should I focus? What’s broken that I didn’t know about? |
| Human reviews | What patterns are emerging across my team’s annotations? What are the root causes of issues people have seen? |
| Test runs | Did things get better or worse with this latest version, and how? What’s still broken after my latest changes? |
Each of these represents a moment where you need to interpret lots of data and decide what to do next — exactly the kind of work AI is good at.
A single completion or trace can be tagged with more than one insight.
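The tagging relationship is many-to-many: one trace can carry several insights, and one insight groups many traces. A minimal sketch of that data model in Python (the names and structure here are illustrative, not Freeplay’s actual API):

```python
from collections import defaultdict

# Hypothetical in-memory model: traces and insights linked many-to-many,
# with both indexes kept in sync so lookups work in either direction.
trace_to_insights = defaultdict(set)
insight_to_traces = defaultdict(set)

def tag(trace_id: str, insight_id: str) -> None:
    """Tag a trace with an insight, updating both indexes."""
    trace_to_insights[trace_id].add(insight_id)
    insight_to_traces[insight_id].add(trace_id)

# One trace tagged with two distinct insights:
tag("trace-42", "insight-hallucination")
tag("trace-42", "insight-formatting")

print(sorted(trace_to_insights["trace-42"]))
# ['insight-formatting', 'insight-hallucination']
```

Because each insight also indexes its traces, clicking into an insight can list every affected record, and a single record can surface under multiple findings.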

Types of Insights

Freeplay generates three types of Insights, each tied to a different data source and decision point in your workflow:
  • Evaluation Insights - Analyze production logs scored by LLM-as-a-judge evaluations. Run on a weekly cadence to surface systemic issues across your logged data.
  • Review Insights - Analyze human annotations in real time. Every note, label, or evaluation triggers the agent to identify patterns and group them into themes.
  • Test Run Insights - Analyze test run results to identify failure patterns across test cases and recommend where to focus next.

What data insights use

Each type of insight is based on different types of information within Freeplay. Here are the sources for each insight:
| Data source | Evaluation Insights | Review Insights | Test Run Insights |
| --- | --- | --- | --- |
| Model-graded evaluations | ✓ | ✓ | ✓ |
| Human evaluations | ✗ | ✓ | ✗ |
| Human notes and labels | ✗ | ✓ | ✗ |
| Logged data | ✓ | ✓ | ✓ |
AI Insights does not currently use code evaluations or auto-categorizations as input sources.
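The source matrix above can be expressed as a simple lookup, for example to check which signals feed a given insight type. This is an illustrative encoding only (the keys and helper are hypothetical, not part of Freeplay’s API):

```python
# Which data sources feed each insight type, per the matrix above.
SOURCES = {
    "evaluation": {"model_graded_evals", "logged_data"},
    "review": {"model_graded_evals", "human_evals",
               "human_notes_and_labels", "logged_data"},
    "test_run": {"model_graded_evals", "logged_data"},
}

def uses(insight_type: str, source: str) -> bool:
    """Return True if the given insight type consumes the given source."""
    return source in SOURCES[insight_type]

print(uses("review", "human_notes_and_labels"))  # True
print(uses("evaluation", "human_evals"))         # False
```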

Viewing and using Insights

[Image: Insights tab showing generated findings with severity levels and linked traces]
AI Insights are viewable from the Home page or the Insights tab within your project.

Refining Insights

You can fine-tune insights to improve their accuracy and usefulness:
  • Update the name — providing a more descriptive name can slightly adjust and refine the grouping of tagged records
  • Edit the description — adding more details, specific errors, or patterns found in your analysis helps fine-tune what the insight captures

Resolving Insights

The goal of insights is to resolve them and surface new ones. Insights can help lead your team toward solving the key issues in your product. Once an issue is identified and fixed, the insight naturally winds down: no new records get tagged to it, so it stops accumulating evidence.
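One way to picture an insight winding down: when no newly logged records have been tagged to it for a while, it is effectively resolved. A hedged sketch of that staleness check (the function, field names, and 14-day window are assumptions for illustration, not Freeplay’s implementation):

```python
from datetime import datetime, timedelta

def is_stale(last_tagged_at: datetime, now: datetime,
             window_days: int = 14) -> bool:
    """An insight with no newly tagged records inside the
    window is treated as stale (i.e. effectively resolved)."""
    return now - last_tagged_at > timedelta(days=window_days)

now = datetime(2024, 6, 30)
print(is_stale(datetime(2024, 6, 1), now))   # True: no new records for ~a month
print(is_stale(datetime(2024, 6, 25), now))  # False: still accumulating evidence
```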

Insights and the data flywheel

Insights provide a clean path towards understanding the why behind errors. Some outcomes of insights include:
  • Prompt improvements — actionable suggestions for how to modify your prompts
  • New evaluations — generating new LLM-as-a-judge evals based on discovered patterns
  • Deeper investigation — surfacing issues that people might be missing
Combined with Review Queues, prompt optimization, and automated evaluations, Insights help your data flywheel operate smoothly.