Overview

AI Insights can be toggled off in Project Settings > AI Features.

Freeplay’s AI Insights is an intelligence layer that sits at each transition point in your agent development workflow, helping you move from raw data to decisions. A background AI agent analyzes your labeled data — evaluation results, human annotations, and test runs — to generate findings. These findings help you understand the why behind performance changes and what to do about it. The goal is simple: every time you start spending time in Freeplay, you should quickly be able to spot the next most impactful thing you can do to improve your AI system. Then after every labeling session or experiment you run, you should quickly know what to do next.

The problem Insights solve

You can quickly log hundreds of thousands or millions of traces. You can run LLM judges or other metrics to score those logs, and visualize those in dashboards showing pass/fail rates. Those might tell you that you have a problem, but scores rarely tell you how to fix anything. Freeplay helps you track evaluation performance over time but raw metrics alone stop short of helping you understand why performance changed or what to do about it. When you try to decide what to actually improve, you’re stuck asking the same questions: Why is that metric failing? What should I do to fix it? Where should I start first? AI Insights closes that gap by proactively generating findings that add a layer of interpretation on top of your raw metrics — turning scores into direction. See our blog post for more information.

How Insights work

Insights come from Freeplay’s own AI agent that analyzes your data from multiple sources and generates actionable findings.

Where Insights run

Insights run in three places across the Freeplay platform, each representing a decision point in the AI quality workflow:

Location	Helps you answer
Production logs	Where should I focus? What’s broken that I didn’t know about?
Human reviews	What patterns are emerging across my team’s annotations? What are the root causes of issues people have seen?
Test runs	Did things get better or worse with this latest version, and how? What’s still broken after my latest changes?

Each of these represents a moment where you need to interpret lots of data and decide what to do next — exactly the kind of work AI is good at.

A single completion or trace can be tagged with more than one insight.

Types of Insights

Freeplay generates three types of Insights, each tied to a different data source and decision point in your workflow:

Evaluation Insights - Analyze production logs scored by LLM-as-a-judge evaluations. Run on a weekly cadence to surface systemic issues across your logged data.
Review Insights - Analyze human annotations in real time. Every note, label, or evaluation triggers the agent to identify patterns and group them into themes.
Test Run Insights - Analyze test run results to identify failure patterns across test cases and recommend where to focus next.

What data insights use

Each type of insight is based on different types of information within Freeplay. Here are the sources for each insight:

	Evaluation Insights	Review Insights	Test Run Insights
Model-graded evaluations	:check	:check	:check
Human evaluations	X	:check	X
Human notes and lables	X	:check	X
Logged data	:check	:check	:check

AI Insights does not currently use code evaluations or auto-categorizations as input sources.

Viewing and using Insights

Insights tab showing generated findings with severity levels and linked traces

AI Insights are viewable from the Home page or the Insights tab within your project.

Refining Insights

You can fine-tune insights to improve their accuracy and usefulness:

Update the name — providing a more descriptive name can slightly adjust and refine the grouping of tagged records
Edit the description — adding more details, specific errors, or patterns found in your analysis helps fine-tune what the insight captures

Resolving Insights

The goal of insights is to resolve them and surface new ones. Insights can help lead your team towards solving the key issues in your product. Once an issue is identified and fixed, the insight will start to lose traction as no new information is added to it.

Insights and the data flywheel

Insights provide a clean path towards understanding the why behind errors. Some outcomes of insights include:

Prompt improvements — actionable suggestions for how to modify your prompts
New evaluations — generating new LLM-as-a-judge evals based on discovered patterns
Deeper investigation — surfacing issues that people might be missing

Combined with Review Queues, prompt optimization, and automated evaluations, Insights help your data flywheel operate smoothly.

AI Features Overview — Overview of all AI-powered features in Freeplay
Model-Graded Evaluations — Configure the LLM judges that feed into Insights
Review Queues — Set up human review workflows that generate Review Insights
Automated Insights for AI Agents — Blog post with more detail on the vision behind Insights

Getting Started

Account Setup

Core Concepts

How-To Guides

Developer Resources

Security & Compliance

Resources

The problem Insights solve

How Insights work

Where Insights run

Types of Insights

What data insights use

Viewing and using Insights

Refining Insights

Resolving Insights

Insights and the data flywheel

Getting Started

Account Setup

Core Concepts

How-To Guides

Developer Resources

Security & Compliance

Resources

​The problem Insights solve

​How Insights work

​Where Insights run

​Types of Insights

​What data insights use

​Viewing and using Insights

​Refining Insights

​Resolving Insights

​Insights and the data flywheel

​Related resources

The problem Insights solve

How Insights work

Where Insights run

Types of Insights

What data insights use

Viewing and using Insights

Refining Insights

Resolving Insights

Insights and the data flywheel

Related resources