Evaluation Insights analyze your production log data to surface issues you might not catch from dashboards alone.

How they work

Freeplay proactively analyzes logged data that has model-graded evaluations applied. The agent reviews these logs and identifies key patterns across the data. Here is the general flow:
  1. Freeplay collects evaluation results over a time period (requiring at least 10 logs with evaluation data)
  2. The agent analyzes the logs, looking for patterns in:
    • Poor-scoring outputs and their common characteristics
    • Correlation between different evaluation criteria
    • Input patterns that tend to produce poor results
  3. The agent then reviews these results and groups them into insights, creating new insights or updating existing ones so that related records are assigned to the same insight
For each insight, you get a clear description of the problem, an easy link to the underlying traces that back it up, and the number of matching records as a proxy for scale and impact to help you prioritize what matters most.
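To make the flow above concrete, here is a minimal Python sketch of the counting step. It is not Freeplay's implementation: the `EvaluatedLog` shape, the `input_tag` field, the 0.5 score cutoff, and the `summarize` function are all hypothetical names chosen for illustration. Only the 10-log minimum comes from the description above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

MIN_LOGS = 10                # stated above: at least 10 logs with evaluation data
POOR_SCORE_THRESHOLD = 0.5   # hypothetical cutoff for a "poor" score


@dataclass
class EvaluatedLog:
    """Hypothetical record: one logged completion with its evaluation scores."""
    log_id: str
    input_tag: str               # hypothetical coarse label for the input
    scores: Dict[str, float]     # evaluation criterion -> score in [0, 1]


def poor_criteria(log: EvaluatedLog) -> List[str]:
    """Return the criteria on which this log scored below the cutoff."""
    return [name for name, score in log.scores.items()
            if score < POOR_SCORE_THRESHOLD]


def summarize(logs: List[EvaluatedLog]) -> Optional[dict]:
    """Count poor-scoring logs per criterion and per input tag.

    Returns None when there are too few logs to analyze.
    """
    if len(logs) < MIN_LOGS:
        return None
    by_criterion: Counter = Counter()
    by_input: Counter = Counter()
    for log in logs:
        failed = poor_criteria(log)
        by_criterion.update(failed)
        if failed:
            by_input[log.input_tag] += 1
    return {
        "poor_by_criterion": dict(by_criterion),
        "poor_by_input_tag": dict(by_input),
    }
```

The per-group counts play the same role as the matching-record count on an insight: a rough proxy for how widespread a problem is, which helps decide what to look at first.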

When they run

Evaluation Insights run on a weekly cadence, generating findings every Monday morning based on the previous week’s data.
Evaluation Insights can be disabled in Project Settings > AI Features.
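As an illustration of the weekly window, the sketch below computes the date range a given run would cover. It assumes "the previous week" means the Monday-to-Sunday span immediately before the run's Monday; the exact boundaries and time zone Freeplay uses are not stated here, and `previous_week_window` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Tuple


def previous_week_window(run_day: date) -> Tuple[date, date]:
    """Return (start, end) dates of the Monday-to-Sunday week before run_day's week.

    Assumption: weeks run Monday through Sunday, inclusive.
    """
    # date.weekday() is 0 for Monday, so this finds the Monday of run_day's week.
    this_monday = run_day - timedelta(days=run_day.weekday())
    start = this_monday - timedelta(days=7)   # previous Monday
    end = this_monday - timedelta(days=1)     # previous Sunday
    return start, end
```

Under this assumption, a run on Monday 2024-01-08 would cover 2024-01-01 through 2024-01-07.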