  • AI Insights: Surface patterns and root causes from your evaluation data, human reviews, and test runs
  • Model-Graded Evaluations: Score individual completions and traces using LLMs to evaluate your AI outputs at scale
  • Eval Creation Assistant: Create better evaluation criteria with AI-powered suggestions and prompt drafts for LLM judges
  • Auto-Categorization: Classify logs to reveal usage patterns and understand how users interact with your AI
  • Prompt Optimization: Get AI-generated suggestions for improved prompts based on your production data

Overview

All AI features in Freeplay work by calling LLM APIs to analyze your data. They are designed to work across different models and to use the API keys and model preferences configured in your account settings.

Managing AI feature settings

Disabling specific features

Individual AI features can be controlled through their respective configuration:
  • Model-graded evaluations: Disable per evaluation by turning it off or setting its sample rate to zero
  • Eval Creation Assistant: An on-demand feature that only runs when you create evals
  • Auto-categorization: Disable per auto-category by turning it off or setting its sample rate to zero
  • Prompt optimization: An on-demand feature that only runs when you trigger it
  • Review Insights: Runs automatically during review; disable via the Insights toggle in Project Settings > AI Features
  • Evaluation Insights: Runs weekly; disable via the Insights toggle in Project Settings > AI Features
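To make the toggle and sample-rate behavior above concrete, here is a minimal sketch (not Freeplay's actual implementation) of how a sampled feature decides whether to run on a given log. The function name and signature are illustrative assumptions:

```python
import random

def should_run(enabled: bool, sample_rate: float, rng=random.random) -> bool:
    """Decide whether a sampled AI feature runs on a single log.

    Illustrative only. A sample_rate of 0.0 disables the feature
    entirely, matching the "turn off or set sample rate to zero"
    behavior described above.
    """
    if not enabled or sample_rate <= 0.0:
        return False
    # Sample a fraction of logs; rng is injectable for testing.
    return rng() < sample_rate

# A disabled feature, or a zero sample rate, never runs:
assert should_run(False, 1.0) is False
assert should_run(True, 0.0) is False
```

On-demand features (the Eval Creation Assistant and prompt optimization) skip sampling entirely and run only when explicitly triggered.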

Cost considerations

AI features consume tokens from the selected LLM provider. Costs depend on:
  • Which features you use and how frequently
  • The volume of data being analyzed
  • The models being used (more capable models typically cost more)
When Freeplay Keys are enabled, Freeplay covers the cost of AI feature usage. When using your own API keys, costs are billed directly to your provider account.

Token usage for AI features is tracked separately from your application’s LLM usage and is visible in the Usage dashboard. If you’re using your own API keys, monitor this usage and consider lowering feature sample rates if costs are higher than expected.
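The cost factors above combine multiplicatively, so a rough estimate is easy to sketch. This is a back-of-the-envelope calculation, not a Freeplay billing formula; all numbers (log volume, tokens per call, provider pricing) are hypothetical placeholders:

```python
def estimate_monthly_cost(
    logs_per_month: int,
    sample_rate: float,            # fraction of logs the feature analyzes
    tokens_per_call: int,          # prompt + completion tokens per analysis call
    cost_per_million_tokens: float # provider's blended price, in dollars
) -> float:
    """Rough monthly cost of one sampled AI feature (illustrative only)."""
    calls = logs_per_month * sample_rate
    total_tokens = calls * tokens_per_call
    return total_tokens / 1_000_000 * cost_per_million_tokens

# e.g. 100k logs/month, 10% sampling, ~2k tokens per call, $5 per 1M tokens:
cost = estimate_monthly_cost(100_000, 0.10, 2_000, 5.00)
print(f"${cost:.2f}")  # $100.00
```

Because cost scales linearly with the sample rate, halving a feature's sample rate halves its token spend, which is why adjusting sample rates is the first lever to pull when costs run high.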