Writing effective evaluation prompts can be challenging, especially for teams new to LLM-based quality assessment. Freeplay’s Eval Creation Assistant uses AI to help you draft better evals faster—whether you’re starting from scratch or adapting a template.

How it works

The Eval Creation Assistant helps in two ways.

Create custom evals from scratch: Start with the basic question you want to answer about your AI’s output. The assistant will:
  1. Help you refine your evaluation question to be clear and measurable
  2. Suggest improvements to your eval structure
  3. Automatically draft a model-graded eval prompt tailored to your specific prompts and data
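To make "model-graded eval prompt" concrete, here is a minimal sketch of what a drafted pass/fail eval might look like. The wording, the `{answer}` placeholder, and the `build_eval_prompt` helper are illustrative assumptions, not the assistant's actual output format:

```python
# Hypothetical sketch of a model-graded eval prompt for the question
# "Is the response concise?". The placeholder name and pass/fail scale
# are illustrative, not Freeplay-specific.

EVAL_PROMPT = """You are evaluating an AI assistant's output.

Question: Is the response concise, avoiding repetition and filler
while still answering the user's question?

Response to evaluate:
{answer}

Reply with exactly one word: "pass" or "fail"."""

def build_eval_prompt(answer: str) -> str:
    """Fill the eval prompt with the output under evaluation."""
    return EVAL_PROMPT.format(answer=answer)

prompt = build_eval_prompt("Paris is the capital of France.")
```

The filled prompt would then be sent to a judge model, whose one-word verdict becomes the eval result.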

Adapt from templates: Choose from common evaluation templates such as Answer Faithfulness (for RAG), Similarity, Toxicity, or Tone. The assistant will:
  1. Automatically customize the template to match your prompt structure
  2. Reference the correct input variables from your prompts
  3. Generate a ready-to-use eval prompt with one click
Because Freeplay knows your prompt structure and has access to real-world examples from your logs, the assistant can generate eval prompts that are specific to your context rather than generic templates.
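As a rough illustration of what "customizing a template to your prompt structure" means, the sketch below fills an Answer Faithfulness-style template with a prompt's input variables. The `{{...}}` placeholder syntax, variable names, and `adapt_template` helper are assumptions for illustration, not Freeplay's actual template schema:

```python
import re

# Hypothetical Answer Faithfulness template; placeholder names are illustrative.
FAITHFULNESS_TEMPLATE = """Given the retrieved context and the model's answer,
judge whether every claim in the answer is supported by the context.

Context:
{{context}}

Answer:
{{answer}}

Respond with exactly one word: "faithful" or "unfaithful"."""

def adapt_template(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with the matching prompt variable."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], template)

adapted = adapt_template(
    FAITHFULNESS_TEMPLATE,
    {"context": "The Eiffel Tower is 330 m tall.",
     "answer": "It is 330 m tall."},
)
```

The assistant performs this wiring for you: because it can see which input variables your prompt actually uses, the generated eval references them by name instead of leaving generic placeholders behind.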

Use cases

  • Getting started quickly: Teams new to evals can create their first evaluations without prior experience
  • Adopting best practices: Start with industry-standard eval patterns and customize them for your needs
  • Cross-functional collaboration: Product managers, analysts, and domain experts can contribute to eval creation without writing code

Using the assistant

  1. Navigate to your prompt template or agent
  2. Go to the Evaluations section
  3. Choose Create your own or select from the template library
  4. For custom evals: Enter your evaluation question and follow the AI’s suggestions
  5. For templates: Select a template and the AI will automatically adapt it to your prompt
  6. Test the generated eval against sample data
  7. Use the alignment flow to validate that the eval matches human judgment
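Step 7 amounts to checking how often the eval's verdicts agree with human labels on the same examples. A minimal sketch of that agreement check, with illustrative function and label names (not Freeplay's actual alignment implementation):

```python
def agreement_rate(eval_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of examples where the model-graded eval matches the human label."""
    if len(eval_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(e == h for e, h in zip(eval_labels, human_labels))
    return matches / len(eval_labels)

# 3 of the 4 verdicts below match the human labels.
rate = agreement_rate(
    ["pass", "pass", "fail", "pass"],
    ["pass", "fail", "fail", "pass"],
)
```

A low agreement rate is a signal to revise the eval prompt (or your rubric) before trusting it at scale.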
Even when using templates, the AI adapts them to your specific prompt variables and data structure—so you get truly customized evals, not just generic prompts.
Best practice: If you’re new to writing evals or unsure where to start, use the Eval Creation Assistant’s “Create your own” option. Describe what you want to evaluate in plain language, and the AI will generate a custom eval prompt tailored to your specific prompts and use case.