Writing effective evaluation prompts can be challenging, especially for teams new to LLM-based quality assessment. Freeplay’s Eval Creation Assistant uses AI to help you draft better evals faster—whether you’re starting from scratch or adapting a template.

How it works

The Eval Creation Assistant helps in two ways.

Create custom evals from scratch: Start with the basic question you want to answer about your AI’s output. The assistant will:
  1. Help you refine your evaluation question to be clear and measurable
  2. Suggest improvements to your eval structure
  3. Automatically draft a model-graded eval prompt tailored to your specific prompts and data
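To make "model-graded eval prompt" concrete, here is a minimal sketch of what a drafted pass/fail eval might look like. The wording, the `{answer}` placeholder, and the `build_eval_prompt` helper are illustrative assumptions, not the assistant's actual output format:

```python
# Hypothetical sketch of a model-graded eval prompt for the question
# "Is the response concise?". The placeholder name and pass/fail scale
# are illustrative, not Freeplay-specific.

EVAL_PROMPT = """You are evaluating an AI assistant's output.

Question: Is the response concise, avoiding repetition and filler
while still answering the user's question?

Response to evaluate:
{answer}

Reply with exactly one word: "pass" or "fail"."""

def build_eval_prompt(answer: str) -> str:
    """Fill the eval prompt with the output under evaluation."""
    return EVAL_PROMPT.format(answer=answer)

prompt = build_eval_prompt("Paris is the capital of France.")
```

The filled prompt would then be sent to a judge model, whose one-word verdict becomes the eval result.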

Adapt from templates: Choose from common evaluation templates such as Answer Faithfulness (for RAG), Similarity, Toxicity, or Tone. The assistant will:
  1. Automatically customize the template to match your prompt structure
  2. Reference the correct input variables from your prompts
  3. Generate a ready-to-use eval prompt with one click
Because Freeplay knows your prompt structure and has access to real-world examples from your logs, the assistant can generate eval prompts that are specific to your context rather than generic templates.
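As a rough illustration of what "customizing a template to your prompt structure" means, the sketch below fills an Answer Faithfulness-style template with a prompt's input variables. The `{{...}}` placeholder syntax, variable names, and `adapt_template` helper are assumptions for illustration, not Freeplay's actual template schema:

```python
import re

# Hypothetical Answer Faithfulness template; placeholder names are illustrative.
FAITHFULNESS_TEMPLATE = """Given the retrieved context and the model's answer,
judge whether every claim in the answer is supported by the context.

Context:
{{context}}

Answer:
{{answer}}

Respond with exactly one word: "faithful" or "unfaithful"."""

def adapt_template(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with the matching prompt variable."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], template)

adapted = adapt_template(
    FAITHFULNESS_TEMPLATE,
    {"context": "The Eiffel Tower is 330 m tall.",
     "answer": "It is 330 m tall."},
)
```

The assistant performs this wiring for you: because it can see which input variables your prompt actually uses, the generated eval references them by name instead of leaving generic placeholders behind.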

Use cases

  • Getting started quickly: Teams new to evals can create their first evaluations without prior experience
  • Adopting best practices: Start with industry-standard eval patterns and customize them for your needs
  • Cross-functional collaboration: Product managers, analysts, and domain experts can contribute to eval creation without writing code

Using the assistant

  1. Navigate to your prompt template or agent
  2. Go to the Evaluations section
  3. Choose Create your own or select from the template library
  4. For custom evals: Enter your evaluation question and follow the AI’s suggestions
  5. For templates: Select a template and the AI will automatically adapt it to your prompt
  6. Test the generated eval against sample data
  7. Use the alignment flow to validate that the eval matches human judgment
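Step 7 amounts to checking how often the eval's verdicts agree with human labels on the same examples. A minimal sketch of that agreement check, with illustrative function and label names (not Freeplay's actual alignment implementation):

```python
def agreement_rate(eval_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of examples where the model-graded eval matches the human label."""
    if len(eval_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(e == h for e, h in zip(eval_labels, human_labels))
    return matches / len(eval_labels)

# 3 of the 4 verdicts below match the human labels.
rate = agreement_rate(
    ["pass", "pass", "fail", "pass"],
    ["pass", "fail", "fail", "pass"],
)
```

A low agreement rate is a signal to revise the eval prompt (or your rubric) before trusting it at scale.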
Even when using templates, the AI adapts them to your specific prompt variables and data structure—so you get truly customized evals, not just generic prompts.
Best practice: If you’re new to writing evals or unsure where to start, use the Eval Creation Assistant’s “Create your own” option. Describe what you want to evaluate in plain language, and the AI will generate a custom eval prompt tailored to your specific prompts and use case.