You can accomplish a lot in Freeplay without integrating any code. This guide walks you through the core workflows you can complete entirely in the UI.

What you can do

  • Create and iterate on prompts in the playground
  • Build test datasets manually or by uploading CSV/JSONL files
  • Set up evaluations, including model-graded evals and auto-categorization
  • Run tests to compare prompt and model changes quantitatively
When you’re ready to monitor production traffic or create datasets from real user interactions, see the integration guide.

1. Create your first prompt

From your project, click Create prompt template to open the playground.

Select your model

Choose from available providers like OpenAI, Anthropic, or Google. If you’ve run out of credits, you’ll need to configure your own API keys.

Add messages and variables

Freeplay prompts use messages (static content) and variables (dynamic inputs using {{variable_name}} syntax). This separation enables:
  • Rapidly building datasets from test runs
  • Evaluations that reference specific input variables
  • Batch testing across different inputs

Message types

  • System message: Sets the AI’s behavior and personality. This is your base instruction that defines how the AI should act.
  • User message: Represents input from your end user. Use variables to make it dynamic: Create an album name for {{artist}}
  • Assistant message: Pre-filled AI responses that serve as few-shot examples.
  • History: A special message type that represents conversation history in multi-turn conversations.
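Putting these together, a simple template for the album-name example might look like the sketch below. The wording and the second variable ({{genre}}) are illustrative; use whatever variables your own use case needs.

```
System: You are a creative assistant that suggests album names. Keep suggestions short.
User: Create an album name for {{artist}} in the style of {{genre}}.
```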

Configure advanced settings

Beyond messages, you can fine-tune your prompt’s behavior. Set temperature, max tokens, and other model parameters. You can also add tools for function calling or enable structured outputs.
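For instance, enabling structured outputs typically means supplying a JSON Schema for the response. A minimal, hypothetical schema for the album-name example might look like this; the property names are illustrative, and the exact options depend on the provider you selected.

```json
{
  "type": "object",
  "properties": {
    "album_name": { "type": "string" },
    "reasoning": { "type": "string" }
  },
  "required": ["album_name", "reasoning"],
  "additionalProperties": false
}
```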

Test in the playground

Before saving, test your prompt by entering values for your variables and clicking Run. Each test run can be saved to start building your first dataset.

Save your prompt

Once you’re satisfied, click Save and name your prompt template. Add a version name and description to help your team understand what changed.

2. Create a dataset

Datasets power your evaluations and test runs. You can:
  • Create from test runs: Save results from playground testing
  • Upload data: Import CSV or JSONL files with test cases
  • Add manually: Create individual examples in the UI
Each dataset row can include expected outputs for reference-based evaluations. Learn more about datasets
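If you upload a file, each row supplies values for your prompt's variables and, optionally, an expected output for reference-based evaluations. A hypothetical JSONL file for the {{artist}} example might look like the following; match the column names to your own variables, and note that the expected-output column name shown here is illustrative.

```jsonl
{"artist": "Miles Davis", "genre": "jazz", "expected_output": "Blue Horizons"}
{"artist": "Daft Punk", "genre": "electronic", "expected_output": "Neon Circuits"}
```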

3. Set up evaluations

Define how you’ll measure quality. Freeplay supports:
  • Model-graded evaluations: Use an LLM judge to score outputs
  • Auto-categorization: Automatically classify outputs against criteria
  • Code evaluations: Custom logic (requires integration)
  • Human labels: Manual review by team members
Learn more about evaluations
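A model-graded evaluation is essentially a rubric the judge model applies to each output, and it can reference your input variables. A hypothetical criterion for the album-name example might read as follows; the exact configuration fields in the UI may differ.

```
Question: Does the suggested album name plausibly fit the style of {{artist}}?
Answer options: Yes / No
```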

4. Run tests

Once you have a prompt, dataset, and evaluations configured:
  1. Navigate to your prompt template
  2. Click Run test
  3. Select your dataset and evaluators
  4. Compare results across versions
Tests give you quantitative data to decide which prompt and model combinations perform best. Learn more about test runs

Next steps