You can accomplish a lot in Freeplay without integrating any code. This guide walks you through the core workflows you can complete entirely in the UI.

What you can do

  • Create and iterate on prompts in the playground
  • Build test datasets manually or by uploading CSV/JSONL files
  • Set up evaluations, including model-graded evals and auto-categorization
  • Run tests to compare prompt and model changes quantitatively
When you’re ready to monitor production traffic or create datasets from real user interactions, see the integration guide.

1. Create your first prompt

From your project, click Create prompt template to open the playground.

Select your model

Choose from available providers like OpenAI, Anthropic, or Google. If you’ve run out of credits, you’ll need to configure your own API keys.

Add messages and variables

Freeplay prompts use messages (static content) and variables (dynamic inputs using {{variable_name}} syntax). This separation enables:
  • Rapidly building datasets from test runs
  • Evaluations that reference specific input variables
  • Batch testing across different inputs

Message types

  • System message: Sets the AI’s behavior and personality. This is your base instruction that defines how the AI should act.
  • User message: Represents input from your end user. Use variables to make it dynamic: Create an album name for {{artist}}
  • Assistant message: Pre-filled AI responses that serve as few-shot examples.
  • History: A special message type that represents conversation history in multi-turn conversations.
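Putting these together, a simple template for the album-name example might look like the sketch below. The wording and the second variable ({{genre}}) are illustrative; use whatever variables your own use case needs.

```
System: You are a creative assistant that suggests album names. Keep suggestions short.
User: Create an album name for {{artist}} in the style of {{genre}}.
```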

Configure advanced settings

Beyond messages, you can fine-tune your prompt’s behavior. Set temperature, max tokens, and other model parameters. You can also add tools for function calling or enable structured outputs.
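For instance, enabling structured outputs typically means supplying a JSON Schema for the response. A minimal, hypothetical schema for the album-name example might look like this; the property names are illustrative, and the exact options depend on the provider you selected.

```json
{
  "type": "object",
  "properties": {
    "album_name": { "type": "string" },
    "reasoning": { "type": "string" }
  },
  "required": ["album_name", "reasoning"],
  "additionalProperties": false
}
```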

Test in the playground

Before saving, test your prompt by entering values for your variables and clicking Run. Each test run can be saved to start building your first dataset.

Save your prompt

Once you’re satisfied, click Save and name your prompt template. Add a version name and description to help your team understand what changed.

2. Create a dataset

Datasets power your evaluations and test runs. You can:
  • Create from test runs: Save results from playground testing
  • Upload data: Import CSV or JSONL files with test cases
  • Add manually: Create individual examples in the UI
Each dataset row can include expected outputs for reference-based evaluations. Learn more about datasets
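If you upload a file, each row supplies values for your prompt's variables and, optionally, an expected output for reference-based evaluations. A hypothetical JSONL file for the {{artist}} example might look like the following; match the column names to your own variables, and note that the expected-output column name shown here is illustrative.

```jsonl
{"artist": "Miles Davis", "genre": "jazz", "expected_output": "Blue Horizons"}
{"artist": "Daft Punk", "genre": "electronic", "expected_output": "Neon Circuits"}
```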

3. Set up evaluations

Define how you’ll measure quality. Freeplay supports:
  • Model-graded evaluations: Use an LLM judge to score outputs
  • Auto-categorization: Automatically classify outputs against criteria
  • Code evaluations: Custom logic (requires integration)
  • Human labels: Manual review by team members
Learn more about evaluations
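A model-graded evaluation is essentially a rubric the judge model applies to each output, and it can reference your input variables. A hypothetical criterion for the album-name example might read as follows; the exact configuration fields in the UI may differ.

```
Question: Does the suggested album name plausibly fit the style of {{artist}}?
Answer options: Yes / No
```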

4. Run tests

Once you have a prompt, dataset, and evaluations configured:
  1. Navigate to your prompt template
  2. Click Run test
  3. Select your dataset and evaluators
  4. Compare results across versions
Tests give you quantitative data to decide which prompt and model combinations perform best. Learn more about test runs

Next steps