Introduction

Datasets in Freeplay are an essential part of organizing data to test your LLM systems. They can also be used to curate data for human review or fine-tuning. Datasets are the foundation of Test Runs in Freeplay.

A key benefit of using Freeplay to curate Datasets is that it's seamless to save new examples that you observe in real-world testing or production to existing Datasets. This keeps the data fresh and representative of the actual use of your application.

Datasets can be created to test LLM systems across a variety of scenarios, such as:

Golden Set: For maintaining a high quality bar and detecting regressions vs. your ideal ground truth
Failure Cases: For tracking failures you observe and testing in the future to confirm they are fixed
Red Teaming: For managing adversarial test cases and confirming appropriate behavior by your system
Random Samples: For representative testing across a distributed set of values

Instructions on how to save observed data or upload data are below.

Curating Datasets

Datasets in Freeplay can be curated in one of two ways: by saving completions that are recorded to Freeplay straight from the Sessions view, or by uploading existing test cases to a Dataset.

Saving Data from Recorded Sessions

While working with recorded Sessions or Traces in Freeplay, if you encounter values that are relevant for future testing, you can save them directly to a Dataset.

Click "Add to dataset"
Select the relevant dataset(s)
Optionally, click the + button to create a new dataset from this menu

The same option is available from the trace view.

Bulk Updates

You can also select multiple completions at once and add a large group of completions to a dataset at one time, even across pages.

Select the "Completions" view on Observability (instead of Sessions)
Click the radio buttons in the table for the rows you want

Adding Metadata to Dataset Entries

Metadata can now be added to entries in your datasets, allowing you to store additional information with each entry.

To add or edit metadata for a dataset entry:

Navigate to a specific dataset entry
Click the "Edit" option in the dropdown menu
In edit mode, you'll see a dedicated "Metadata" section at the top of the entry
Add customizable key-value pairs such as:
- Customer identifiers (e.g., "customerId": "2382721")
Click "Add Metadata" to create additional fields as needed
Click "Save" to store your changes

Uploading Datasets

Uploading Data Using JSONL

If you have existing data that is relevant to use for testing prompts in Freeplay, you can upload it directly as a JSONL file.

Navigate directly to the Dataset
Click the "Upload" button
Select a JSONL file that uses the following format. Make sure the filename ends with .jsonl
- The "inputs" are your test cases, and are therefore required. At least one key name must match a variable name from your prompt template in Freeplay for the Dataset to be compatible for testing.
- The "output" values are optional when uploading, but are essential for any evaluations that depend on assertions against a ground truth value.
- Note that JSONL is NOT normal JSON. The syntax is the same, except each JSON object must be written on a single line, with one object per line. Multi-line JSON will not be accepted. (See https://jsonlines.org/)

{"inputs": {"tasks": "improve landing page"}, "output": "some good stuff"}
{"inputs": {"tasks": "do other stuff"}, "output": "some other stuff"}
{"inputs": {"tasks": "protect our website from bad actors"}, "output": "no more bad stuff"}
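If your test cases start out as Python objects or as pretty-printed JSON, a small script can produce the one-object-per-line layout shown above and catch obvious problems before you upload. The following is a minimal sketch, not part of Freeplay: the helper names write_jsonl and validate_jsonl are hypothetical, and the only rule checked is the one stated above (each line is a JSON object with a non-empty "inputs" object).

```python
import json


def write_jsonl(rows, path):
    """Write each row as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # json.dumps without indent escapes newlines inside strings,
            # so every object stays on a single line.
            f.write(json.dumps(row) + "\n")


def validate_jsonl(path):
    """Return a list of (line_number, problem) tuples; empty means nothing was flagged."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((number, "line is not a JSON object"))
                continue
            inputs = record.get("inputs")
            if not isinstance(inputs, dict) or not inputs:
                problems.append((number, 'missing or empty "inputs" object'))
    return problems


if __name__ == "__main__":
    rows = [
        {"inputs": {"tasks": "improve landing page"}, "output": "some good stuff"},
        {"inputs": {"tasks": "do other stuff"}, "output": "some other stuff"},
    ]
    write_jsonl(rows, "test_cases.jsonl")
    print(validate_jsonl("test_cases.jsonl"))
```

Running the validator against a pretty-printed JSON file reports a problem for each of the broken lines, which is a quick way to confirm that a file really is JSONL.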

Uploading Data With CSV

Freeplay supports CSV uploads for datasets, so that you can easily upload your spreadsheets to use as datasets for testing and evaluation. This can be used to add data to a new or existing dataset.

Adding a Dataset With CSV

Click the Upload Button
On the dataset page, select the Upload button.

Download the CSV Template
In the bottom-left corner of the upload dialog, click Download CSV Template to get a CSV file with the correct column names for your dataset.

Format Your Data
Replace the default CSV values with your dataset content, ensuring that each entry aligns with your selected prompt template. Follow these key formatting rules:

Use the inputs. prefix for prompt variables
- Any variable referenced within a prompt must be prefixed with inputs. (e.g., inputs.name for a {{name}} variable).
- This ensures that Freeplay correctly maps your dataset to your prompt template.
- For more details on variable usage, see our Advanced Prompt Templating guide.
Add conversation history
- Use history to provide previous interactions or context relevant to the prompt.
Specify expected responses
- Use output to define the expected model response for each input, which is useful for evaluation and comparing LLM responses to a gold standard.

Before uploading your CSV file, make sure it follows the formatting rules above. If Freeplay finds a problem, such as invalid inputs, it will display a warning and block the upload.
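The column rules above can be sketched in code. This is a minimal illustration that builds CSV text with inputs.-prefixed columns and an output column using Python's standard csv module. The function name rows_to_csv and the row layout are hypothetical, and the history column is left out because its expected cell format is not described here. The downloaded CSV template remains the authoritative source for column names.

```python
import csv
import io


def rows_to_csv(rows, variable_names):
    """Render rows as CSV text with inputs.-prefixed columns plus an output column."""
    # One column per prompt variable, e.g. inputs.name for a {{name}} variable
    fieldnames = [f"inputs.{name}" for name in variable_names] + ["output"]
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        record = {
            f"inputs.{name}": row["inputs"].get(name, "")
            for name in variable_names
        }
        record["output"] = row.get("output", "")
        writer.writerow(record)  # the csv module quotes commas and newlines
    return buffer.getvalue()


if __name__ == "__main__":
    example_rows = [
        {"inputs": {"name": "Ada"}, "output": "Hello, Ada"},
        {"inputs": {"name": "Grace"}, "output": "Hello, Grace"},
    ]
    print(rows_to_csv(example_rows, ["name"]))
```

Letting the csv module write the file avoids hand-quoting mistakes when a value contains commas, quotes, or line breaks.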

Dataset Compatibility

We've found that it's important to allow for relatively flexible compatibility rules to accommodate complex prompting strategies. The following compatibility rules may be important to know:

Compatibility for testing is based on the input {{variable_names}} in your prompt templates. These must match the key names in your Datasets.
A Dataset is treated as compatible if one or more key names match for a given prompt template. This is important so that datasets can be treated as compatible even when some variable names are optional in practice. (See Advanced Prompt Templating Using Mustache)
Datasets can be used across multiple prompt templates in a Project, as long as at least one variable name is shared. For instance, if you have four prompt templates that all use the variable {{question}}, then any Dataset that contains values for {{question}} will be compatible.
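To reason about these rules before uploading, you can approximate the check locally. The sketch below is an assumption-laden illustration, not Freeplay's implementation: it extracts simple {{name}} tags with a regular expression (Mustache sections and other tag types are not handled) and treats a Dataset as compatible when at least one key name matches. The function names are hypothetical.

```python
import re

# Matches simple {{name}} tags only; tags such as {{#section}} are not matched
VARIABLE_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def template_variables(template_text):
    """Return the set of simple variable names referenced in a template."""
    return set(VARIABLE_PATTERN.findall(template_text))


def is_compatible(template_text, dataset_keys):
    """True when at least one dataset key matches a template variable name."""
    return bool(template_variables(template_text) & set(dataset_keys))


if __name__ == "__main__":
    template = "Answer the following: {{question}} Use this context: {{context}}"
    print(is_compatible(template, ["question"]))  # shares the question variable
    print(is_compatible(template, ["name"]))      # shares no variable
```

Because only one shared name is required, a Dataset keyed on question would pass this check for every template that references {{question}}, which mirrors the multi-template example above.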