Getting Started
Integrate Freeplay into your application in less than 10 minutes
Welcome to Freeplay! This guide will help you set up Freeplay and start analyzing your LLM interactions in minutes.
Prerequisites
Before diving in, make sure you have:
1. A Freeplay Account
   - New users: Sign up at app.freeplay.ai
   - Enterprise users: Access your dedicated instance at <subdomain>.freeplay.ai
Have setup questions? Contact us at [email protected] and we'll help you get set up.
Create Your First Project
Projects in Freeplay organize your prompts and LLM interactions. Think of a project as a container for all the prompts that power a specific product or set of features in your application.
Step 1: Navigate to Projects
From your dashboard, click the New Project button in the top right corner.

Step 2: Configure Your Project
- Project Name: Choose something descriptive (e.g., "Customer Support Bot", "Product Description Generator")
- Visibility:
  - Private: Only you and invited team members can access
  - Public: All organization members can view and contribute
Best Practice: Create separate projects per product. For example, keep your "Email Generator" separate from your "Code Review Assistant" for better organization and cleaner analytics.
Integrate Your Project
Install the Freeplay SDK
Freeplay offers native SDKs for Python, Node.js, and Java (for use with any JVM language). Don't see an SDK you need? Please reach out at [email protected].
Install the SDK using the command for your language:
Python:
pip install freeplay
Node.js:
npm install freeplay
Java (Maven):
<!-- Add the Freeplay SDK to your pom.xml -->
<dependency>
    <groupId>ai.freeplay</groupId>
    <artifactId>client</artifactId>
    <version>x.x.xx</version>
</dependency>
3 ways to integrate
There are three ways to get started with Freeplay:
- Freeplay prompt management - set up prompts in the Freeplay UI and download them to your application. (Recommended)
- Manage prompts in code - store prompts in code and sync to Freeplay (Flexible)
- Lightweight observability - record your LLM calls as they are today, without moving your prompts (Fastest)
Freeplay works best when the structure of your prompts is reflected in the platform. Prompts provide the structure for building datasets, writing targeted evaluations, and running experiments. Options 1 and 2 therefore unlock the greatest number of features up front, while option 3 gets data flowing into the system quickly, though you'll need to add prompt structure later to realize the platform's full potential.
Regardless of how you get started, you can decide later how you want to handle prompt management in the longer term.
Here's more detail on each option:
Option 1: Freeplay prompt management
Step 1: Create the prompt template
To create your first prompt template in the UI, go to Prompts in the main menu and click “Create Prompt Template”.
This opens a prompt editor where you can draft your first prompt.
Freeplay leverages mustache syntax to provide a templating structure for writing prompts.
(more details on the components of prompt templates here)
In this case I’ve created a prompt with a single input variable: artist.
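For example, a system message using that variable might read (hypothetical content; the mustache-wrapped variable is the only requirement):

generate an album name for {{artist}}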

[Optional] Step 2: Hook into observability
If you already have an application you can start to hook in Freeplay for prompt management and observability. If not feel free to skip this step!
The basic steps of a Freeplay integration look like this.
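First, initialize the Freeplay client. A minimal sketch, assuming your API key is stored in an environment variable (the constructor parameter names and api_base value shown here may vary by SDK version and account type, so check the SDK reference):

import os
from freeplay import Freeplay

# Initialize the Freeplay client once at application startup
fp_client = Freeplay(
    freeplay_api_key=os.environ["FREEPLAY_API_KEY"],
    api_base="https://app.freeplay.ai/api",
)
project_id = os.environ["FREEPLAY_PROJECT_ID"]  # your project's ID from the Freeplay UI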
Fetch the prompt from Freeplay.
formatted_prompt = fp_client.prompts.get_formatted(
    project_id=project_id,
    template_name="album-bot",
    environment="latest",
    variables={"artist": "Taylor Swift"}
)
Using the prompt as a helpful data object, call your LLM provider directly.
import time

# Capture start/end timestamps for the CallInfo record below
start = time.time()
chat_response = openai_client.chat.completions.create(
    model=formatted_prompt.prompt_info.model,
    messages=formatted_prompt.llm_prompt,
    **formatted_prompt.prompt_info.model_parameters
)
end = time.time()
And finally record the interaction back to Freeplay.
from freeplay import CallInfo, RecordPayload, UsageTokens  # import path may vary by SDK version

# Build the full message history: the formatted prompt plus the model's response
all_messages = formatted_prompt.llm_prompt + [
    {"role": "assistant", "content": chat_response.choices[0].message.content}
]

session = fp_client.sessions.create()
payload = RecordPayload(
    project_id=project_id,
    all_messages=all_messages,
    inputs={"artist": "Taylor Swift"},
    session_info=session,
    prompt_version_info=formatted_prompt.prompt_info,
    call_info=CallInfo.from_prompt_info(
        formatted_prompt.prompt_info,
        start_time=start,
        end_time=end,
        usage=UsageTokens(chat_response.usage.prompt_tokens, chat_response.usage.completion_tokens)
    ),
)
# Record the LLM interaction
fp_client.recordings.create(payload)
From there you’ll be able to see data flowing into Freeplay in the Observability tab.
For full integration details see our SDK documentation.
Option 2: Sync prompts to Freeplay from code
Step 1: Push your prompt to Freeplay programmatically
First, push your prompt to Freeplay with the following API call:
curl -X POST "https://api.freeplay.ai/api/v2/projects/<project-id>/prompt-templates/name/<template-name>/versions" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "template_messages": [
      {
        "role": "system",
        "content": "some content here with mustache {{variable}} syntax"
      }
    ],
    "provider": "openai",
    "model": "gpt-4.1",
    "llm_parameters": {
      "temperature": 0.2,
      "max_tokens": 256
    },
    "version_name": "dev-version",
    "version_description": "Development test version with mustache variable"
  }'
This will create a new prompt template in the system. You can view it by going to the Prompts section in your project.
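For reference, here's the same call made from Python with the requests library (a sketch of the identical request; the project ID, template name, and API key placeholders are yours to fill in):

import requests

# Same endpoint and payload as the curl example above
resp = requests.post(
    "https://api.freeplay.ai/api/v2/projects/<project-id>/prompt-templates/name/<template-name>/versions",
    headers={
        "Authorization": "Bearer <YOUR_API_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "template_messages": [
            {"role": "system", "content": "some content here with mustache {{variable}} syntax"}
        ],
        "provider": "openai",
        "model": "gpt-4.1",
        "llm_parameters": {"temperature": 0.2, "max_tokens": 256},
        "version_name": "dev-version",
        "version_description": "Development test version with mustache variable",
    },
)
resp.raise_for_status()
print(resp.json())  # inspect the API response for the new version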

[Optional] Step 2: Hook into Observability
If you already have an application you can start to hook in Freeplay for prompt management and observability. If not feel free to skip this step!
If you want to switch over to Freeplay for prompt management follow the observability steps above!
If you’d prefer to manage your prompts in code, continue building your prompts as you have been, but add a record call back to Freeplay after each LLM call.
fp_client.recordings.create(
    RecordPayload(
        project_id=project_id,
        all_messages=messages,
        inputs={'keyA': 'valueA'},
        prompt_info=PromptInfo(
            prompt_template_id=template_id,
            prompt_template_version_id=new_version_id
        )
    )
)
Option 3: Lightweight Observability
If you don’t want to move your prompts into Freeplay just yet, you can still start to get data flowing into the system.
Step 1: Hook into Observability
Leave your LLM application just as it is, but after each LLM interaction make a record call to Freeplay.
# Continue with your application as is
prompt = [{"role": "system", "content": "generate an album name for Taylor Swift"}]
chat_response = openai_client.chat.completions.create(
    model="gpt-4.1",
    messages=prompt
)

# Collect the full message history: the prompt plus the model's response
messages = prompt + [{"role": "assistant", "content": chat_response.choices[0].message.content}]

# Add a record call to Freeplay
fp_client.recordings.create(
    RecordPayload(
        project_id="<your-project-id>",
        all_messages=messages,
        inputs={'artist': 'Taylor Swift'},  # Optional
        call_info=CallInfo(provider='openai', model='gpt-4.1')  # Optional
    )
)
The minimum you need to record to Freeplay is the messages and a project ID. However, the more information you provide, the more useful the observability data will be (see the full set of options here). In this case we’ve added two additional fields:
- Inputs: specifying which parts of the prompt are dynamic inputs adds structure for the next step
- Call Info: specifying the model and the provider
Go to the Observability tab in the Freeplay UI to view your recorded data!
[Optional] Step 2: Convert the logged data to a prompt template
While this step is technically optional, it is necessary to set up the proper structure for the next section: running your first test.
Find one of your logs in the Observability tab and open it.
You’ll see a message in the UI to convert the session to a prompt template in Freeplay.

Click “Save to template” to open the prompt editor with the logged completion loaded in. We can even re-run it from right here in the editor.

We now want to turn the messages into a templated prompt. In this case, Taylor Swift is a dynamic value that will change each time we invoke the prompt, so we’ll replace the hard-coded text with a dynamic variable called artist.

Now, instead of the prompt being hard-coded to Taylor Swift, we’ve turned it into a template that can take any artist (like Justin Bieber). Once you’ve created the prompt template, hit Save; then update your observability record calls to link to that prompt template ID, as shown below.
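Concretely, the conversion swaps the hard-coded name for a mustache variable:

# Before: hard-coded to one artist
{"role": "system", "content": "generate an album name for Taylor Swift"}
# After: templated, so any artist can be passed in at request time
{"role": "system", "content": "generate an album name for {{artist}}"}

The updated record call then references the template's IDs: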
fp_client.recordings.create(
    RecordPayload(
        project_id=project_id,
        all_messages=messages,
        inputs={'artist': 'Taylor Swift'},  # Optional
        call_info=CallInfo(provider='openai', model='gpt-4.1'),  # Optional
        prompt_info=PromptInfo(
            prompt_template_id='9b3441d8-f07d-4f0c-cc68-f04b21f92496',
            prompt_template_version_id='8a3441d8-f07d-4f0c-b72d-f04b21f92496'
        )
    )
)
Running Your First Test
The power of Freeplay really shines when you have a repeatable iteration loop. Tests are a key part of that loop. Let’s set up our first test.
Step 1: Create a Dataset
Datasets in Freeplay give you a repeatable collection of inputs to test prompt changes against.
To create a new dataset navigate to the Datasets tab and click “Create Dataset” in the top right.

You’ll give your dataset a name and description, as well as decide which prompt(s) you want this dataset to be compatible with.
Once you’ve created a dataset, there are a number of ways to populate it with examples, including:
- Save from observability
- File upload
- Programmatic upload
- Create in the UI
Step 2: Create an Eval
Evaluations in Freeplay give you a mechanism to score the quality of your LLM systems. Freeplay’s evaluation offering is extensive and flexible (see this guide for full details).
For now, we’ll create a single LLM-as-Judge evaluator by going to the Evaluations tab and clicking “New Evaluation” in the top right.
Select a target and the type of evaluation you want to create.

Give your evaluation a name and description and decide what scoring scale you want to use.

Freeplay’s agent will draft an evaluation for you, but it’s fully customizable from here. Once you’re happy with it hit Save.

Step 3: Run a Test
Test Runs in Freeplay give you a repeatable way to quantify changes to your LLM system. Tests can be executed via the UI or via the SDK.
To run a test from the UI, go to Tests and click “New Test”. Select the prompt version and dataset you want to test.

The test will break down prompt performance by cost, latency, and evaluation scores.

Tests become even more powerful when comparing multiple versions of a prompt. To add a comparison, click “Add Comparison”. Use this feature to compare how different models or prompt updates perform on your test cases.

For any test you can also dive into the row level details by navigating over to the Test Cases tab within the test.

Now you have the insight to make quantifiable deployment decisions in a repeatable way!