Project
A top-level workspace in Freeplay that contains all your prompt templates, sessions, datasets, evaluations, and configurations. Projects are identified by a unique project_id and represent a distinct AI application or use case. All other entities in Freeplay (sessions, prompt templates, datasets, etc.) belong to a project.
See Project Setup for configuration details.
Observability
Observability in Freeplay refers to capturing and analyzing the behavior of your AI application through logged data.
Observability hierarchy
Freeplay uses a three-level hierarchy to organize your AI application logs. From highest to lowest level:
Session
The container for a complete user interaction, conversation, or agent run. Sessions group related traces and completions together. Examples include an entire chatbot conversation, a complete agent workflow, or a single user request that triggers multiple LLM calls.
See Sessions, Traces, and Completions for more details.
Trace
An optional grouping of related completions and tool calls within a session. Traces represent a functional unit of work, such as a single turn in a conversation, one run of an agent, or a logical step in a multi-step workflow. Traces can be nested to represent sub-agents or complex workflows.
Traces can optionally be given a name (like “planning” or “tool_selection”) to represent a specific agent or workflow type. Named traces unlock additional Freeplay features: you can configure evaluation criteria to run against them, create linked datasets for testing, and group similar traces for analysis. See Agent below.
See Traces and Record Traces for implementation details.
Completion
The atomic unit of observability in Freeplay. A completion represents a single LLM call, including the input prompt (messages) and the model’s response. Every completion is associated with a session and optionally a trace.
See Recording Completions for implementation details.
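A conceptual sketch of this hierarchy is shown below. The dataclasses and field names are illustrative only and are not Freeplay SDK types: a session holds traces, traces can nest and hold completions, and each completion captures one LLM call’s messages and response.

```python
# Conceptual sketch of the observability hierarchy described above.
# These dataclasses are illustrative; they are not Freeplay SDK types.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Completion:
    """One LLM call: the input messages and the model's response."""
    messages: list[dict]          # e.g. [{"role": "user", "content": "..."}]
    response: str

@dataclass
class Trace:
    """A functional unit of work inside a session; may be named and nested."""
    name: Optional[str] = None    # e.g. "planning" or "tool_selection"
    completions: list[Completion] = field(default_factory=list)
    children: list["Trace"] = field(default_factory=list)  # sub-agents or sub-steps

@dataclass
class Session:
    """A complete user interaction, conversation, or agent run."""
    traces: list[Trace] = field(default_factory=list)

# One user turn that triggers a single LLM call:
session = Session(traces=[
    Trace(name="chat_turn", completions=[
        Completion(messages=[{"role": "user", "content": "Hi"}], response="Hello!"),
    ]),
])
```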
Other observability concepts
Agent
In Freeplay, an “agent” refers to a named category of traces that represent semantically similar workflows or behaviors. When you give traces the same name (e.g., “research_agent” or “customer_support”), Freeplay groups them together as an agent. This grouping enables you to:
- Configure evaluation criteria that run automatically against traces with that name
- Create datasets linked to that agent for testing
- Analyze performance and quality across all traces of that type
Tool call
Tool calls can be recorded in two ways:
- As part of completions (default): Tool calls appear in the message history—the tool call in the LLM’s output message, and the tool result as an input to the next LLM call. This is simpler and follows standard tool-calling patterns.
- As explicit tool spans: Create separate traces with kind='tool' for granular visibility into tool execution timing and results as independent spans in the trace view.
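For the default approach, the tool call and its result simply appear in the completion’s message history. The snippet below illustrates that shape using the common OpenAI-style message convention; the field names come from that convention, not from Freeplay:

```python
# Illustrative message history for the "as part of completions" approach.
# The message shape follows the common OpenAI-style tool-calling convention.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    # First completion's output: the model decides to call a tool.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }],
    },
    # The tool result is fed back in as input to the next LLM call.
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},
]
```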
Custom metadata
Arbitrary key-value data attached via custom_metadata fields when creating or updating observability objects. Don’t use custom metadata for user feedback like ratings or comments—use customer feedback instead.
Customer feedback
End-user feedback recorded through dedicated feedback endpoints. Customer feedback includes ratings (thumbs up/down, star ratings) and freeform comments. This data receives special treatment in the Freeplay UI due to its distinct utility for quality improvement.
Customer feedback is recorded via the /completion-feedback/ or /trace-feedback/ API endpoints.
See Customer Feedback for implementation details.
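A rough sketch of posting a thumbs-up for a completion is shown below. The endpoint path comes from the documentation above, but the base URL, auth header, and payload fields are assumptions for illustration only; see Customer Feedback and the API Reference for the real schema:

```python
# Hypothetical request sketch. The endpoint path is from the docs above, but the
# base URL, auth header, and payload field names are assumptions, not the real schema.
import os
import requests

resp = requests.post(
    "https://app.freeplay.ai/api/completion-feedback/",  # assumed base URL
    headers={"Authorization": f"Bearer {os.environ['FREEPLAY_API_KEY']}"},
    json={
        "completion_id": "abc123",  # hypothetical completion identifier
        "feedback": {"thumbs_up": True, "comment": "Accurate and concise."},
    },
    timeout=10,
)
resp.raise_for_status()
```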
Prompt management
Prompt template
A versioned configuration that defines everything needed to make an LLM call: the message structure (using Mustache syntax for variables), provider and model selection, request parameters (like temperature), and optionally tool schemas or output structure definitions.
Prompt templates separate the static structure of your prompts from the dynamic variables populated at runtime. This structure enables easy versioning, A/B testing, and dataset creation from production logs.
See Managing Prompts for more details.
Environment
A deployment target for prompt templates, such as dev, staging, prod, or latest. Environments let you deploy different versions of your prompts to different stages of your application lifecycle, similar to feature flags.
The latest environment always points to the most recently created version of a prompt template. Custom environments can be created for specific use cases.
See Deployment Environments for configuration details.
Prompt bundling
The practice of snapshotting prompt template configurations into your source code repository rather than fetching them from Freeplay’s server at runtime. Prompt bundling removes Freeplay from the “hot path” of your application, improving latency and providing compliance benefits for regulated industries.
See Prompt Bundling for implementation guidance.
Mustache
The templating syntax used in Freeplay prompt templates for variable interpolation. Mustache uses double curly braces ({{variable_name}}) for simple substitution and supports conditional logic and iteration.
See Advanced Prompt Templating Using Mustache for syntax reference.
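As an illustration of the Mustache syntax, the snippet below renders a made-up template with a simple variable and a section (used here for iteration). It uses the open-source chevron library purely as a convenient Mustache renderer; the template, variable names, and library choice are not part of Freeplay:

```python
# Illustration of Mustache syntax only; the template and variable names are
# made up, and chevron is used here just as a convenient Mustache renderer.
import chevron

template = (
    "You are a support assistant for {{company_name}}.\n"
    "{{#docs}}Relevant doc: {{title}}\n{{/docs}}"
    "Answer the question: {{question}}"
)

output = chevron.render(template, {
    "company_name": "Acme",
    "question": "How do I reset my password?",
    # A Mustache section ({{#docs}}...{{/docs}}) iterates over a list
    # and is skipped entirely when the list is empty or missing.
    "docs": [{"title": "Password policy"}, {"title": "Account settings"}],
})
print(output)
```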
Evaluation and testing
Evaluation
The process of measuring and scoring the quality of AI outputs. Freeplay supports four types of evaluations:
- Human evaluation: Manual review and scoring by team members
- Model-graded evaluation: Using an LLM as a judge
- Code evaluation: Custom functions that evaluate quantifiable criteria (see the sketch below)
- Auto-categorization: Automated tagging based on specified categories
Dataset
A collection of examples used for testing and evaluation. Datasets can be created by:
- Curating examples from production logs
- Uploading CSV or JSONL files
- Authoring directly in the Freeplay UI
The API parameter is testlist for legacy reasons, but we use “dataset” in the UI and when referring to this concept in prose.
See Datasets for more details.
Test run
A batch execution of evaluations against a dataset. Test runs can be:
- Component-level: Testing individual prompts or components
- End-to-end: Testing complete workflows like full agent runs
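To make the “code evaluation” type above concrete, here is a minimal sketch of a custom function scoring an output against quantifiable criteria. The function name, inputs, and criteria are illustrative assumptions, not Freeplay’s code-evaluation interface:

```python
# Hypothetical example of a code evaluation: a deterministic check over a
# model output. The signature and criteria are illustrative, not Freeplay's API.
import json

def evaluate_output(output: str, max_words: int = 150) -> dict:
    """Score a completion on a few quantifiable criteria."""
    word_count = len(output.split())
    is_valid_json = True
    try:
        json.loads(output)
    except ValueError:
        is_valid_json = False
    return {
        "within_length_limit": word_count <= max_words,  # boolean criterion
        "is_valid_json": is_valid_json,                  # boolean criterion
        "word_count": word_count,                        # numeric criterion
    }

print(evaluate_output('{"answer": "42"}'))
```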
Workflows and collaboration
Review queue
A workflow for human review of production outputs. Review queues enable teams to:
- Review and annotate completions or traces
- Add structured labels or free-text notes
- Correct LLM judge scores when they’re wrong
- Curate examples into datasets
SDK and API
Freeplay SDK
Client libraries for integrating Freeplay into your application. Available for:
- Python: freeplay (install via pip install freeplay)
- TypeScript/Node: freeplay (install via npm install freeplay)
- Java/JVM: ai.freeplay:client (see SDK Setup for Maven/Gradle config)
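A minimal initialization sketch for the Python SDK is shown below. The import and constructor parameters are assumptions about the SDK’s surface and may differ by version; confirm the exact usage in SDK Setup before relying on it:

```python
# Sketch only: the constructor and parameter names are assumptions about the
# Python SDK; consult the SDK Setup and reference docs for exact usage.
import os
from freeplay import Freeplay  # assumed top-level client class

client = Freeplay(
    freeplay_api_key=os.environ["FREEPLAY_API_KEY"],  # API key from your account settings
    api_base="https://example.freeplay.ai/api",       # assumed format; replace with your account's API base URL
)
```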
Related resources
- Why Freeplay? - Overview of Freeplay’s approach
- Getting Started - Quick start guides
- SDK Documentation - Detailed SDK reference
- API Reference - HTTP API documentation

