This glossary provides definitions for the key terms and concepts you’ll encounter when using Freeplay. Terms are organized by category to help you understand how they relate to each other.

Project

A top-level workspace in Freeplay that contains all your prompt templates, sessions, datasets, evaluations, and configurations. Projects are identified by a unique project_id and represent a distinct AI application or use case. All other entities in Freeplay (sessions, prompt templates, datasets, etc.) belong to a project. See Project Setup for configuration details.

Observability

Observability in Freeplay refers to capturing and analyzing the behavior of your AI application through logged data.

Observability hierarchy

Freeplay uses a three-level hierarchy to organize your AI application logs. From highest to lowest level:

Session

The container for a complete user interaction, conversation, or agent run. Sessions group related traces and completions together. Examples include an entire chatbot conversation, a complete agent workflow, or a single user request that triggers multiple LLM calls. See Sessions, Traces, and Completions for more details.

Trace

An optional grouping of related completions and tool calls within a session. Traces represent a functional unit of work, such as a single turn in a conversation, one run of an agent, or a logical step in a multi-step workflow. Traces can be nested to represent sub-agents or complex workflows. Traces can optionally be given a name (like “planning” or “tool_selection”) to represent a specific agent or workflow type. Named traces unlock additional Freeplay features: you can configure evaluation criteria to run against them, create linked datasets for testing, and group similar traces for analysis. See Agent below. See Traces and Record Traces for implementation details.

Completion

The atomic unit of observability in Freeplay. A completion represents a single LLM call, including the input prompt (messages) and the model’s response. Every completion is associated with a session and optionally a trace. See Recording Completions for implementation details.
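Taken together, the three levels nest like plain data. The sketch below is illustrative only (these dataclasses are not the Freeplay SDK's types); it shows one conversational turn logged as a session containing a named trace, which in turn holds a single completion.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative data model only; not the Freeplay SDK's actual classes.

@dataclass
class Completion:
    """A single LLM call: the input messages plus the model's response."""
    messages: list[dict]
    response: str

@dataclass
class Trace:
    """An optional, nameable unit of work grouping related completions."""
    name: Optional[str]                 # e.g. "planning" or "tool_selection"
    completions: list[Completion] = field(default_factory=list)

@dataclass
class Session:
    """The top-level container for one conversation or agent run."""
    session_id: str
    traces: list[Trace] = field(default_factory=list)

# One conversational turn: a session holding a named trace with one completion.
turn = Session(
    session_id="abc-123",
    traces=[Trace(
        name="planning",
        completions=[Completion(
            messages=[{"role": "user", "content": "Summarize my last order."}],
            response="Your last order shipped on Tuesday.",
        )],
    )],
)
```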

Other observability concepts

Agent

In Freeplay, an “agent” refers to a named category of traces that represent semantically similar workflows or behaviors. When you give traces the same name (e.g., “research_agent” or “customer_support”), Freeplay groups them together as an agent. This grouping enables you to:
  • Configure evaluation criteria that run automatically against traces with that name
  • Create datasets linked to that agent for testing
  • Analyze performance and quality across all traces of that type
It’s up to developers to define what constitutes an “agent” in their application. An agent might represent an entire autonomous workflow, a specific sub-task, or any logical grouping that makes sense for your use case. See Agents for guidance on structuring agent workflows.
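A minimal sketch of the naming idea, using a hypothetical record_trace helper rather than the real SDK calls: two different workflows record traces under two different names, and each name becomes its own agent in Freeplay.

```python
# Hypothetical helper for illustration only; the real trace-recording calls
# differ (see Record Traces). The key point: reusing the same trace name
# across runs is what groups those traces into one "agent" in Freeplay.

def record_trace(name: str, **kwargs) -> None:
    """Stand-in for whatever trace-recording call your integration uses."""
    ...

def handle_research_request(question: str) -> None:
    # Every run of this workflow logs its trace as "research_agent",
    # so Freeplay treats those traces as one agent.
    record_trace(name="research_agent", input=question)

def handle_support_request(ticket: str) -> None:
    # A different workflow gets a different name and becomes its own agent.
    record_trace(name="customer_support", input=ticket)
```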

Tool call

A record of a tool or function call made during an agent workflow, including both the request (tool name and arguments) and the result. Tool calls are recorded at the same level as completions within a trace. See Tools for guidance on recording tool calls.
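As an illustration, a tool call record carries a shape roughly like the following; the exact field names in Freeplay's SDK and API may differ.

```python
# Illustrative shape of a tool call record (field names are assumptions).
tool_call_record = {
    "name": "get_weather",                       # tool/function that was invoked
    "arguments": {"city": "Oslo", "unit": "C"},  # request sent to the tool
    "result": {"temperature": 7, "summary": "light rain"},  # what came back
}
```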

Custom metadata

Contextual information attached to sessions, traces, or completions. Use custom metadata to store data like user IDs, feature flags, business metrics, or workflow identifiers that help with filtering, searching, and analysis. Custom metadata is recorded via custom_metadata fields when creating or updating observability objects. Don’t use custom metadata for user feedback like ratings or comments—use customer feedback instead.
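For example, a custom_metadata payload for a chatbot request might look like this (the keys are your own; these are illustrative):

```python
# Example custom_metadata payload; choose keys that support your filtering
# and analysis needs.
custom_metadata = {
    "user_id": "u_48211",          # who triggered the request
    "plan_tier": "enterprise",     # business dimension for filtering
    "feature_flag": "new_ranker",  # which variant served the request
    "workflow_id": "wf-2024-07",   # ties the log back to your own systems
}
# Note: a thumbs-up or a user comment does NOT belong here;
# record that through customer feedback instead.
```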

Customer feedback

End-user feedback recorded through dedicated feedback endpoints. Customer feedback includes ratings (thumbs up/down, star ratings) and freeform comments. This data receives special treatment in the Freeplay UI due to its distinct utility for quality improvement. Customer feedback is recorded via the /completion-feedback/ or /trace-feedback/ API endpoints. See Customer Feedback for implementation details.
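A rough sketch of posting feedback over HTTP follows. The /completion-feedback/ endpoint name comes from the description above, but the base URL, auth header, and payload fields shown here are assumptions; consult the Customer Feedback reference for the real contract.

```python
import requests

# Hedged sketch only: URL structure, auth, and payload fields are assumptions.
API_BASE = "https://app.freeplay.ai/api"   # assumed base URL
API_KEY = "YOUR_FREEPLAY_API_KEY"

resp = requests.post(
    f"{API_BASE}/completion-feedback/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "completion_id": "cmp_123",   # which completion the user rated
        "rating": "thumbs_up",        # e.g. thumbs up/down or a star value
        "comment": "Answered my question on the first try.",
    },
    timeout=10,
)
resp.raise_for_status()
```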

Prompt management

Prompt template

A versioned configuration that defines everything needed to make an LLM call: the message structure (using Mustache syntax for variables), provider and model selection, request parameters (like temperature), and optionally tool schemas or output structure definitions. Prompt templates separate the static structure of your prompts from the dynamic variables populated at runtime. This structure enables easy versioning, A/B testing, and dataset creation from production logs. See Managing Prompts for more details.
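As a rough picture (not Freeplay's exact schema), one version of a prompt template bundles together something like this:

```python
# Illustrative only: roughly what one version of a prompt template contains.
prompt_template = {
    "name": "support_answer",
    "version": 12,
    "provider": "anthropic",
    "model": "your-model-id",        # whichever provider model this version targets
    "params": {"temperature": 0.2, "max_tokens": 1024},
    "messages": [
        {"role": "system", "content": "You are a support assistant for {{product_name}}."},
        {"role": "user", "content": "{{question}}"},
    ],
    # Optional: tool schemas or structured-output definitions can also be
    # attached to the same versioned template.
}
```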

Environment

A deployment target for prompt templates, such as dev, staging, prod, or latest. Environments let you deploy different versions of your prompts to different stages of your application lifecycle, similar to feature flags. The latest environment always points to the most recently created version of a prompt template. Custom environments can be created for specific use cases. See Deployment Environments for configuration details.

Prompt bundling

The practice of snapshotting prompt template configurations into your source code repository rather than fetching them from Freeplay’s server at runtime. Prompt bundling removes Freeplay from the “hot path” of your application, improving latency and providing compliance benefits for regulated industries. See Prompt Bundling for implementation guidance.
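A minimal sketch of the pattern, assuming the snapshot lives as JSON files in your repository (the directory layout and file format here are assumptions):

```python
import json
from pathlib import Path

# Hedged sketch: a snapshot of the prompt template is committed to your repo,
# so the request path never calls Freeplay's server.
BUNDLE_DIR = Path("freeplay_prompts")   # assumed location of snapshotted templates

def load_bundled_prompt(name: str, environment: str = "prod") -> dict:
    """Read a snapshotted prompt template from source control."""
    return json.loads((BUNDLE_DIR / environment / f"{name}.json").read_text())

template = load_bundled_prompt("support_answer")
# Variables are still filled in at runtime (see Mustache below) before the
# LLM call is made with your provider's client.
```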

Mustache

The templating syntax used in Freeplay prompt templates for variable interpolation. Mustache uses double curly braces ({{variable_name}}) for simple substitution and supports conditional logic and iteration. See Advanced Prompt Templating Using Mustache for syntax reference.
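For example, using chevron, a standard Python Mustache implementation, to render a template with a variable, a conditional section, and an iterated list:

```python
import chevron  # any Mustache implementation works; `pip install chevron`

template = (
    "Hello {{user_name}}.\n"
    "{{#is_premium}}Thanks for being a premium customer.{{/is_premium}}\n"
    "Open tickets:\n"
    "{{#tickets}}- {{title}}\n{{/tickets}}"
)

print(chevron.render(template, {
    "user_name": "Ada",
    "is_premium": True,   # section renders only when the value is truthy
    "tickets": [{"title": "Login issue"}, {"title": "Billing question"}],
}))
```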

Evaluation and testing

Evaluation

The process of measuring and scoring the quality of AI outputs. Freeplay supports four types of evaluations:
  • Human evaluation: Manual review and scoring by team members
  • Model-graded evaluation: Using an LLM as a judge
  • Code evaluation: Custom functions that evaluate quantifiable criteria (see the sketch after this list)
  • Auto-categorization: Automated tagging based on specified categories
See Evaluations for an overview.
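As an example of the code evaluation type, a custom function can be as small as a pass/fail check on a quantifiable property of the output. How such functions are wired into Freeplay is covered in the Evaluations docs; this only sketches the idea.

```python
import json

def eval_is_valid_json(output: str) -> bool:
    """Pass if the model's output parses as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def eval_within_length(output: str, max_chars: int = 500) -> bool:
    """Pass if the response stays under a length budget."""
    return len(output) <= max_chars
```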

LLM judge

An LLM-based evaluator that scores AI outputs against specified criteria. Also called “model-graded evaluation.” LLM judges can assess nuanced qualities like helpfulness, accuracy, or tone that are difficult to evaluate with code alone. See Model-Graded Evaluations and Creating and Aligning Model-Graded Evals for implementation details.

Dataset

A collection of test cases used for evaluation and testing. Datasets can be created by:
  • Curating examples from production logs
  • Uploading CSV or JSONL files (see the JSONL example below)
  • Authoring directly in the Freeplay UI
Datasets have schemas that enforce compatibility with specific prompt templates or agents.
The API parameter is testlist for legacy reasons, but we use “dataset” in the UI and when referring to this concept in prose.
See Datasets for more details.
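For a JSONL upload, each line is one test case. The field names below ("inputs", "expected_output") are assumptions; your file must match the schema of the prompt template or agent the dataset is linked to.

```python
import json

# Illustrative test cases written out as JSONL: one JSON object per line.
cases = [
    {"inputs": {"question": "How do I reset my password?"},
     "expected_output": "Go to Settings > Security and choose Reset Password."},
    {"inputs": {"question": "What plans do you offer?"},
     "expected_output": "We offer Free, Pro, and Enterprise plans."},
]

with open("support_answer_dataset.jsonl", "w") as f:
    for case in cases:
        f.write(json.dumps(case) + "\n")
```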

Test run

A batch execution of evaluations against a dataset. Test runs can be:
  • Component-level: Testing individual prompts or components
  • End-to-end: Testing complete workflows like full agent runs
Test runs can be initiated via the Freeplay UI or programmatically via SDK for CI/CD integration. See Test Runs and Running a Test Run for implementation details.
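A hypothetical sketch of a CI/CD-style test run follows. The method names are not the real SDK surface (see Running a Test Run for the actual calls); the point is the shape of the check: run the batch against a dataset, then gate the build on the scores.

```python
# Hypothetical sketch only; every method name below is an assumption.

def my_app_generate(inputs: dict) -> str:
    """Placeholder for your application's actual generation logic."""
    return "stubbed response"

def run_ci_eval(client, project_id: str) -> None:
    test_run = client.start_test_run(           # hypothetical method
        project_id=project_id,
        dataset="support_answer_dataset",
    )
    for case in test_run.cases:                 # hypothetical iteration
        output = my_app_generate(case.inputs)   # your application code
        test_run.record(case, output)           # hypothetical method
    results = test_run.summary()                # hypothetical method
    assert results.pass_rate >= 0.9, "Eval pass rate dropped below 90%"
```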

Workflows and collaboration

Review queue

A workflow for human review of production outputs. Review queues enable teams to:
  • Review and annotate completions or traces
  • Add structured labels or free-text notes
  • Correct LLM judge scores when they’re wrong
  • Curate examples into datasets
See Review Queues for more details.

Data flywheel

The continuous improvement cycle enabled by Freeplay’s connected workflow. Production logs flow into datasets, which feed evaluations, which inform prompt improvements, which generate better logs. Each iteration strengthens prompts, datasets, evaluation criteria, and testing infrastructure together.

SDK and API

Freeplay SDK

Client libraries for integrating Freeplay into your application. Available for:
  • Python: freeplay (install via pip install freeplay)
  • TypeScript/Node: freeplay (install via npm install freeplay)
  • Java/JVM: ai.freeplay:client (see SDK Setup for Maven/Gradle config)
See SDK Setup for installation and configuration.
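After installing the Python package, initialization looks roughly like this; the constructor arguments shown are assumptions, so check SDK Setup for the exact parameters.

```python
# After `pip install freeplay`. Constructor argument names are assumptions.
from freeplay import Freeplay

client = Freeplay(
    freeplay_api_key="YOUR_API_KEY",              # assumed parameter name
    api_base="https://your-org.freeplay.ai/api",  # assumed parameter name and URL
)
```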

Provider

The LLM service that processes your prompts. Freeplay supports any provider, including OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google, and self-hosted models. The provider is specified in prompt templates or when recording completions.

Flavor

The message format used by a specific provider. Different providers expect messages in different formats (e.g., OpenAI’s chat format vs. Anthropic’s format). Freeplay handles format conversion based on the configured flavor.
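For example, the same user turn expressed in two flavors. These shapes follow the providers' public chat APIs: OpenAI treats the system prompt as just another message, while Anthropic's Messages API takes it as a separate top-level field.

```python
# OpenAI-style chat format: the system prompt is another message in the list.
openai_messages = [
    {"role": "system", "content": "You are a helpful support assistant."},
    {"role": "user", "content": "How do I reset my password?"},
]

# Anthropic Messages format: the system prompt is a separate top-level field.
anthropic_request = {
    "system": "You are a helpful support assistant.",
    "messages": [
        {"role": "user", "content": "How do I reset my password?"},
    ],
}
```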