Project
A top-level workspace in Freeplay that contains all your prompt templates, sessions, datasets, evaluations, and configurations. Projects are identified by a unique project_id and represent a distinct AI application or use case. All other entities in Freeplay (sessions, prompt templates, datasets, etc.) belong to a project.
See Project Setup for configuration details.
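For orientation only, here is a minimal sketch of how that scoping tends to look in code; the import path, client class, and constructor arguments below are assumptions, not the documented SDK surface:

```python
# Illustrative sketch: the import path and constructor arguments are assumed
# names, not a verified SDK signature; see the SDK docs for the real interface.
from freeplay import Freeplay

client = Freeplay(
    freeplay_api_key="YOUR_API_KEY",                    # placeholder credential
    api_base="https://YOUR_SUBDOMAIN.freeplay.ai/api",  # placeholder base URL
)

# All other entities (sessions, prompt templates, datasets, evaluations)
# are addressed relative to this identifier.
PROJECT_ID = "00000000-0000-0000-0000-000000000000"
```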
Observability
Observability in Freeplay refers to capturing and analyzing the behavior of your AI application through logged data.
Observability hierarchy
Freeplay uses a three-level hierarchy to organize your AI application logs. From highest to lowest level:
Session
The container for a complete user interaction, conversation, or agent run. Sessions group related traces and completions together. Examples include an entire chatbot conversation, a complete agent workflow, or a single user request that triggers multiple LLM calls. See Sessions, Traces, and Completions for more details.
Trace
An optional grouping of related completions and tool calls within a session. Traces represent a functional unit of work, such as a single turn in a conversation, one run of an agent, or a logical step in a multi-step workflow. Traces can be nested to represent sub-agents or complex workflows. Traces can optionally be given a name (like “planning” or “tool_selection”) to represent a specific agent or workflow type. Named traces unlock additional Freeplay features: you can configure evaluation criteria to run against them, create linked datasets for testing, and group similar traces for analysis. See Agent below. See Traces and Record Traces for implementation details.
Completion
The atomic unit of observability in Freeplay. A completion represents a single LLM call, including the input prompt (messages) and the model’s response. Every completion is associated with a session and optionally a trace. See Recording Completions for implementation details.
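A minimal sketch of how the three levels nest when logging, assuming a client like the one sketched under Project above; the sessions.create, create_trace, and record_completion names are illustrative assumptions rather than the exact SDK methods (see Record Traces and Recording Completions for the real interfaces):

```python
# Hypothetical method names throughout; the point is the nesting, not the API.
session = client.sessions.create(project_id=PROJECT_ID)  # 1. Session: the whole conversation or agent run

trace = session.create_trace(name="research_agent")      # 2. Trace: one unit of work; the name groups it as an "agent"

completion = trace.record_completion(                    # 3. Completion: a single LLM call with its prompt and response
    messages=[{"role": "user", "content": "Summarize this document."}],
    response={"role": "assistant", "content": "Here is a summary..."},
    model="gpt-4o",
)
```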
Other observability concepts
Agent
In Freeplay, an “agent” refers to a named category of traces that represent semantically similar workflows or behaviors. When you give traces the same name (e.g., “research_agent” or “customer_support”), Freeplay groups them together as an agent. This grouping enables you to:
- Configure evaluation criteria that run automatically against traces with that name
- Create datasets linked to that agent for testing
- Analyze performance and quality across all traces of that type
Tool call
A record of a tool or function call made during an agent workflow, including both the request (tool name and arguments) and the result. Tool calls are recorded at the same level as completions within a trace. See Tools for guidance on recording tool calls.
Custom metadata
Contextual information attached to sessions, traces, or completions. Use custom metadata to store data like user IDs, feature flags, business metrics, or workflow identifiers that help with filtering, searching, and analysis. Custom metadata is recorded via custom_metadata fields when creating or updating observability objects. Don’t use custom metadata for user feedback like ratings or comments—use customer feedback instead.
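For example (the surrounding call shape is an assumption; the custom_metadata field name itself comes from the docs):

```python
# Sketch only: sessions.create is an assumed method name. The custom_metadata
# dict is the part that matters: arbitrary key/value context for filtering.
session = client.sessions.create(
    project_id=PROJECT_ID,
    custom_metadata={
        "user_id": "user_1234",          # who triggered the session
        "feature_flag": "beta_search",   # which variant was active
        "plan_tier": "enterprise",       # business dimension for later analysis
    },
)
```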
Customer feedback
End-user feedback recorded through dedicated feedback endpoints. Customer feedback includes ratings (thumbs up/down, star ratings) and freeform comments. This data receives special treatment in the Freeplay UI due to its distinct utility for quality improvement. Customer feedback is recorded via the /completion-feedback/ or /trace-feedback/ API endpoints.
See Customer Feedback for implementation details.
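A rough sketch of posting feedback over HTTP follows; only the /completion-feedback/ path segment comes from the docs above, while the base URL, auth header, and payload keys are assumptions to be replaced with the values in the API Reference:

```python
import requests

# Hypothetical request: substitute the real base URL, completion ID, and
# payload schema from the API Reference before using.
response = requests.post(
    "https://YOUR_SUBDOMAIN.freeplay.ai/api/.../completion-feedback/COMPLETION_ID",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "rating": "positive",                                 # e.g., thumbs up
        "comment": "Answered my question on the first try.",  # freeform comment
    },
)
response.raise_for_status()
```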
Prompt management
Prompt template
A versioned configuration that defines everything needed to make an LLM call: the message structure (using Mustache syntax for variables), provider and model selection, request parameters (like temperature), and optionally tool schemas or output structure definitions. Prompt templates separate the static structure of your prompts from the dynamic variables populated at runtime. This structure enables easy versioning, A/B testing, and dataset creation from production logs. See Managing Prompts for more details.
Environment
A deployment target for prompt templates, such as dev, staging, prod, or latest. Environments let you deploy different versions of your prompts to different stages of your application lifecycle, similar to feature flags.
The latest environment always points to the most recently created version of a prompt template. Custom environments can be created for specific use cases.
See Deployment Environments for configuration details.
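As an illustration of fetching by environment (the get_formatted method and its parameters are assumed names; the environment values and the idea of runtime variable substitution come from the sections above):

```python
# Sketch: fetch whatever version of "support_bot" is currently deployed to
# "prod" and fill in its Mustache variables. The method name is an assumption.
formatted_prompt = client.prompts.get_formatted(
    project_id=PROJECT_ID,
    template_name="support_bot",
    environment="prod",   # could also be dev, staging, latest, or a custom environment
    variables={
        "customer_name": "Ada",
        "question": "How do I reset my password?",
    },
)
# The result would carry provider-ready messages plus the configured model
# and request parameters for that template version.
```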
Prompt bundling
The practice of snapshotting prompt template configurations into your source code repository rather than fetching them from Freeplay’s server at runtime. Prompt bundling removes Freeplay from the “hot path” of your application, improving latency and providing compliance benefits for regulated industries. See Prompt Bundling for implementation guidance.
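Conceptually, bundling replaces the runtime fetch with a local file read; the directory layout and JSON schema below are invented for illustration, and the real bundle format is described in the Prompt Bundling guide:

```python
import json
from pathlib import Path

# Hypothetical location for prompt snapshots committed to the repo.
BUNDLE_DIR = Path("freeplay_bundle/prompts")

def load_bundled_prompt(template_name: str) -> dict:
    """Read a snapshotted prompt template instead of calling Freeplay's server."""
    return json.loads((BUNDLE_DIR / f"{template_name}.json").read_text())

# No network call in the request hot path; Freeplay is consulted only at
# build/deploy time when the bundle is refreshed.
prompt_config = load_bundled_prompt("support_bot")
```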
Mustache
The templating syntax used in Freeplay prompt templates for variable interpolation. Mustache uses double curly braces ({{variable_name}}) for simple substitution and supports conditional logic and iteration.
See Advanced Prompt Templating Using Mustache for syntax reference.
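For example, rendered here with chevron, one of several Mustache implementations for Python (Freeplay performs its own rendering; this only demonstrates the syntax):

```python
import chevron  # pip install chevron; any spec-compliant Mustache library behaves the same

template = (
    "Hello {{customer_name}}.\n"
    "{{#is_premium}}Thanks for being a premium customer!\n{{/is_premium}}"  # conditional section
    "Open tickets:\n"
    "{{#tickets}}- {{.}}\n{{/tickets}}"                                     # iteration over a list
)

print(chevron.render(template, {
    "customer_name": "Ada",
    "is_premium": True,
    "tickets": ["Password reset", "Billing question"],
}))
```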
Evaluation and testing
Evaluation
The process of measuring and scoring the quality of AI outputs. Freeplay supports four types of evaluations:
- Human evaluation: Manual review and scoring by team members
- Model-graded evaluation: Using an LLM as a judge
- Code evaluation: Custom functions that evaluate quantifiable criteria (see the sketch after this list)
- Auto-categorization: Automated tagging based on specified categories
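For instance, a code evaluation is just a deterministic function over an output; the checks below are an illustrative sketch rather than a Freeplay-defined interface:

```python
import json

def is_valid_json(output: str) -> bool:
    """Code eval: pass if the model's output parses as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def within_length_budget(output: str, max_chars: int = 1200) -> bool:
    """Code eval: pass if the response stays within a length budget."""
    return len(output) <= max_chars
```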
LLM judge
An LLM-based evaluator that scores AI outputs against specified criteria. Also called “model-graded evaluation.” LLM judges can assess nuanced qualities like helpfulness, accuracy, or tone that are difficult to evaluate with code alone. See Model-Graded Evaluations and Creating and Aligning Model-Graded Evals for implementation details.
Dataset
A collection of test cases used for evaluation and testing. Datasets can be created by:
- Curating examples from production logs
- Uploading CSV or JSONL files
- Authoring directly in the Freeplay UI
The API parameter is testlist for legacy reasons, but we use “dataset” in the UI and when referring to this concept in prose.
Test run
A batch execution of evaluations against a dataset. Test runs can be:
- Component-level: Testing individual prompts or components
- End-to-end: Testing complete workflows like full agent runs
Workflows and collaboration
Review queue
A workflow for human review of production outputs. Review queues enable teams to:
- Review and annotate completions or traces
- Add structured labels or free-text notes
- Correct LLM judge scores when they’re wrong
- Curate examples into datasets
Data flywheel
The continuous improvement cycle enabled by Freeplay’s connected workflow. Production logs flow into datasets, which feed evaluations, which inform prompt improvements, which generate better logs. Each iteration strengthens prompts, datasets, evaluation criteria, and testing infrastructure together.
SDK and API
Freeplay SDK
Client libraries for integrating Freeplay into your application. Available for:
- Python: freeplay (install via pip install freeplay)
- TypeScript/Node: freeplay (install via npm install freeplay)
- Java/JVM: ai.freeplay:client (see SDK Setup for Maven/Gradle config)
Provider
The LLM service that processes your prompts. Freeplay supports any provider, including OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google, and self-hosted models. The provider is specified in prompt templates or when recording completions.
Flavor
The message format used by a specific provider. Different providers expect messages in different formats (e.g., OpenAI’s chat format vs. Anthropic’s format). Freeplay handles format conversion based on the configured flavor.
Related resources
- Why Freeplay? - Overview of Freeplay’s approach
- Getting Started - Quick start guides
- SDK Documentation - Detailed SDK reference
- API Reference - HTTP API documentation

