Freeplay Introduction

How teams observe, evaluate, and iterate toward great AI applications

Overview

Freeplay is a single platform to manage the end-to-end AI application development lifecycle for your entire team. It gives product development teams the power to review sessions, experiment with changes, evaluate and test those iterations, and deploy AI features.

Core Concepts

Master the foundational features that power your LLM workflow. These guides will help you build, test, and improve your AI applications systematically.

Core Benefits

Production Observability
See how your AI systems are behaving across environments in real time, including prompt and response details, customer feedback, and evaluation scores for your production logs.
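To make that concrete, here is a minimal sketch of what recording a production completion can look like. The `record_completion` helper is a placeholder standing in for Freeplay's SDK recording call; its name and signature are assumptions for illustration, not the documented API.

```python
# Sketch only: record_completion is a stand-in for the Freeplay SDK's
# recording call, not the documented API.
from openai import OpenAI

client = OpenAI()

def record_completion(inputs: dict, output: str, environment: str) -> None:
    """Placeholder: ship prompt inputs, the model's output, and metadata
    to Freeplay so the session appears in your production logs."""
    pass  # replace with the real Freeplay SDK call

def answer(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    text = response.choices[0].message.content
    # Log the interaction so it shows up in Freeplay with customer
    # feedback and eval scores attached.
    record_completion({"question": question}, text, environment="production")
    return text
```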

Prompt & Model Versioning and Deployments
Manage and version prompt templates across environments, including your prompt text and model configurations. Push changes straight to your code like a feature flag, with no code release required.
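The feature-flag analogy maps to a simple runtime pattern, sketched below. `get_deployed_prompt` is a hypothetical stand-in for the SDK's prompt-fetching call, and the project, template, and environment names are made up for illustration.

```python
# Hypothetical sketch of the "prompt as feature flag" pattern. The fetch
# function below is a stand-in, not the documented Freeplay SDK.
def get_deployed_prompt(project: str, template: str, environment: str) -> dict:
    """Placeholder: return the prompt text and model config currently
    deployed to `environment`, as managed in the Freeplay UI."""
    return {
        "messages": [{"role": "system", "content": "You are a helpful support agent."}],
        "model": "gpt-4o-mini",
        "temperature": 0.2,
    }

# Fetch the active version at runtime; a prompt or model change made in
# the UI takes effect on the next call, with no code release.
prompt = get_deployed_prompt("support-bot", "answer-question", "production")
```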

Custom Evaluations
Create a custom suite of evals specific to your product experience. Use them on both production logs and offline experiments, so you can spot issues and quantify improvements as you update your prompts, models, RAG pipelines, and code.
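For a flavor of what a custom eval can be, here is a small code-based check of the kind you might define for a RAG product. The function and the `[doc-N]` citation convention are illustrative assumptions, not part of Freeplay's API.

```python
import re

# Illustrative custom eval: a simple code-based check run on model
# outputs (the "[doc-N]" citation format is an assumption).
def cites_source(output: str) -> bool:
    """Pass when the answer references at least one retrieved document,
    which helps flag ungrounded RAG answers."""
    return re.search(r"\[doc-\d+\]", output) is not None

print(cites_source("Refunds take 5 business days [doc-2]."))  # True
print(cites_source("Refunds take 5 business days."))          # False
```

Running the same check on production logs and on offline test results lets you compare scores before and after a prompt or pipeline change.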

Easy Batch Tests
Any time you change prompts, models, or any other part of your pipeline, you can quickly test at scale with your own custom datasets and real examples from production logs. Anyone on your team can generate new tests from the Freeplay UI or from your code, including from CI. Iterate with confidence.
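One common pattern is gating CI on a batch test. The sketch below assumes a hypothetical `run_batch_test` helper that starts a test and returns aggregate eval scores; the real SDK calls, dataset names, and score keys will differ.

```python
# Hypothetical CI gate: kick off a batch test against a saved dataset and
# fail the build if eval scores regress. All names here are illustrative.
def run_batch_test(dataset: str, prompt_version: str) -> dict:
    """Placeholder for starting a Freeplay batch test and returning
    aggregate eval scores once it completes."""
    return {"accuracy": 0.93, "helpfulness": 0.88}

scores = run_batch_test(dataset="prod-regressions", prompt_version="v12")
assert scores["accuracy"] >= 0.90, "accuracy regressed below the 0.90 bar"
```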

Multi-Player Review Workflows
Set up custom filters and queues for your whole team, and collaborate to review production logs and test results.

Label and Curate Datasets
Launch human labeling jobs and curate custom datasets from your application logs, which you can then use for testing and fine-tuning.