12/12/24 Updates

🛠️

Recent updates include: So. Much. Good. Stuff.

  • Major Tools updates including schema management, swap between providers, testing in the Freeplay UI, and more (More on the blog)
  • New filtering experience radically speeds up Observability workflows
  • Redesigned Prompts page makes it easier to manage evals, tests, and see every time a specific prompt version is used in prod
  • Easier to interpret test results for end-to-end tests of prompt chains and agent flows
  • New cost management features including fine-grained eval sampling and Billing page with live Freeplay usage details
  • Model updates: Gemini 2.0 is here! Test it in Freeplay today
  • A bunch of other fixes and performance improvements

Read the details here.

Build better AI Agents with extended tools support

We’ve made some big updates that make it much easier to manage, record, and experiment with tools in your prompts and agentic systems using Freeplay. With our latest changes it’s now easy to:

  • Iterate on tool schemas, either in your code or in the Freeplay app — whichever workflow you prefer
  • Test tool behavior in the Freeplay playground and code, so that even non-engineers can make prompt changes or update tool descriptions easily
  • Swap your prompts between model providers without modifying tool schemas (e.g. when swapping prompts between OpenAI, Anthropic or Google)
  • Run offline experiments and tests that include tools, including the ability to include structured tool calls in datasets managed with Freeplay
  • Run auto-evals that target tool schemas as part of your evaluation checks to confirm tool selection and other behavior match your expectations

Below is a quick demo video of what the new changes look like. Read the blog for more detail or check out our updated docs here.
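
To make the provider-swap point concrete, here is a minimal Python sketch of why a single tool definition can serve several providers. This is not Freeplay's SDK: `ToolSpec`, `to_openai`, and `to_anthropic` are hypothetical names used only for illustration, and the output shapes assume the publicly documented tool formats for OpenAI chat completions and Anthropic messages.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolSpec:
    """Provider-neutral tool definition (hypothetical, for illustration only)."""
    name: str
    description: str
    # JSON Schema object describing the tool's arguments.
    parameters: dict[str, Any] = field(default_factory=dict)


def to_openai(tool: ToolSpec) -> dict[str, Any]:
    """Render the tool in the shape OpenAI chat completions expect."""
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": tool.parameters,
        },
    }


def to_anthropic(tool: ToolSpec) -> dict[str, Any]:
    """Render the tool in the shape Anthropic messages expect."""
    return {
        "name": tool.name,
        "description": tool.description,
        "input_schema": tool.parameters,
    }


# Example tool; the name and fields are made up for this sketch.
weather = ToolSpec(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

openai_tool = to_openai(weather)
anthropic_tool = to_anthropic(weather)
```

The JSON Schema for the arguments is identical in both renderings; only the wrapper around it changes, which is what makes swapping providers without editing the schema itself practical.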

New filtering experience

A key use case for Freeplay is inspecting production logs and learning what’s happening in your AI products. Our new filtering UI, rolling out over the coming days, makes it much faster and more straightforward to build complex filters and browse through all of your Freeplay session and completion recordings.

Redesigned Prompts page

A key part of Freeplay is understanding the details of a specific prompt template version and all the configuration that goes with it, as well as how it behaves. Now it's quick and easy to see all the completions, tests, evals, and more for a specific version.

Updated view for multi-step tests (chains, agent flows, etc.)

When you're testing multi-step systems like prompt chains, agent flows, etc., you want to be able to evaluate the end-to-end flow as well as individual steps in the process. New updates to the Tests UI make it easier to do both -- seeing the initial input and final output, as well as the behavior at each step in the process.
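
As a rough illustration of evaluating both levels at once, the Python sketch below scores each step of a recorded flow and the end-to-end result together. It is an assumption-laden example, not Freeplay's API: `Step`, `evaluate_trace`, and the check functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

# A check takes (input, output) and returns True if the result is acceptable.
Check = Callable[[str, str], bool]


@dataclass
class Step:
    """One recorded step of a chain or agent flow (hypothetical structure)."""
    name: str
    input: str
    output: str


def evaluate_trace(
    steps: list[Step],
    step_checks: dict[str, Check],
    end_to_end_check: Check,
) -> dict[str, bool]:
    """Run per-step checks plus one check over initial input and final output."""
    if not steps:
        raise ValueError("a trace needs at least one step")
    results = {
        step.name: step_checks[step.name](step.input, step.output)
        for step in steps
        if step.name in step_checks
    }
    # The end-to-end view only looks at the first input and the last output.
    results["end_to_end"] = end_to_end_check(steps[0].input, steps[-1].output)
    return results


# Made-up two-step flow: rewrite a question, then answer it.
trace = [
    Step("rewrite", "weather paris?", "What is the weather in Paris?"),
    Step("answer", "What is the weather in Paris?", "It is sunny in Paris."),
]
report = evaluate_trace(
    trace,
    step_checks={"rewrite": lambda i, o: o.endswith("?")},
    end_to_end_check=lambda i, o: "paris" in o.lower(),
)
```

Keeping the per-step and end-to-end results in one report makes it easier to tell whether a failing flow broke at a specific step or only in aggregate.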

New cost management features

We've added two new tools to help teams manage and track costs: fine-grained sampling controls on any auto-eval (so you control how often it runs), and a new Billing page that shows all usage toward your Freeplay plan.

These build on recent releases for setting global and project-level spend controls, and our recently shipped Usage dashboard described in this post.
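
For readers curious how eval sampling can work in principle, here is a minimal Python sketch of rate-based sampling. It is an assumption, not a description of how Freeplay implements it: `should_run_eval` and the idea of hashing a completion ID are hypothetical choices made for this example.

```python
import hashlib


def should_run_eval(completion_id: str, sample_rate: float) -> bool:
    """Decide whether to run an auto-eval on this completion.

    Hashing the ID (instead of calling random()) makes the decision
    deterministic: the same completion always gets the same answer.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(completion_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < sample_rate


# Roughly 10% of these made-up completion IDs are selected.
selected = [cid for cid in (f"completion-{i}" for i in range(10_000))
            if should_run_eval(cid, 0.10)]
```

Lowering the sample rate reduces how many completions an eval runs on, and therefore its cost, while still giving a representative slice of traffic.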