3/26/24 Updates
Recent Updates: Better experimentation workflow, new models, and private hosting options for enterprise
- Load real data in the prompt playground
- New batch testing workflow
- Support for OpenAI fine-tuned models & Claude 3
- New private hosting option
See all the details here
Load Observed Data & Test Cases in Prompt Playground
We continue to make it easier to complete the product iteration loop: moving from discovering issues in production to finding, testing and deploying improvements.
You can now open any Session directly in the playground to test prompt or model changes with real, observed data. You can also load previously saved test cases from any Dataset directly in the playground to see how your prompt or model changes behave across a set of real examples.
No more copy/pasting test cases into a playground!
To test it out, click the new yellow "Edit" button on any Session page.
Here's a quick Loom to show how it works in practice.
New Batch Testing Workflow
Building on that last change, we've also made it easier to launch batch tests directly in the Freeplay app. This has long been possible with Freeplay using our Test Runs SDK method, but it's now a standard part of our in-app experience as well.
When experimenting with prompts or model configurations, a small handful of test cases usually isn't enough. You generally want to test across a range of edge cases, golden set examples, etc., and quantify the performance of the new version against a prior version. The goal is to make an informed, data-driven decision about whether to ship.
Now, any time someone saves a new prompt version, they're prompted to launch a batch test, and they can instantly see how it scores across a full range of evaluation criteria.
Here's another quick demo.
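To make the idea of comparing a new version against a prior one concrete, here is a minimal sketch in plain Python. It does not use the Freeplay SDK or reflect how Freeplay computes results. The score values, test case names, and the `compare_versions` helper are all hypothetical and exist only to illustrate the comparison.

```python
from statistics import mean

# Hypothetical per-test-case scores (0.0 to 1.0) for one evaluation criterion.
# These numbers are invented for illustration; they are not Freeplay output.
baseline_scores = {"case-1": 0.80, "case-2": 0.60, "case-3": 0.90, "case-4": 0.40}
candidate_scores = {"case-1": 0.85, "case-2": 0.70, "case-3": 0.85, "case-4": 0.60}


def compare_versions(baseline, candidate):
    """Summarize how a candidate prompt version scores against a baseline."""
    # Only compare test cases that were run against both versions.
    shared = sorted(set(baseline) & set(candidate))
    if not shared:
        raise ValueError("no shared test cases to compare")
    deltas = {case: candidate[case] - baseline[case] for case in shared}
    return {
        "cases": len(shared),
        "baseline_mean": mean(baseline[c] for c in shared),
        "candidate_mean": mean(candidate[c] for c in shared),
        "improved": [c for c in shared if deltas[c] > 0],
        "regressed": [c for c in shared if deltas[c] < 0],
    }


summary = compare_versions(baseline_scores, candidate_scores)
print(summary)
```

Listing regressions alongside the averages matters because a higher mean can hide individual test cases that got worse, which is often what decides whether a change is safe to ship.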
Support for OpenAI Fine-Tuned Models
Running fine-tuned versions of OpenAI models, either directly via the OpenAI API or via the Azure OpenAI service? We've made it easy to configure these in Freeplay and integrate them with our SDKs.
To get started, go to Settings > Models and either click "Add fine-tuned model" for OpenAI, or enable Azure OpenAI if relevant and click "Add new endpoint." If you're using Azure, be sure to select "fine-tuned" from the model dropdown.
Once configured, you'll be able to select these models when configuring a prompt template in the editor.
Support for Anthropic Claude 3 Models
We've also added native support for all the new Anthropic Claude 3 models, including Opus, Sonnet, and Haiku. These models are fully supported across the Freeplay app and all SDKs.
New Private Hosting Option
For some Freeplay customers, it's been essential to maintain full control of their data. We've previously offered a fully self-hosted option to enterprise customers, but we've also gotten feedback that the maintenance required to keep it up to date can be a burden.
We're pleased to offer a new private hosting option which allows our customers to host all their data entirely within their network, and create a private network connection to a single-tenant Freeplay instance provisioned exclusively for their use. This combines the best of both worlds, giving you full confidence around data privacy and protection, while allowing Freeplay to keep the application up to date for you.
We support this option today for both the AWS and GCP clouds. Details and architecture diagram here.