3/26/24 Updates

📣
Recent Updates: Better experimentation workflow, new models, and private hosting options for enterprise

Load real data in the prompt playground

New batch testing workflow

Support for OpenAI fine-tuned models & Claude 3

New private hosting option

See all the details here

Load Observed Data & Test Cases in Prompt Playground

We continue to make it easier to complete the product iteration loop: moving from discovering issues in production to finding, testing and deploying improvements.

You can now open any Session directly in the playground to test prompt/model changes with real, observed data. Additionally, you can load previously saved test cases from any Dataset directly in the playground to get a sense of how your prompt or model changes behave across a set of real examples.

No more copy/pasting test cases into a playground!

To test it out, click the new yellow “Edit” button on any Session page.

Here’s a quick Loom to show how it works in practice.

New Batch Testing Workflow

Building on that last change, we’ve also made it easier to launch batch tests directly in the Freeplay app. This has long been possible with Freeplay using our Test Runs SDK method, but it’s now a standard part of our in-app experience as well.

When experimenting with prompts or model configurations, relying on a small handful of test cases usually isn’t enough. You generally want to test across a range of edge cases, golden set examples, etc., and quantify the performance of the new version vs. a prior version. The goal is to make an informed, data-driven decision about whether to ship or not.

Now, any time someone saves a new prompt version, they’re prompted to launch a batch test — and can instantly see how it scores across a full range of evaluation criteria.

Here’s another quick demo.
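
To make the version-vs-version comparison described above concrete, here is a minimal sketch in plain Python. It does not use the Freeplay SDK or reflect how Freeplay computes scores; the function name `compare_versions`, the test case names, and the score values are all hypothetical, chosen only to illustrate the idea of quantifying a new prompt version against a prior one.

```python
from statistics import mean

# Hypothetical per-test-case scores (0.0 to 1.0) for two prompt versions,
# keyed by test case name. Illustrative values only, not real results.
baseline_scores = {"edge_case_1": 0.5, "golden_1": 1.0, "golden_2": 0.75}
candidate_scores = {"edge_case_1": 1.0, "golden_1": 1.0, "golden_2": 0.5}


def compare_versions(baseline, candidate):
    """Summarize how a candidate prompt version scores against a baseline."""
    # Only compare test cases that were run against both versions.
    shared = sorted(set(baseline) & set(candidate))
    if not shared:
        raise ValueError("no test cases in common")
    deltas = {name: candidate[name] - baseline[name] for name in shared}
    return {
        "baseline_mean": mean(baseline[name] for name in shared),
        "candidate_mean": mean(candidate[name] for name in shared),
        "improved": [name for name in shared if deltas[name] > 0],
        "regressed": [name for name in shared if deltas[name] < 0],
    }


summary = compare_versions(baseline_scores, candidate_scores)
print(summary)
```

Looking at per-case regressions alongside the averages matters here: in this made-up data the candidate has a higher mean score, yet it still regresses on one golden set example, which is the kind of detail that should inform a ship/no-ship decision.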

Support for OpenAI Fine-Tuned Models

Running fine-tuned versions of OpenAI models, either directly via the OpenAI API or via the Azure OpenAI service? We’ve made it easy to configure these in Freeplay and integrate them with our SDKs.

To get started, go to Settings > Models and either click “Add fine-tuned model” for OpenAI, or enable Azure OpenAI if relevant and click “Add new endpoint.” If you’re using Azure, be sure to select “fine-tuned” from the model dropdown.

Once configured, these models will be available to select when configuring a prompt template in the editor.

More here

Support for Anthropic Claude 3 Models

We’ve also added native support for all the new Anthropic Claude 3 models, including Opus, Sonnet, and Haiku. These models are fully supported across the Freeplay app and all SDKs.

New Private Hosting Option

For some Freeplay customers, it’s been essential to maintain full control of their data. We’ve previously offered a fully self-hosted option to enterprise customers, but we’ve also gotten feedback that the maintenance required to keep it up to date can be a burden.

We’re pleased to offer a new private hosting option that allows our customers to host all their data entirely within their own network and create a private network connection to a single-tenant Freeplay instance provisioned exclusively for their use. This combines the best of both worlds, giving you full confidence around data privacy and protection while allowing Freeplay to keep the application up to date for you.

We support this option today for both the AWS and GCP clouds. Details and architecture diagram here.
