6/20/24 Updates

✨
Recent Updates: Better Mustache Templating, Datasets Improvements, New Models Including Claude 3.5 Sonnet & More

Enhanced Mustache conditional highlighting for complex prompt logic

Streamlined creation and upload functionality for datasets

Support for new models: Gemini 1.5, Llama 3, Mistral, and Claude 3.5 Sonnet

Plus a bunch of smaller updates! Check out the list

Details here.

Easier Control of Complex Prompt Logic Using Mustache

We've seen more of our customers taking advantage of our Mustache syntax support in Freeplay to manage complex prompt logic like conditional sub-prompts and variables. A key benefit of using Mustache templates instead of handling conditional logic entirely in code is that non-engineers can edit and update any sub-prompts without using a code editor.

We’ve made it even easier to compose and edit these prompts in Freeplay so that it’s faster to get up to speed and experiment with changes:

Color-Coded Conditionals: Mustache conditional clauses are now color coded, both in the prompt editor/playground and when viewing recorded Sessions. "True" conditions are highlighted in green and "false" conditions in red, so you can see at a glance which conditions are met, whether you are composing a prompt or reviewing logs.
Interactive Highlighting: Hovering over a conditional value recorded to Freeplay shows any nested values. This helps you follow the logic that was executed in your prompts without digging into the code each time.
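
For readers new to Mustache conditionals, here is a minimal sketch of how a conditional sub-prompt resolves. It is an illustration only, not Freeplay's renderer: the render function below is a hypothetical stand-in that handles just a small subset of Mustache (plain variables, {{#name}} sections and {{^name}} inverted sections, with truthy/falsy values only), and the template text and variable names are made up for the example.

```python
import re

# Matches {{#name}}...{{/name}} (section) and {{^name}}...{{/name}} (inverted section).
SECTION = re.compile(r"\{\{([#^])\s*(\w+)\s*\}\}(.*?)\{\{/\s*\2\s*\}\}", re.DOTALL)
# Matches a plain variable tag such as {{question}}.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# Hypothetical prompt template with one conditional sub-prompt per plan type.
PROMPT = (
    "You are a support assistant for {{product}}.\n"
    "{{#is_premium}}This customer is on the premium plan, so offer priority escalation.\n{{/is_premium}}"
    "{{^is_premium}}This customer is on the free plan.\n{{/is_premium}}"
    "Question: {{question}}"
)


def _resolve_sections(template: str, variables: dict) -> str:
    """Keep or drop each conditional block based on the truthiness of its variable."""

    def replace(match: re.Match) -> str:
        kind, name, body = match.groups()
        truthy = bool(variables.get(name))
        # {{#name}} keeps its body when the value is truthy;
        # {{^name}} keeps its body when the value is falsy or missing.
        keep = truthy if kind == "#" else not truthy
        return _resolve_sections(body, variables) if keep else ""

    return SECTION.sub(replace, template)


def render(template: str, variables: dict) -> str:
    """Resolve conditional blocks first, then substitute plain variables."""
    resolved = _resolve_sections(template, variables)
    return VARIABLE.sub(lambda m: str(variables.get(m.group(1), "")), resolved)


if __name__ == "__main__":
    print(render(PROMPT, {"product": "Acme", "is_premium": True, "question": "How do I upgrade?"}))
```

In this sketch, the is_premium block is the "true" condition and the inverted block is the "false" one when is_premium is set, which mirrors the green and red highlighting described above. A real Mustache library also handles list iteration, HTML escaping and partials, which this sketch leaves out.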

Check out the Loom below to see how it works.

Managing Datasets: Streamlined Creation and Upload Functionality

We've made significant updates to the Datasets feature within Freeplay that will make it easier to set up and curate datasets over time for both testing & fine-tuning purposes – including the ability to save an observed example straight from the playground, and quickly edit it to define a better ground truth output.

A few highlights:

Saving Examples from Playground: When experimenting in the playground, you can now save test cases or examples directly into a dataset. This makes it easy to think of a test case in the playground context and save it for ongoing use.
Manual Example Creation: Previously you could either upload datasets, or save observed examples from Sessions. You can now manually create examples in the UI as well. Simply navigate to the dataset, click “New Example” and add any relevant input variables. You can optionally set a ground truth output for each example too.
Streamlined Dataset Creation: When creating a new dataset, you can now select a compatible prompt template to specify the variables that should exist in the dataset. Datasets will be compatible with any other prompt templates that share the same variable name(s). The JSONL upload process also has new feedback and error messages to make uploading easier.
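
As a rough illustration of preparing a JSONL file whose examples match a prompt template's variables, the sketch below writes one JSON object per line and checks each example's input names before serializing. The field names inputs and output, the build_jsonl helper and the sample variables are assumptions made for this example; the actual upload schema is defined in the Freeplay docs, so treat this only as a starting point.

```python
import json


def build_jsonl(examples: list, template_variables: list) -> str:
    """Serialize examples to JSONL, checking each one against the template's variable names.

    The "inputs"/"output" layout is a hypothetical schema used for illustration.
    """
    expected = set(template_variables)
    lines = []
    for index, example in enumerate(examples, start=1):
        provided = set(example["inputs"])
        if provided != expected:
            missing = sorted(expected - provided)
            extra = sorted(provided - expected)
            raise ValueError(f"example {index}: missing {missing}, unexpected {extra}")
        lines.append(json.dumps(example, ensure_ascii=False))
    # JSONL: one JSON object per line, each line newline-terminated.
    return "".join(line + "\n" for line in lines)


# Hypothetical examples for a template that uses {{product}} and {{question}}.
examples = [
    {
        "inputs": {"product": "Acme", "question": "How do I reset my password?"},
        "output": "Go to Settings > Security and choose Reset password.",
    },
    # The ground truth output is optional, so this example omits it.
    {"inputs": {"product": "Acme", "question": "Do you offer refunds?"}},
]

jsonl_text = build_jsonl(examples, ["product", "question"])

if __name__ == "__main__":
    print(jsonl_text, end="")
```

Checking variable names locally before uploading reflects the compatibility rule above: a dataset works with any prompt template that shares the same variable name(s).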

Check out this quick Loom video for a detailed walkthrough of these updates to the Datasets feature.

New Default Model Support: Claude 3.5 Sonnet, Gemini 1.5, Llama 3 & Mistral

We are thrilled to announce the addition of default support for several new models in Freeplay. All of these will be available to any customer out of the box, without having to set up your own API keys or infrastructure. They’ll make it easier to test and compare new model versions as they’re released.

New Default Models Added

Anthropic Model Access (Direct):

claude-3-5-sonnet-20240620

Gemini Vertex Models:

gemini-1.5-pro-001
gemini-1.5-flash-001

Meta Llama 3 Models via Bedrock:

meta.llama3-70b-instruct-v1:0
meta.llama3-8b-instruct-v1:0

Mistral Models via Bedrock:

mistral.mistral-large-2402-v1:0
mistral.mistral-small-2402-v1:0

Anthropic Claude Models via Bedrock:

anthropic.claude-3-5-sonnet-20240620-v1:0
anthropic.claude-3-haiku-20240307-v1:0
anthropic.claude-3-opus-20240229-v1:0
anthropic.claude-3-sonnet-20240229-v1:0
anthropic.claude-instant-v1

Additional Updates

Alongside these changes we've made 30+ other small updates or bug fixes, including:

New special type for logging thumbs up/down customer feedback: Use freeplay-feedback as the key and either freeplay-positive-feedback or freeplay-negative-feedback as the value to render 👍 or 👎. Docs here.
Delete Sessions via API or SDK: This can be especially helpful for compliance purposes. See docs here.
Full parity for v2 API: If you're using the Freeplay v1 API, this matters for you! The v2 API is fully at parity and ready for use. We'll be announcing a deprecation plan for the v1 API later this summer.
New SSO provider support: We now offer the option for SAML or OIDC connections via popular IdPs including Okta, Duo, and more.
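
For the thumbs up/down feedback type listed above, the key and value strings can be kept in one place with a small helper like the sketch below. Only the three string literals come from this update; the feedback_payload function is a hypothetical helper, and how the resulting dictionary is sent to Freeplay (SDK call or API request) is not shown here and is covered by the linked docs.

```python
# Key and values named in this update for rendering thumbs up/down feedback.
FEEDBACK_KEY = "freeplay-feedback"
POSITIVE_FEEDBACK = "freeplay-positive-feedback"
NEGATIVE_FEEDBACK = "freeplay-negative-feedback"


def feedback_payload(thumbs_up: bool) -> dict:
    """Build the key/value pair for a thumbs up (True) or thumbs down (False).

    Hypothetical helper: sending the payload to Freeplay is left to the SDK or API.
    """
    return {FEEDBACK_KEY: POSITIVE_FEEDBACK if thumbs_up else NEGATIVE_FEEDBACK}


if __name__ == "__main__":
    print(feedback_payload(True))
    print(feedback_payload(False))
```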