12/12/24 Updates

5 months ago by Ian Cairns

🛠️
Recent updates include: So. Much. Good. Stuff.

Major Tools updates including schema management, swapping between providers, testing in the Freeplay UI, and more (More on the blog)

New filtering experience radically speeds up Observability workflows

Redesigned Prompts page makes it easier to manage evals and tests, and to see every time a specific prompt version is used in prod

Easier to interpret test results for end-to-end tests of prompt chains and agent flows

New cost management features including fine-grained eval sampling and Billing page with live Freeplay usage details

Model updates: Gemini 2.0 is here! Test it in Freeplay today

A bunch of other fixes and performance improvements

Read the details here.

10/23/24 Updates

7 months ago by Jeremy Silva

💡
Recent updates: New Eval Template Library & AI Assistant, New Usage Dashboards & Spend Management, Model Updates Including Claude 3.5 Sonnet & Llama 3.2, Plus Major UI Improvements, Bug Fixes, & Significant Performance Improvements

Eval suite upgraded with template library and an AI assistant (More on the blog)

Monitor team spend and usage across providers and environments (More on the blog)

Run auto evals on any individual session straight from the observability tab

Model updates: Claude 3.5 Sonnet on Anthropic, plus Llama 3.2 on Bedrock

Major UI improvements, plus bug fixes and a significant boost in performance

Read more here

9/3/24 Updates

8 months ago by Ian Cairns

💡
Recent Updates: Easier Model-Graded Evals, New Ground Truth Comparison Support, More Granular Filtering, Expanded API Capabilities, and Major Model Updates

Create and align model-graded evals within Freeplay

Target ground truth values for evals in the Freeplay UI

Filter for individual Completions instead of just complete Sessions

Use the Freeplay API to filter, export or delete Sessions, and manage Datasets

Try the new Gemini 1.5 Pro experimental release, and make sure to stop using old GPT-3.5 versions by next week

Read more here

7/25/24 Updates

10 months ago by Ian Cairns

✨
Recent Updates: New Review & Testing Features for Multi-Turn Chatbots, Improved Human Review Workflows, gpt-4o-mini and Llama 3.1!

Multi-turn chatbot support includes new chat view and history object support

Faster, easier data reviews & workflows with Markdown rendering and new Review panel

New models include gpt-4o-mini, Llama 3.1, and native Groq support!

Details here.

6/20/24 Updates

11 months ago by Eric Ryan

✨
Recent Updates: Better Mustache Templating, Datasets Improvements, New Models Including Claude 3.5 Sonnet & More

Enhanced Mustache conditional highlighting for complex prompt logic

Streamlined creation and upload functionality for datasets

Support for new models: Gemini 1.5, Llama 3, Mistral, and Claude 3.5 Sonnet

Plus a bunch of smaller updates! Check out the list

Details here.

5/24/24 Updates

12 months ago by Ian Cairns

✨
Recent Updates: Code-Driven Evals, Model-Graded Eval Alignment, Saved Filters for Faster Reviews, Updated Comparisons Feature, and Full Support for Enterprise Models

Log eval results directly from your code, including pairwise comparisons against ground truth datasets

Align your model-graded evals to your team’s expectations with human feedback

Saved Filters make it faster to review production data on a regular basis

Updated comparisons workflow for streamlined comparisons between different prompts, models, or versions of your code

Use your enterprise models, such as Anthropic on Bedrock, Azure OpenAI, or Llama 3 on SageMaker, in the Freeplay playground and Tests features

4/16/24 Updates

about 1 year ago by Jeremy Silva

📣
Recent Updates: Comparative Tests UI, faster Session browsing, Prompt editor improvements & New Freeplay API

Instantly compare test results to benchmark data

Move faster through data reviews with new Session browser

Configure JSON Mode & other advanced parameters on prompt templates

Freeplay Public API offers full parity with the SDKs

See all the details here

3/26/24 Updates

about 1 year ago by Ian Cairns

📣
Recent Updates: Better experimentation workflow, new models, and private hosting options for enterprise

Load real data in the prompt playground

New batch testing workflow

Support for OpenAI fine-tuned models & Claude 3

New private hosting option

See all the details here

2/9/24 Updates

over 1 year ago by Ian Cairns

📣
Recent Updates: New prompt editor, better graphs, Thin SDK & new CLI, and new OpenAI model versions

New prompt editor lets you test multiple versions/models at once

Better observability with faster, interactive graphs & personalized table views

Thin SDK and new CLI = full developer control

New OpenAI model versions – 0125 & 0125-preview

See all the details here

1/24/24 Updates

over 1 year ago by Jeremy Silva

📣
Recent Updates: Live monitoring, Prompt versioning improvements & Thin SDK Preview

Live Monitoring of LLM sessions using auto-evals

Prompt versioning UX enhancements, including commit messages & version titles

New Thin SDK preview — increased control for developers

See all the details here
