12/12/24 Updates
Recent updates include: So. Much. Good. Stuff.
- Major Tools updates including schema management, swapping between providers, testing in the Freeplay UI, and more (More on the blog)
- New filtering experience radically speeds up Observability workflows
- Redesigned Prompts page makes it easier to manage evals and tests, and to see every time a specific prompt version is used in prod
- Easier to interpret test results for end-to-end tests of prompt chains and agent flows
- New cost management features including fine-grained eval sampling and Billing page with live Freeplay usage details
- Model updates: Gemini 2.0 is here! Test it in Freeplay today
- A bunch of other fixes and performance improvements
Read the details here.
10/23/24 Updates
Recent updates: New Eval Template Library & AI Assistant, New Usage Dashboards & Spend Management, Model Updates Including Claude 3.5 Sonnet & Llama 3.2, Plus Major UI Improvements, Bug Fixes, & Significant Performance Improvements
- Eval suite upgraded with template library and an AI assistant (More on the blog)
- Monitor team spend and usage across providers and environments (More on the blog)
- Run auto evals on any individual session straight from the observability tab
- Model updates: Claude 3.5 Sonnet on Anthropic, plus Llama 3.2 on Bedrock
- Major UI improvements, plus bug fixes and a significant boost in performance
9/3/24 Updates
Recent Updates: Easier Model-Graded Evals, New Ground Truth Comparison Support, More Granular Filtering, Expanded API Capabilities, and Major Model Updates
- Create and align model-graded evals within Freeplay
- Target ground truth values for evals in the Freeplay UI
- Filter for individual Completions instead of just complete Sessions
- Use the Freeplay API to filter, export or delete Sessions, and manage Datasets
- Try the new Gemini 1.5 Pro experimental release, and make sure to stop using old GPT-3.5 versions by next week
7/25/24 Updates
Recent Updates: New Review & Testing Features for Multi-Turn Chatbots, Improved Human Review Workflows, gpt-4o-mini and Llama 3.1!
- Multi-turn chatbot support includes new chat view and history object support
- Faster, easier data reviews & workflows with Markdown rendering and new Review panel
- New models include gpt-4o-mini, Llama 3.1, and native Groq support!
6/20/24 Updates
Recent Updates: Better Mustache Templating, Datasets Improvements, New Models Including Claude 3.5 Sonnet & More
- Enhanced Mustache conditional highlighting for complex prompt logic
- Streamlined creation and upload functionality for datasets
- Support for new models: Gemini 1.5, Llama 3, Mistral, and Claude 3.5 Sonnet
- Plus a bunch of smaller updates! Check out the list
5/24/24 Updates
Recent Updates: Code-Driven Evals, Model-Graded Eval Alignment, Saved Filters for Faster Reviews, Updated Comparisons Feature, and Full Support for Enterprise Models
- Log eval results directly from your code, including pairwise comparisons against ground truth datasets
- Align your model-graded evals to your team’s expectations with human feedback
- Saved Filters make it faster to review production data on a regular basis
- Updated comparisons workflow for streamlined comparisons between different prompts, models, or versions of your code
- Use your enterprise models like Bedrock Anthropic, Azure OpenAI, or Llama 3 on SageMaker in the Freeplay playground and Tests features
4/16/24 Updates
Recent Updates: Comparative Tests UI, faster Session browsing, Prompt editor improvements & New Freeplay API
- Instantly compare test results to benchmark data
- Move faster through data reviews with new Session browser
- Configure JSON Mode & other advanced parameters on prompt templates
- Freeplay Public API offers full parity with the SDKs
See all the details here
3/26/24 Updates
Recent Updates: Better experimentation workflow, new models, and private hosting options for enterprise
- Load real data in the prompt playground
- New batch testing workflow
- Support for OpenAI fine-tuned models & Claude 3
- New private hosting option
See all the details here
2/9/24 Updates
Recent Updates: New prompt editor, better graphs, Thin SDK & new CLI, and new OpenAI model versions
- New prompt editor lets you test multiple versions/models at once
- Better observability with faster, interactive graphs & personalized table views
- Thin SDK and new CLI = full developer control
- New OpenAI model versions: 0125 & 0125-preview
See all the details here
1/24/24 Updates
Recent Updates: Live monitoring, Prompt versioning improvements & Thin SDK Preview
- Live Monitoring of LLM sessions using auto-evals
- Prompt versioning UX enhancements, including commit messages & version titles
- New Thin SDK preview — increased control for developers
See all the details here