12/12/24 Updates

🛠️

Recent Updates: Major Tools Updates, Faster Observability Filtering, Redesigned Prompts Page, Clearer End-to-End Test Results, New Cost Management Features, & Gemini 2.0

  • Major Tools updates including schema management, provider swapping, testing in the Freeplay UI, and more (More on the blog)
  • New filtering experience radically speeds up Observability workflows
  • Redesigned Prompts page makes it easier to manage evals and tests, and to see every time a specific prompt version is used in prod
  • Easier to interpret test results for end-to-end tests of prompt chains and agent flows
  • New cost management features, including fine-grained eval sampling and a Billing page with live Freeplay usage details
  • Model updates: Gemini 2.0 is here! Test it in Freeplay today
  • Plus a bunch of other fixes and performance improvements

Read the details here.

10/23/24 Updates

💡

Recent Updates: New Eval Template Library & AI Assistant, New Usage Dashboards & Spend Management, Model Updates Including Claude 3.5 Sonnet & Llama 3.2, Plus Major UI Improvements, Bug Fixes, & Significant Performance Improvements

  • Eval suite upgraded with template library and an AI assistant (More on the blog)
  • Monitor team spend and usage across providers and environments (More on the blog)
  • Run auto evals on any individual session straight from the observability tab
  • Model updates: Claude 3.5 Sonnet on Anthropic, plus Llama 3.2 on Bedrock
  • Major UI improvements, plus bug fixes and a significant boost in performance

Read more here

9/3/24 Updates

💡

Recent Updates: Easier Model-Graded Evals, New Ground Truth Comparison Support, More Granular Filtering, Expanded API Capabilities, and Major Model Updates

  • Create and align model-graded evals within Freeplay
  • Target ground truth values for evals in the Freeplay UI
  • Filter for individual Completions instead of just complete Sessions
  • Use the Freeplay API to filter, export, or delete Sessions and to manage Datasets
  • Try the new Gemini 1.5 Pro experimental release, and make sure to stop using old GPT-3.5 versions by next week

Read more here

7/25/24 Updates

Recent Updates: New Review & Testing Features for Multi-Turn Chatbots, Improved Human Review Workflows, gpt-4o-mini and Llama 3.1!

  • Multi-turn chatbot support includes a new chat view and history object support
  • Faster, easier data reviews & workflows with Markdown rendering and a new Review panel
  • New models include gpt-4o-mini, Llama 3.1, and native Groq support!

Details here.

6/20/24 Updates

Recent Updates: Better Mustache Templating, Datasets Improvements, New Models Including Claude 3.5 Sonnet & More

  • Enhanced Mustache conditional highlighting for complex prompt logic
  • Streamlined creation and upload functionality for datasets
  • Support for new models: Gemini 1.5, Llama 3, Mistral, and Claude 3.5 Sonnet
  • Plus a bunch of smaller updates! Check out the list

Details here.

5/24/24 Updates

Recent Updates: Code-Driven Evals, Model-Graded Eval Alignment, Saved Filters for Faster Reviews, Updated Comparisons Feature, and Full Support for Enterprise Models

  • Log eval results directly from your code, including pairwise comparisons against ground truth datasets
  • Align your model-graded evals to your team’s expectations with human feedback
  • Saved Filters make it faster to review production data on a regular basis
  • Updated comparisons workflow streamlines comparing different prompts, models, or versions of your code
  • Use your enterprise models such as Bedrock Anthropic, Azure OpenAI, or Llama 3 on SageMaker in the Freeplay playground and Tests features

4/16/24 Updates

📣

Recent Updates: Comparative Tests UI, faster Session browsing, Prompt editor improvements & New Freeplay API

  • Instantly compare test results to benchmark data
  • Move faster through data reviews with the new Session browser
  • Configure JSON Mode & other advanced parameters on prompt templates
  • Freeplay Public API offers full parity with the SDKs

See all the details here

3/26/24 Updates

📣

Recent Updates: Better experimentation workflow, new models, and private hosting options for enterprise

  • Load real data in the prompt playground
  • New batch testing workflow
  • Support for OpenAI fine-tuned models & Claude 3
  • New private hosting option

See all the details here

2/9/24 Updates

📣

Recent Updates: New prompt editor, better graphs, Thin SDK & new CLI, and new OpenAI model versions

  • New prompt editor lets you test multiple versions/models at once
  • Better observability with faster, interactive graphs & personalized table views
  • Thin SDK and new CLI = full developer control
  • New OpenAI model versions – 0125 & 0125-preview

See all the details here

1/24/24 Updates

📣

Recent Updates: Live monitoring, Prompt versioning improvements & Thin SDK Preview

  • Live Monitoring of LLM sessions using auto-evals
  • Prompt versioning UX enhancements, including commit messages & version titles
  • New Thin SDK preview — increased control for developers

See all the details here