Changelog

Improved

10/17/2025

🚀

We're helping you close the data flywheel faster. You can now curate golden dataset samples directly from production data, letting you build test cases with verified results right from real-world usage.

We've also shipped new dataset CRUD endpoints to support CI/CD workflows. Through the API, you can create and update datasets, manage test runs, and specify which evals to run against a test set for targeted testing.

Read the details here.
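
As a rough illustration of how the dataset endpoints might slot into a CI/CD job, here is a minimal Python sketch. This entry does not list the actual routes or payloads, so the base URL, the paths (`/datasets`, `/test-runs`), the field names (`name`, `dataset_id`, `evals`), and the bearer-token header are all hypothetical placeholders, not the real API. The sketch only builds requests and never sends them.

```python
import json
from urllib.request import Request

# Hypothetical base URL: the changelog does not state the real one.
BASE_URL = "https://api.example.com/v1"


def build_request(method, path, api_key, payload=None):
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return Request(
        url=f"{BASE_URL}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def create_dataset(api_key, name):
    # Hypothetical endpoint: POST /datasets
    return build_request("POST", "/datasets", api_key, {"name": name})


def update_dataset(api_key, dataset_id, name):
    # Hypothetical endpoint: PATCH /datasets/{dataset_id}
    return build_request("PATCH", f"/datasets/{dataset_id}", api_key, {"name": name})


def start_test_run(api_key, dataset_id, eval_names):
    # Hypothetical endpoint and fields: run only the listed evals on one dataset.
    payload = {"dataset_id": dataset_id, "evals": list(eval_names)}
    return build_request("POST", "/test-runs", api_key, payload)


if __name__ == "__main__":
    # Inspect the request a CI step would send for a targeted test run.
    req = start_test_run("ci-token", "ds_123", ["answer_accuracy"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

In a real pipeline, each request would be passed to `urllib.request.urlopen` (or an official SDK call, if one exists for these endpoints) once the placeholders are replaced with the documented routes and fields.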

8/22/2025

🚀

Major Release: Breaking Changes, Start from Record, Enhanced Agent Support

This release includes significant SDK improvements with breaking changes, powerful new "Start from Record" capabilities, and expanded agent functionality. Please review the SDK-specific sections below, as this release requires code updates.

  • Major SDK updates with breaking changes across Python, JVM, and Node
  • New model support including GPT-5 and Claude Opus 4.1