Code Evaluations
Code Evaluations in Freeplay
In addition to Human and Model Graded evaluations, Freeplay also lets you run code-driven evaluations on the client side and log the results back to Freeplay. These evals are typically functions that you write and run in your own code path, with the results then recorded to Freeplay.
These evaluations are particularly useful for criteria that can be expressed as logical checks, such as JSON schema validation or category assertions on a single answer, or for pairwise comparisons against an expected output using methods like embedding distance or string distance (see the sketch after this list). Code evals can be added both to:
- Individual Sessions
- Test Runs executed with our SDK or API, which can include comparisons to ground truth data
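For illustration, here is a minimal sketch of two client-side eval functions of the kind described above: a JSON validity/key check and a string-similarity comparison against an expected (ground truth) output. It uses only the Python standard library, and the function names are illustrative, not part of the Freeplay SDK.

```python
import json
from difflib import SequenceMatcher


def is_valid_json_with_keys(output: str, required_keys: set[str]) -> bool:
    """Code eval: check that the model output parses as JSON and
    contains every required top-level key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed.keys())


def string_similarity(output: str, expected: str) -> float:
    """Code eval: pairwise comparison of the model output to an expected
    output, scored 0.0-1.0 via a simple string-distance ratio. Swap in an
    embedding distance here if that better fits your criteria."""
    return SequenceMatcher(None, output.strip(), expected.strip()).ratio()


# Example usage in your own code path:
model_output = '{"answer": "42", "sources": ["doc1"]}'     # from your LLM call
expected_output = '{"answer": "42", "sources": ["doc1"]}'  # ground truth for a test case

results = {
    "valid_json": is_valid_json_with_keys(model_output, {"answer", "sources"}),
    "similarity_to_expected": string_similarity(model_output, expected_output),
}
```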
In either case, any results you log to Freeplay flow through to the UI just like human or model-graded evals. See our SDK documentation for more details.
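Once computed, results like these are attached to the relevant session or test run through the SDK. The snippet below is a hypothetical sketch only: `freeplay_client` and `record_eval_results` are illustrative placeholders, not actual Freeplay SDK names. Refer to the SDK documentation for the real logging calls.

```python
# Hypothetical sketch: the client object and method name below are
# placeholders, not the actual Freeplay SDK API.
def log_code_evals(freeplay_client, session_id: str, results: dict) -> None:
    """Attach client-side eval results to a Freeplay session so they show
    up in the UI alongside human and model-graded evals."""
    freeplay_client.record_eval_results(  # placeholder method name
        session_id=session_id,
        eval_results=results,  # e.g. {"valid_json": True, "similarity_to_expected": 0.92}
    )
```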

Now review each evaluation type, then move on to test runs once all your evaluations are configured!