Freeplay provides a powerful set of tools to help you monitor, analyze, and take action on your log data. These capabilities build on each other:
  1. Custom Filters let you define complex queries to find specific data in your logs
  2. Saved Searches let you save important filters as reusable monitors that you can return to regularly
  3. Automations let you trigger actions automatically when logs match your saved search criteria

Custom Filters

When working with large volumes of production data, custom filters let you define complex boolean queries to find exactly the data you care about. Filtering is the foundation for all monitoring and automation capabilities in Freeplay. You can create filters directly in the Observability dashboard by selecting criteria such as:
  • Input and output values
  • Evaluation results and scores
  • Custom metadata (user IDs, session types, versions of your code, etc.)
  • Prompt templates/versions
  • Common metrics (e.g. cost, latency, token usage)
Combine multiple criteria with AND/OR logic to build precise queries that surface the completions you need to investigate, review, or act upon. After you apply a filter, it’s easy to batch-select results and add them to a dataset or review queue.
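To make the AND/OR combination concrete, here is a minimal, hypothetical sketch of a filter expressed as nested boolean criteria, plus a small matcher. The field names (eval.helpfulness, metadata.user_tier, latency_ms) and the structure itself are illustrative only; they are not Freeplay’s actual filter schema, which you configure in the dashboard UI.

```python
from typing import Any

# Illustrative nested criteria: "low helpfulness score AND (enterprise user OR slow response)".
# Field names and operators are hypothetical, not Freeplay's filter schema.
example_filter = {
    "and": [
        {"field": "eval.helpfulness", "op": "<", "value": 3},
        {
            "or": [
                {"field": "metadata.user_tier", "op": "==", "value": "enterprise"},
                {"field": "latency_ms", "op": ">", "value": 5000},
            ]
        },
    ]
}

def matches(criteria: dict[str, Any], log: dict[str, Any]) -> bool:
    """Return True when a flattened log record satisfies the nested criteria."""
    if "and" in criteria:
        return all(matches(c, log) for c in criteria["and"])
    if "or" in criteria:
        return any(matches(c, log) for c in criteria["or"])
    actual = log.get(criteria["field"])
    if actual is None:
        return False
    op, expected = criteria["op"], criteria["value"]
    if op == "==":
        return actual == expected
    if op == "<":
        return actual < expected
    if op == ">":
        return actual > expected
    raise ValueError(f"unsupported operator: {op}")

# e.g. matches(example_filter, {"eval.helpfulness": 2, "latency_ms": 6200}) -> True
```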

Saved Searches

Once you’ve defined a filter that surfaces important data, you can save it as a saved search and return to it easily. Saved searches act as monitors that help you keep track of important states in your logs, like negative customer feedback, failed guardrails, or low scores from auto-evaluators. Saved searches can either be:
  • Private: Visible only to you, for personal monitoring and investigation workflows
  • Shared: Visible to your entire team, so everyone can track important metrics and patterns together
Use saved searches when you have queries you run frequently, want to monitor specific conditions over time, or need to share important monitors with your team. Once you’ve created a saved search, you can also use it as the foundation for automations.
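Conceptually, a saved search is just a named, reusable filter with a visibility setting. The sketch below is illustrative only and is not Freeplay’s storage format; the field names are hypothetical.

```python
# Illustrative only: a saved search pairs a name and visibility with
# filter criteria like those shown above.
low_score_search = {
    "name": "Low Score Completions",
    "visibility": "shared",  # "private": only you; "shared": your whole team
    "filter": {
        "and": [
            {"field": "eval.helpfulness", "op": "<", "value": 3},
            {"field": "metadata.environment", "op": "==", "value": "production"},
        ]
    },
}
```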

Automations

Automations build on saved searches to take action on filtered data automatically. Once configured, automations run in the background, continuously monitoring your production traffic and executing actions when logs match your saved search criteria. This allows you to build powerful workflows that ensure the right data gets reviewed, tested, and acted upon without manual effort. Freeplay supports four types of automations:

Review Queue

Rather than manually searching for problematic completions, review queue automations surface them to your team automatically. Configure which review queue to use, assign specific team members as reviewers (completions are distributed automatically), and set your sampling frequency, limits, and strategy (i.e. random or most recent). The automation will then add individual completions or traces to the review queue in the background for you. Example use case: Automatically add guardrailed responses to a review queue so your team can evaluate why guardrails were triggered and identify patterns in edge cases.

Dataset

Dataset automations are particularly useful for building datasets from completions that have been reviewed, validated, or scored highly in production. Select which agent or prompt template and dataset to target, and matching logs will be added to that dataset automatically. This ensures your test coverage grows organically as your system encounters new scenarios in production. Example use case: Automatically add reviewed completions with high eval scores to a golden dataset for regression testing and prompt optimization.

Run Evals

Run Evals automations let you run specific evals only on relevant subsets of your data, saving costs and focusing evaluation effort on what matters most. Choose which evaluation criteria to run and set your sampling frequency and limits. This is especially useful for running detailed or expensive evals only on completions or traces that meet certain conditions. Example use case: Run detailed evals only on completions that already passed basic checks, or run specialized evals for specific use cases to understand nuanced quality metrics.

Notify

Stay informed about important patterns or issues in your production data without constantly monitoring the dashboard. Configure your notification channel in Slack, and set a sampling frequency and time period for when notifications should trigger. Example use case: Get notified when completions fail a critical evaluation metric within a time period, or when guardrails are triggered.

Creating an Automation

To create an automation, start by defining a saved search in the Observability dashboard. The saved search determines which logs your automation will act on. Once you’ve created a saved search:
  1. Click the “Add automation” button to configure the automation
  2. Give your automation a descriptive name that clearly indicates its purpose (e.g., “Add to Guardrail Review” or “Add to Router Golden Dataset”)
  3. Select the automation type (Review Queue, Dataset, Run Evals, or Notify)
  4. Configure the specific options for your automation type:
    • For review queues: select the queue and assignees
    • For datasets: choose the target prompt or agent and dataset
    • For evals: select the target prompt or agent and which evaluations to run
    • For Slack notifications: configure the channel and thresholds
  5. Set your sampling frequency (hourly, daily, or weekly)
  6. Set the limit for maximum completions per sampling period
  7. Choose your sampling strategy (random or most recent)
  8. Click “Save”
Your automation will now run in the background according to the schedule you configured, continuously processing new completions that match your filter criteria.
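Taken together, the steps above amount to a small configuration attached to a saved search. The sketch below summarizes those settings as plain data; the keys and example values mirror the UI options described in this guide and are not an actual Freeplay API payload.

```python
# Hypothetical summary of the settings collected in steps 1-8 above.
guardrail_review_automation = {
    "saved_search": "Guardrail Triggered",   # the filter the automation acts on
    "name": "Add to Guardrail Review",       # step 2: descriptive name
    "type": "review_queue",                  # step 3: review_queue | dataset | run_evals | notify
    "options": {                             # step 4: type-specific options
        "queue": "Guardrail Review",
        "assignees": ["reviewer-a@example.com", "reviewer-b@example.com"],
    },
    "sampling": {
        "frequency": "daily",                # step 5: hourly | daily | weekly
        "limit": 25,                         # step 6: max completions per sampling period
        "strategy": "most_recent",           # step 7: random | most_recent
    },
}
```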

Managing Saved Searches and Automations

All saved searches and their associated automations are visible in the Observability dashboard. From this view, you can edit automation configurations to adjust frequency, limits, or filter criteria. You can also delete saved searches or automations you no longer need. Monitor automation results regularly to ensure they’re capturing the data you expect and taking the intended actions. You can adjust configurations as your needs evolve or as you learn more about your data patterns.

Best Practices

  • Start with specific filters: The more targeted your filter, the more useful your saved search or automation will be. Broad filters may capture too much irrelevant data, while specific filters surface exactly what you need to monitor, review, or test.
  • Use descriptive names: Name saved searches and automations clearly so your entire team understands their purpose at a glance (e.g., “Low Score Completions” for a saved search, or “Auto-Add Low Scores to Review” for an automation).
  • Set appropriate limits: Start with conservative sampling limits and adjust based on the volume of matching completions. You can always increase limits if you’re not capturing enough data, but starting too high may overwhelm review queues or datasets.
  • Share important monitors: When you create a saved search that tracks critical metrics or patterns (like guardrail failures or low customer satisfaction), share it with your team so everyone can monitor these conditions together.
  • Combine automation types: Use multiple automations on the same saved search for different purposes. For example, you might both add low-scoring completions to a review queue AND notify your team when they exceed a threshold.
  • Monitor results regularly: Check your saved searches to understand data patterns and verify that automations are capturing the data you expect. Adjust filter criteria or automation configurations as needed based on what you learn.

Common Workflows

Quality Assurance Workflow

Filter for completions with low evaluation scores on critical evaluation criteria. Add an automation to route them to a review queue for further human review. Set up a second automation to notify your team when the volume of low-scoring completions exceeds a threshold, indicating a potential system issue.
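As a rough sketch, this workflow attaches two automations to one shared saved search. The names, queue, channel, and threshold below are illustrative placeholders, not actual Freeplay configuration.

```python
# Illustrative only: one shared saved search with two automations attached
# (route to review queue + Slack notification on volume spikes).
qa_saved_search = {
    "name": "Low Score Completions",
    "visibility": "shared",
    "filter": {"field": "eval.accuracy", "op": "<", "value": 3},
}

qa_automations = [
    {
        "name": "Auto-Add Low Scores to Review",
        "type": "review_queue",
        "options": {"queue": "QA Review", "assignees": ["qa-lead@example.com"]},
        "sampling": {"frequency": "hourly", "limit": 20, "strategy": "random"},
    },
    {
        "name": "Notify on Low-Score Spike",
        "type": "notify",
        "options": {"slack_channel": "#llm-quality", "volume_threshold": 50},
        "sampling": {"frequency": "daily"},
    },
]
```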

Golden Dataset Building

Filter for human-reviewed completions that have high evaluation scores and represent successful interactions. Automatically add them to your golden dataset for regression testing. This ensures your test suite continuously grows with validated, real-world examples.

Guardrail Monitoring

Filter for completions where guardrails were triggered (such as PII detection, toxicity filters, or policy violations). Add an automation to route these to a review queue so your team can analyze why guardrails fired. Set up notifications to alert your security or compliance team when guardrail triggers spike, indicating potential issues.

Cost Optimization

Filter for high-cost agent traces that use excessive tokens. Run additional evaluations on these to assess whether the quality justifies the cost. Route borderline cases to a review queue so your team can determine if optimizations are needed.