Filters and automations in Freeplay work together to help you monitor, curate, and take action on your production data automatically. Filters let you identify specific completions based on any criteria, while automations perform actions on those filtered results without manual intervention.

Filtering Your Data

Filters in Freeplay allow you to search and narrow down your production data based on virtually any field recorded to Freeplay. Filtering is essential for identifying specific completions or patterns in your system, whether you’re investigating edge cases, building datasets, or analyzing system performance. You can create filters directly in the Observability dashboard by selecting criteria such as:
  • Input and output values
  • Evaluation results and scores
  • Custom metadata (user IDs, session types, versions)
  • Prompt templates/versions
  • Metrics (cost, latency, token usage)
Save filters you use frequently for quick access and share them across your team. Once you’ve defined a useful filter, you can apply automations to act on matching completions automatically.
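To make the criteria above concrete, here is a minimal sketch of how a filter combines several conditions into a single predicate over each completion. The field names (metadata.session_type, eval_scores, latency_ms) are illustrative assumptions, not Freeplay’s actual schema or API; in practice you build the equivalent filter in the Observability dashboard.

```python
# Illustrative sketch only: field names are hypothetical, not Freeplay's schema.
# A filter is effectively a predicate evaluated against each recorded completion.
completions = [
    {"metadata": {"session_type": "support_chat"}, "eval_scores": {"helpfulness": 2}, "latency_ms": 3400},
    {"metadata": {"session_type": "onboarding"}, "eval_scores": {"helpfulness": 5}, "latency_ms": 900},
]

def matches_filter(completion: dict) -> bool:
    """True when a completion meets all of the example criteria."""
    return (
        completion["metadata"].get("session_type") == "support_chat"  # custom metadata
        and completion["eval_scores"].get("helpfulness", 0) <= 2      # evaluation score
        and completion.get("latency_ms", 0) > 2000                    # latency metric
    )

flagged = [c for c in completions if matches_filter(c)]
print(len(flagged))  # -> 1
```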

Automations

Automations take action on filtered data. Once configured, automations run in the background — continuously monitoring your production traffic and executing actions when completions match your filter criteria. This allows you to build powerful workflows that ensure the right data gets reviewed, tested, and acted upon without manual effort. Freeplay supports four types of automations:
Review Queue

Rather than manually searching for problematic completions, review queue automations surface them to your team automatically. Configure which review queue to use, assign team members as reviewers (completions are distributed across them automatically), and set your sampling frequency, limit, and strategy (random or most recent). The automation will then add matching completions to the review queue in the background.

Example use case: Automatically add guardrailed responses to a review queue so your team can evaluate why guardrails were triggered and identify patterns in edge cases.
Dataset

Dataset automations are particularly useful for building datasets from completions that have been reviewed, validated, or scored highly in production. Select which prompt template and dataset to target, and matching completions will be added to that dataset. This ensures your test coverage grows organically as your system encounters new scenarios in production.

Example use case: Automatically add reviewed completions with high eval scores to a golden dataset for regression testing and prompt optimization.
Run Evals

Eval automations let you run specific evals only on relevant subsets of your data, saving costs and focusing evaluation effort on what matters most. Choose which evaluation criteria to run and set your sampling frequency and limits. This is especially useful for running detailed or expensive evals only on completions that meet certain conditions.

Example use case: Run detailed evals only on completions that already passed basic checks, or run specialized evals on specific use cases to understand nuanced quality metrics.
Notify

Notification automations keep you informed about important patterns or issues in your production data without constantly monitoring the dashboard. Configure your notification channel (Slack, email, or other integrations), sampling frequency, and threshold conditions for when notifications should trigger.

Example use case: Get notified when a certain number of completions fail a critical evaluation within a time period, or when guardrails are triggered more frequently than expected.
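A rough way to picture any automation, regardless of type, is a small record pairing a saved filter with an action and sampling settings. The sketch below is a conceptual model using hypothetical names, not Freeplay’s API; you configure all of this through the dashboard.

```python
from dataclasses import dataclass
from typing import Literal

# Conceptual model of an automation using hypothetical names (not Freeplay's API).
@dataclass
class Automation:
    name: str
    filter_name: str                                 # which saved filter to watch
    action: Literal["review_queue", "dataset", "run_evals", "notify"]
    frequency: Literal["hourly", "daily", "weekly"]  # sampling frequency
    limit: int                                       # max completions per period
    strategy: Literal["random", "most_recent"]

guardrail_review = Automation(
    name="Add to Guardrail Review",
    filter_name="guardrail-triggered",
    action="review_queue",
    frequency="daily",
    limit=25,
    strategy="most_recent",
)
```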

Creating an Automation

To create an automation, start by defining or selecting a filter in the Observability dashboard. The filter determines which completions your automation will act on. Once you’ve applied your filter:
  1. Click the “Add automation” button in the filter interface
  2. Give your automation a descriptive name that clearly indicates its purpose (e.g., “Add to Guardrail Review” or “Add to Router Golden Dataset”)
  3. Select the automation type (Review Queue, Dataset, Run Evals, or Notify)
  4. Configure the specific options for your automation type:
    • For review queues: select the queue and assignees
    • For datasets: choose the target prompt and dataset
    • For evals: select the target prompt and which evaluations to run
    • For notifications: configure the channel and thresholds
  5. Set your sampling frequency (hourly, daily, or weekly)
  6. Set the limit for maximum completions per sampling period
  7. Choose your sampling strategy (random or most recent)
  8. Click “Save”
Your automation will now run in the background according to the schedule you configured, continuously processing new completions that match your filter criteria.
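Conceptually, each run samples from the completions that matched your filter since the previous run, applying the limit and strategy you configured. The sketch below illustrates that sampling behavior under those assumptions; it is not how Freeplay implements it internally, and the recorded_at field is a placeholder.

```python
import random

def sample_matches(matches: list[dict], limit: int, strategy: str) -> list[dict]:
    """Pick which matching completions an automation acts on in one period.

    Assumes `matches` are the completions that matched the filter since the
    last run. Illustrative logic only, not Freeplay's implementation.
    """
    if strategy == "most_recent":
        ordered = sorted(matches, key=lambda c: c["recorded_at"], reverse=True)
        return ordered[:limit]
    return random.sample(matches, k=min(limit, len(matches)))
```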

Managing Automations

All active automations are visible in the Observability dashboard alongside your saved filters. From this view, you can edit automation configurations to adjust frequency, limits, or the associated filters, and delete automations you no longer need. Monitor automation results regularly to ensure they’re capturing the data you expect and taking the intended actions. Adjust configurations as your needs evolve or as you learn more about your data patterns.

Best Practices

  • Start with specific filters: The more targeted your filter, the more useful your automation will be. Broad filters may capture too much irrelevant data, while specific filters surface exactly what you need to review or test.
  • Use descriptive names: Name automations clearly so your entire team understands their purpose at a glance (e.g., “Auto-Add-Low-Scores-to-Review” or “Daily-Golden-Dataset-Builder”).
  • Set appropriate limits: Start with conservative sampling limits and adjust based on the volume of matching completions. You can always increase limits if you’re not capturing enough data, but starting too high may overwhelm review queues or datasets.
  • Combine automation types: Use multiple automations on the same filter for different purposes. For example, you might both add low-scoring completions to a review queue and notify your team when they exceed a threshold.
  • Monitor automation results: Regularly check that automations are capturing the data you expect and that the actions being taken are useful. Adjust configurations as needed based on what you learn.
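When picking limits, a quick worst-case estimate of weekly volume helps; the numbers below are arbitrary examples, not recommendations.

```python
# Back-of-the-envelope sizing with example numbers: hourly frequency, limit of 10.
limit_per_period = 10
periods_per_week = 24 * 7            # hourly sampling runs 168 times per week
max_weekly_intake = limit_per_period * periods_per_week
print(max_weekly_intake)             # -> 1680 completions per week, worst case
```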

Common Workflows

Quality Assurance Workflow

Filter for completions with low evaluation scores on critical criteria. Add an automation to route them to a review queue for human evaluation. Set up a second automation to notify your team when the volume of low-scoring completions exceeds a threshold, indicating a potential system issue.
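As a sketch of this workflow, one filter can feed two automations: a review queue and a notification. The configuration below uses hypothetical names and fields to show the shape of the setup, not Freeplay’s API.

```python
# Hypothetical configuration sketch (not Freeplay's API): one filter, two automations.
low_score_filter = {"eval": "accuracy", "score_below": 3}

automations = [
    {
        "name": "Route Low Scores to Review",
        "action": "review_queue",
        "queue": "quality-review",
        "frequency": "daily",
        "limit": 20,
        "strategy": "random",
    },
    {
        "name": "Alert on Low-Score Spike",
        "action": "notify",
        "channel": "slack:#llm-quality",
        "threshold": {"count": 50, "window": "1h"},  # fires if 50+ matches in an hour
    },
]
```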

Golden Dataset Building

Filter for reviewed completions that have high evaluation scores and represent successful interactions. Automatically add them to your golden dataset for regression testing. This ensures your test suite continuously grows with validated, real-world examples.

Guardrail Monitoring

Filter for completions where guardrails were triggered (such as PII detection, toxicity filters, or policy violations). Add an automation to route these to a review queue so your team can analyze why guardrails fired. Set up notifications to alert your security or compliance team when guardrail triggers spike, indicating potential issues.

Cost Optimization

Filter for high-cost completions that use excessive tokens. Run additional evaluations on these completions to assess whether the quality justifies the cost. Route borderline cases to a review queue so your team can determine if optimizations are needed.
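A minimal sketch of the cost filter in this workflow, assuming hypothetical total_tokens and cost_usd fields (not Freeplay’s schema) and arbitrary thresholds:

```python
# Hypothetical fields and thresholds: flag completions whose cost or token
# usage is high enough to warrant extra evaluation or human review.
def is_cost_outlier(completion: dict, max_tokens: int = 4000, max_cost_usd: float = 0.05) -> bool:
    return (
        completion.get("total_tokens", 0) > max_tokens
        or completion.get("cost_usd", 0.0) > max_cost_usd
    )
```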