Build Voice-Enabled AI Applications with Pipecat, Twilio, and Freeplay
Introduction
Voice-enabled AI applications present unique challenges when it comes to testing, monitoring, and iterating on your prompts and models. This guide demonstrates how Freeplay's observability and prompt management tools can support your development workflow when building voice applications.
In this example, we show how to use Freeplay together with Pipecat and Twilio.
What is Pipecat?
Pipecat is a powerful open source framework for building voice-enabled, real-time, multimodal AI applications.
When paired with Twilio for real-time voice over the phone, Pipecat enables teams to quickly build audio-based agentic systems that combine both user and bot audio with LLM interactions.
This combination creates a strong foundation for the core application, but building a high-quality generative AI product also requires robust monitoring, evaluation, and continuous experimentation. This is where Freeplay helps.
Using Freeplay for Rapid Iteration and Observability
When it comes to monitoring and improving a voice agent, teams often struggle with:
- Multi-modal Observability: Tracking and analyzing model inputs and outputs across different data types (audio, text, images, files, etc.)
- Quality Evaluation: Understanding how your application performs in real user scenarios and using evaluation criteria relevant to your product
- Experimentation & Iteration: Systematically versioning, testing, and deploying changes to prompts, tools, and/or models
- Team Collaboration: Keeping all team members on the same page when it comes to testing and understanding quality (including non-developers)
Freeplay addresses these challenges by providing a comprehensive solution for prompt and model management, observability, and evaluation that works seamlessly across modalities/data formats — including audio. And Freeplay makes it easy for both technical and non-technical team members to fully participate in the product development and optimization process.
Example Conversation View
In the image below, we see a conversation held between the voice assistant and one of our team members. The conversation audio and text are recorded to Freeplay; note that the transcript of the recorded audio is included alongside the audio recording:

Once implemented, you'll be able to view complete user interactions in Freeplay, including:
- Audio recordings
- Transcribed text
- LLM responses
- Cost & latency metrics
- Evaluation results
Integration Architecture
Freeplay integrates cleanly with Pipecat: Pipecat provides the LLM context and data to pass into Freeplay, and the integration lets your team handle LLM iteration, testing, and completion logging in Freeplay.
To integrate Freeplay with your Pipecat and Twilio application, we recommend creating a Freeplay logging “Processor” (Pipecat service) like the example below. In Pipecat, Processors serve as workers that process “frames.” Frames hold information and are passed along to processors like an assembly line.
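To make that concrete, here is a minimal pass-through processor sketch showing the basic shape of a Pipecat FrameProcessor (import paths may vary slightly by Pipecat version):
from pipecat.frames.frames import Frame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class PassThroughProcessor(FrameProcessor):
    """A do-nothing processor: inspect each frame, then pass it down the line."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        # Inspect or collect data from the frame here...
        await self.push_frame(frame, direction)  # ...then hand it to the next processor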
The Freeplay Processor focuses on:
- Fetching prompt configuration from Freeplay
- Collecting information from the relevant Pipecat frames (user input and LLM completions)
- Logging the LLM completions to Freeplay
Information Flow in a Pipecat + Twilio + Freeplay Integration

This example integration uses the Twilio-Chatbot example provided by Pipecat: a chatbot that users interact with over the phone, powered by Twilio. (This can also be done using streaming rather than speech-to-text conversion if needed.)
Getting Started
First, if you are new to Freeplay, we suggest following our quick start guide. Once you have your account, prompts, and everything set up, take the following steps. For a full code integration, see our Pipecat Integration recipe, which shows the full Freeplay processor implementation.
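The snippets below read their configuration from environment variables. As a quick sanity check (a sketch, not part of the recipe itself), you can verify they are set before starting the bot:
import os

# Environment variables referenced by the snippets in this guide; the values are your own.
required = ["FREEPLAY_API_KEY", "FREEPLAY_API_BASE", "FREEPLAY_PROJECT_ID", "OPENAI_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")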
- Import your prompt config from Freeplay & pass to the LLM processor in Pipecat
import os

from helpers.freeplay_frame import FreeplayLLMLogger
from freeplay import Freeplay, SessionInfo
from pipecat.services.openai import OpenAILLMService  # import path may vary by Pipecat version

# Freeplay client
fp_client = Freeplay(
    freeplay_api_key=os.getenv("FREEPLAY_API_KEY"),
    api_base=os.getenv("FREEPLAY_API_BASE"),
)

# Get the formatted prompt from Freeplay
formatted_prompt = fp_client.prompts.get_formatted(
    project_id=os.getenv("FREEPLAY_PROJECT_ID"),
    template_name="voice-assistant",
    environment="latest",
    variables={},
    history=[],
)

# Pass the formatted prompt configuration to the LLM service
llm = OpenAILLMService(
    model=formatted_prompt.prompt_info.model,
    tools=formatted_prompt.tool_schema if formatted_prompt.tool_schema else None,
    api_key=os.getenv("OPENAI_API_KEY"),
    **formatted_prompt.prompt_info.model_parameters,
)
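If you also want the LLM to start from the Freeplay-managed system prompt, one approach (a sketch; confirm the llm_prompt attribute and the context aggregator API against your Freeplay SDK and Pipecat versions) is to seed the Pipecat LLM context with the formatted messages:
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

# Seed the conversation context with the Freeplay-formatted messages
context = OpenAILLMContext(messages=list(formatted_prompt.llm_prompt))
context_aggregator = llm.create_context_aggregator(context)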
- Create a Freeplay Processor in Pipecat: The processor handles the memory of the conversation, processes key frames, and keeps track of the information to log to Freeplay.
import base64
import datetime
import os
import time

# Pipecat imports (paths may vary slightly by Pipecat version)
from pipecat.frames.frames import (
    BotStartedSpeakingFrame,
    BotStoppedSpeakingFrame,
    Frame,
    InputAudioRawFrame,
    LLMFullResponseEndFrame,
    LLMFullResponseStartFrame,
    MetricsFrame,
    UserStartedSpeakingFrame,
    UserStoppedSpeakingFrame,
)
from pipecat.metrics.metrics import ProcessingMetricsData
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

# Freeplay SDK types used below
from freeplay import CallInfo, Freeplay, RecordPayload, ResponseInfo, SessionInfo


class FreeplayLLMLogger(FrameProcessor):
    """Logs LLM interactions and audio to Freeplay with simplified structure."""

    def __init__(
        self,
        fp_client: Freeplay,
        template_name: str,
        session: SessionInfo = None,
        debug: bool = True,
    ):
        super().__init__()
        self.fp_client = fp_client
        self.template_name = template_name
        self.conversation_id = self._new_conv_id()
        self.total_completion_time = 0

        # Audio-related properties
        self.sample_width = 2
        self.sample_rate = 8000
        self.num_channels = 1
        self._user_audio = bytearray()
        self._bot_audio = bytearray()
        self.user_speaking = False
        self.bot_speaking = False

        # Freeplay-related properties
        self.conversation_history = []
        self.session = session
        self.most_recent_user_message = None
        self.most_recent_completion = None
        self.reset_recent_messages()

    def _new_conv_id(self) -> str:
        """Generate a new conversation ID based on the current timestamp (this represents a customer ID or similar)."""
        return datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

    def reset_recent_messages(self):
        """Reset all temporary message and audio storage."""
        self.most_recent_user_message = None
        self.most_recent_completion = None
        self._user_audio = bytearray()
        self._bot_audio = bytearray()
        self.total_completion_time = 0

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        """Process incoming frames and handle Freeplay logging."""
        await super().process_frame(frame, direction)

        # Handle LLM response frames
        if isinstance(frame, (LLMFullResponseStartFrame, LLMFullResponseEndFrame)):
            event = "START" if isinstance(frame, LLMFullResponseStartFrame) else "END"
            print(f"LLMFullResponseFrame: {event}", flush=True)

        # Handle LLM context frame - this is where we log to Freeplay
        elif isinstance(frame, OpenAILLMContextFrame):
            messages = frame.context.messages

            # Extract user message and completion from context
            user_messages = [m for m in messages if m.get("role") == "user"]
            if user_messages:
                self.most_recent_user_message = user_messages[-1].get("content")

            completions = [m for m in messages if m.get("role") == "assistant"]
            if completions:
                self.most_recent_completion = completions[-1].get("content")

            # Log to Freeplay when we have both user input and completion
            if self.most_recent_user_message and self.most_recent_completion:
                self._record_to_freeplay()

        # Handle audio state changes
        elif isinstance(frame, UserStartedSpeakingFrame):
            self.user_speaking = True
        elif isinstance(frame, UserStoppedSpeakingFrame):
            self.user_speaking = False
        elif isinstance(frame, BotStartedSpeakingFrame):
            self.bot_speaking = True
        elif isinstance(frame, BotStoppedSpeakingFrame):
            self.bot_speaking = False

        # Handle audio data
        elif isinstance(frame, InputAudioRawFrame):
            if self.user_speaking:
                self._user_audio.extend(frame.audio)
            elif self.bot_speaking:
                self._bot_audio.extend(frame.audio)

        # Handle metrics for completion time
        elif isinstance(frame, MetricsFrame):
            for metric in frame.data:
                if isinstance(metric, ProcessingMetricsData):
                    if "LLMService" in metric.processor:
                        self.total_completion_time = metric.value

        # Pass frame to next processor
        await self.push_frame(frame, direction)
Note: You need to modify the process_frame function in Pipecat's base_llm.py to pass along the OpenAILLMContextFrame; this makes handling easier in FreeplayLLMLogger.process_frame:
async def process_frame(self, frame: Frame, direction: FrameDirection):
    await super().process_frame(frame, direction)

    context = None
    if isinstance(frame, OpenAILLMContextFrame):
        context: OpenAILLMContext = frame.context
        await self.push_frame(frame, direction)  # Add this line here to pass the frame along
    elif isinstance(frame, LLMMessagesFrame):
        context = OpenAILLMContext.from_messages(frame.messages)
    elif isinstance(frame, VisionImageRawFrame):
        context = OpenAILLMContext()
        context.add_image_frame_message(
            format=frame.format, size=frame.size, image=frame.image, text=frame.text
        )
    elif isinstance(frame, LLMUpdateSettingsFrame):
        await self._update_settings(frame.settings)
    else:
        await self.push_frame(frame, direction)
    ....
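With the processor defined and the context frame passed along, you can add the logger to your pipeline. The sketch below assumes names from Pipecat's Twilio chatbot example (transport, stt, tts) plus the context_aggregator created earlier, and creates a Freeplay session via the SDK; confirm the exact APIs and pipeline order against your versions:
from pipecat.pipeline.pipeline import Pipeline

# Create a Freeplay session to group this call's completions (API names per your SDK version)
fp_session = fp_client.sessions.create()

freeplay_logger = FreeplayLLMLogger(
    fp_client=fp_client,
    template_name="voice-assistant",
    session=fp_session,
)

pipeline = Pipeline([
    transport.input(),               # audio in from Twilio
    stt,                             # speech-to-text
    context_aggregator.user(),       # aggregate user messages into the LLM context
    llm,                             # the OpenAILLMService configured earlier
    tts,                             # text-to-speech
    transport.output(),              # audio back to the caller
    freeplay_logger,                 # log completions and audio to Freeplay
    context_aggregator.assistant(),  # aggregate assistant responses
])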
- Start logging completions: Begin capturing real user interactions in Freeplay
    def _record_to_freeplay(self):
        """Record the current conversation state to Freeplay."""
        # Create a new trace for this interaction
        trace = self.session.create_trace(input=self.most_recent_user_message)

        # Get formatted prompt with conversation history
        formatted = self.fp_client.prompts.get_formatted(
            project_id=os.getenv('FREEPLAY_PROJECT_ID'),
            template_name=self.template_name,
            environment='latest',
            history=self.conversation_history,
            variables={"user_input": self.most_recent_user_message},
        )

        # Calculate latency for the LLM interaction
        start, end = time.time(), time.time() + self.total_completion_time

        # Prepare messages for recording
        all_msgs = formatted.all_messages({"role": "assistant", "content": self.most_recent_completion})

        try:
            # Prepare metadata and record payload
            custom_metadata = {
                'conversation_id': str(self.conversation_id),
                'completion_time': self.total_completion_time,
            }
            record = RecordPayload(
                all_messages=all_msgs,
                session_info=SessionInfo(self.session.session_id, custom_metadata=custom_metadata),
                inputs={'user_input': self.most_recent_user_message} if self.most_recent_user_message else {},
                prompt_info=formatted.prompt_info,
                call_info=CallInfo.from_prompt_info(formatted.prompt_info, start, end),
                response_info=ResponseInfo(is_complete=True),
                trace_info=trace,
            )

            # Create recording in Freeplay
            self.fp_client.recordings.create(record)

            # Add assistant's response to conversation history
            self.conversation_history.append({
                "role": "assistant",
                "content": [{"type": "text", "text": self.most_recent_completion}],
            })

            # Update conversation history with user message and audio
            self.conversation_history.append({
                "role": "user",
                "content": [
                    {"type": "text", "text": self.most_recent_user_message},
                    {
                        'type': 'input_audio',
                        'input_audio': {
                            'data': base64.b64encode(self._make_wav_bytes(self._user_audio, prepend_silence_secs=1)).decode('utf-8'),
                            'format': "wav",
                        },
                    },
                ],
            })

            # Record output to trace
            trace.record_output(
                os.getenv('FREEPLAY_PROJECT_ID'),
                self.most_recent_completion,
                # Optionally include call metadata / additional info at the trace level
                # metadata={...}
            )

            print(f"Successfully recorded to Freeplay - completion time: {self.total_completion_time}s", flush=True)
            self.reset_recent_messages()
        except Exception as e:
            print(f"Error recording to Freeplay: {e}", flush=True)
            self.reset_recent_messages()
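The _make_wav_bytes helper referenced above is not shown in the snippet. One possible implementation (a sketch using Python's wave module and the audio properties set in __init__) looks like this:
import io
import wave

    def _make_wav_bytes(self, pcm_bytes: bytes, prepend_silence_secs: float = 0) -> bytes:
        """Wrap raw PCM audio in a WAV container, optionally prepending silence."""
        silence = b"\x00" * int(
            prepend_silence_secs * self.sample_rate * self.sample_width * self.num_channels
        )
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav_file:
            wav_file.setnchannels(self.num_channels)
            wav_file.setsampwidth(self.sample_width)
            wav_file.setframerate(self.sample_rate)
            wav_file.writeframes(silence + bytes(pcm_bytes))
        return buffer.getvalue()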