Build Voice-Enabled AI Applications with Pipecat, Twilio, and Freeplay
Introduction
Voice-enabled AI applications present unique challenges when it comes to testing, monitoring, and iterating on your prompts and models. This guide demonstrates how Freeplay's observability and prompt management tools can support your development workflow when building voice applications.
In this example, we show how to use Freeplay together with Pipecat and Twilio.
What is Pipecat?
Pipecat is a powerful open source framework for building voice-enabled, real-time, multimodal AI applications.
When paired with Twilio for real-time voice over the phone, Pipecat enables teams to quickly build audio-based agentic systems that combine both user and bot audio with LLM interactions.
This combination creates a strong foundation for the core application, but building a high-quality generative AI product also requires robust monitoring, evaluation, and continuous experimentation. This is where Freeplay helps.
Using Freeplay for Rapid Iteration and Observability
When it comes to monitoring and improving a voice agent, teams often struggle with:
- Multi-modal Observability: Tracking and analyzing model inputs and outputs across different data types (audio, text, images, files, etc.)
- Quality Evaluation: Understanding how your application performs in real user scenarios and using evaluation criteria relevant to your product
- Experimentation & Iteration: Systematically versioning, testing, and deploying changes to prompts, tools, and/or models
- Team Collaboration: Keeping all team members on the same page when it comes to testing and understanding quality (including non-developers)
Freeplay addresses these challenges by providing a comprehensive solution for prompt and model management, observability, and evaluation that works seamlessly across modalities/data formats — including audio. And Freeplay makes it easy for both technical and non-technical team members to fully participate in the product development and optimization process.
What You'll Be Able to Monitor
Once implemented, you'll be able to view complete user interactions in Freeplay, including:
- Audio recordings from the user and bot turns
- Transcribed text for easy review and analysis
- LLM responses with full context
- Cost & latency metrics for performance optimization
- Evaluation results against your quality criteria

Integration Approaches
Freeplay provides seamless integration with Pipecat to log audio interactions and LLM responses for comprehensive testing and evaluation of your voice agents.
Option 1: Processor Integration
- How it works: Directly intercepts frames within the pipeline processing flow
- Trade-off: Adds minimal latency as processing happens inline
- Best for: Cases where you need direct frame manipulation or synchronous processing
- Documentation: Pipecat Processors
- Full Example: FreeplayProcessor
Option 2: Observer Integration ⭐ Recommended
- How it works: Uses callbacks to log data asynchronously in the background
- Trade-off: Zero impact on pipeline latency since logging happens outside the main flow
- Best for: Voice agents where low latency is critical
- Documentation: Pipecat Observer Pattern
- Full Example: FreeplayObserver
We recommend the Observer pattern: Since latency is crucial for voice agents, the Observer approach ensures your audio pipeline runs at maximum speed while still capturing all necessary data for Freeplay.
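As a rough sketch of what the Observer approach looks like in code (illustrative only; the class and helper names here are not from the recipe, and the import path may vary by Pipecat version), an observer subscribes to frame events and pushes any slow logging work into a background task:
import asyncio

# Import path shown as in recent Pipecat releases; it may differ in your version.
from pipecat.observers.base_observer import BaseObserver, FramePushed


class MinimalLoggingObserver(BaseObserver):
    """Watches frames as they move through the pipeline without blocking it."""

    async def on_push_frame(self, data: FramePushed):
        # Inspect the frame plus its source, destination, and timestamp...
        frame = data.frame
        # ...then hand slow I/O (such as a Freeplay recording call) to a
        # background task so the real-time audio path is never delayed.
        asyncio.create_task(self._log_async(frame))

    async def _log_async(self, frame):
        # Illustrative placeholder for the actual logging logic.
        ...
The FreeplayObserver in the implementation guide below follows this same shape, with the Freeplay recording call doing the background work.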
Conversation Flow in a Pipecat + Twilio + Freeplay Integration

The image above shows how Freeplay integrates into the pipeline. Whether you use the Observer or the Processor approach, the logic is very similar: once both the user and bot audio for a turn have been received and are complete, the turn can be logged to Freeplay. This flow ensures comprehensive logging while maintaining optimal performance for real-time voice interactions.
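In pseudocode, that per-turn condition looks roughly like the sketch below (the attribute and method names mirror the FreeplayObserver shown later; the wrapper function itself is illustrative):
# Illustrative sketch of the per-turn logging condition.
async def maybe_log_turn(observer) -> None:
    # A turn is complete once we have the user's audio and transcript plus
    # the bot's audio and LLM completion for the same exchange.
    if (
        observer._user_audio
        and observer._bot_audio
        and observer.most_recent_user_message
        and observer.most_recent_completion
    ):
        await observer.record_to_freeplay()  # log the turn, then reset the buffers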
Implementation Guide: FreeplayObserver
Pro Tip: AudioBufferProcessor
We highly recommend using Pipecat's AudioBufferProcessor alongside the FreeplayObserver; a minimal preview follows this list. This well-tested utility:
- Formats audio data consistently for logging
- Provides reliable callbacks for conversation turn detection
- Simplifies audio handling and storage
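Steps 3 and 4 below show the complete wiring; as a quick preview, a minimal setup (using the import path and event name from those steps) looks like this:
from pipecat.processors.audio.audio_buffer import AudioBufferProcessor

# Buffer conversation audio so complete user and bot turns can be handed off for logging.
audiobuffer = AudioBufferProcessor(sample_rate=8000, num_channels=1)


@audiobuffer.event_handler("on_user_turn_audio_data")
async def on_user_turn_audio_data(buffer, audio, sample_rate, num_channels):
    # Fires with the audio for a completed user turn; Step 4 shows the full handler.
    ...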
This implementation guide uses the Twilio-Chatbot example provided by Pipecat: a chatbot accessed over the phone and powered by Twilio. The integration allows you to log audio interactions and LLM responses to Freeplay for comprehensive testing and evaluation of your audio agents.
Prerequisites
Before starting, make sure you have:
- A Freeplay account set up (follow our quick start guide)
- Your prompts configured in Freeplay (follow our prompting guide)
- A working Pipecat + Twilio application
For a complete code example, see our FreeplayObserver Pipecat Recipe.
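The examples below read their configuration from environment variables via os.getenv(). A minimal sanity check, using the variable names that appear in this guide (your own deployment may differ), might look like:
import os

# Environment variables referenced by the code in this guide.
REQUIRED_ENV_VARS = [
    "FREEPLAY_API_KEY",     # Freeplay API key
    "FREEPLAY_API_BASE",    # Base URL for your Freeplay instance
    "FREEPLAY_PROJECT_ID",  # Freeplay project to log sessions to
    "PROMPT_NAME",          # Name of the prompt template in Freeplay
    "OPENAI_API_KEY",       # Key for the LLM provider used by OpenAILLMService
]

missing = [name for name in REQUIRED_ENV_VARS if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

# FREEPLAY_ENVIRONMENT is also read in Step 4 and can be used to target a
# specific prompt environment (e.g. "latest").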
Step 1: Import Prompt Configuration from Freeplay
First, fetch your prompt configuration from Freeplay and prepare it for use in your Pipecat pipeline:
from helpers.freeplay_frame import FreeplayProcessor  # Processor example
from helpers.freeplay_observer import FreeplayObserver  # Observer example
from freeplay import Freeplay, SessionInfo

# Freeplay Client
fp_client = Freeplay(
    freeplay_api_key=os.getenv("FREEPLAY_API_KEY"),
    api_base=os.getenv("FREEPLAY_API_BASE"),
)

# Get the unformatted prompt from Freeplay
unformatted_prompt = fp_client.prompts.get(
    project_id=os.getenv("FREEPLAY_PROJECT_ID"),
    template_name=os.getenv("PROMPT_NAME"),
    environment="latest",
)

formatted_prompt = unformatted_prompt.bind(
    variables=<optional vars>,
    history=[],
).format()

# Pass the formatted prompt to the LLM
llm = OpenAILLMService(
    model=formatted_prompt.prompt_info.model,
    tools=formatted_prompt.tool_schema if formatted_prompt.tool_schema else None,
    api_key=os.getenv("OPENAI_API_KEY"),
    **formatted_prompt.prompt_info.model_parameters,
)
Why bind the prompt? This approach allows you to fetch the prompt once and reuse it throughout the conversation, avoiding repeated API calls to Freeplay while still allowing you to add new variables and information at each conversation turn.
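For example, on a later turn you can re-bind the same unformatted prompt against the accumulated history instead of calling Freeplay again (a sketch; conversation_history and the per-turn variables shown are illustrative):
# On a later conversation turn, re-bind the prompt that was already fetched above;
# no additional Freeplay API call is needed.
formatted_prompt = unformatted_prompt.bind(
    variables={"caller_name": "Ada"},  # hypothetical per-turn variables
    history=conversation_history,      # messages accumulated so far (assumed)
).format()

# formatted_prompt.llm_prompt now holds the fully rendered messages for this turn,
# and formatted_prompt.prompt_info carries the model and parameters to use.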
Step 2: Create Your Freeplay Observer
Create a FreeplayObserver to handle conversation memory, frame monitoring, and data logging. See the full code here.
class FreeplayObserver(BaseObserver):
    def __init__(
        self,
        fp_client: Freeplay,
        unformatted_prompt: str = None,
        template_name: str = os.getenv("PROMPT_NAME") or None,
        environment: str = "latest",
        variables: dict = {},
    ):
        super().__init__()
        self.start_llm_interaction = 0
        self.end_llm_interaction = 0
        self.llm_completion_latency = 0
        # Audio related properties
        self.sample_width = 2
        self.num_channels = 1
        self.sample_rate = 16000
        self._bot_audio = bytearray()
        self._user_audio = bytearray()
        self._turn_user_audio = bytearray()
        self.user_speaking = False
        self.bot_speaking = False
        self.fp_client = fp_client
        self.session = self.fp_client.sessions.create()
        # Freeplay params
        self.template_name = template_name
        self.environment = environment
        self.unformatted_prompt = unformatted_prompt
        self.variables = variables
        # Conversation params
        self.conversation_id = self._new_conv_id()
        self.total_completion_time = 0
        self.conversation_history = []
        self.most_recent_user_message = None
        self.most_recent_completion = None

    def _new_conv_id(self) -> str:
        """Generate a new conversation ID based on the current timestamp (this represents a customer ID or similar)."""
        return datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

    def _reset_recent_messages(self):
        """Reset all temporary message and audio storage."""
        self.most_recent_user_message = None
        self.most_recent_completion = None
        self._user_audio = bytearray()
        self._bot_audio = bytearray()
        self._turn_user_audio = bytearray()
        self.total_completion_time = 0
    async def record_to_freeplay(self):
        """Record the current interaction to Freeplay as a new trace."""
        # Create a new trace for this interaction
        trace = self.session.create_trace(
            input=self.most_recent_user_message,
            custom_metadata={
                "conversation_id": str(self.conversation_id),
            },
        )
        # Add the user message (text + audio) to the conversation history
        self.conversation_history.append(
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": self.most_recent_user_message},
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": base64.b64encode(self._user_audio).decode("utf-8"),
                            "format": "wav",
                        },
                    },
                ],
            },
        )
        # Bind the variables to the prompt
        if self.unformatted_prompt:
            formatted = self.unformatted_prompt.bind(
                variables=self.variables,
                history=self.conversation_history,
            ).format()
        else:
            # Get the formatted prompt. Note this adds latency to the pipeline
            formatted = self.fp_client.prompts.get_formatted(
                project_id=os.getenv("FREEPLAY_PROJECT_ID"),
                template_name=self.template_name,
                environment=self.environment,
                history=self.conversation_history,
                variables=self.variables,
            )
        # Calculate latency for the LLM interaction
        start, end = self._calc_llm_completion_time()
        try:
            # Prepare metadata and record payload
            custom_metadata = {
                "total_completion_time": self.total_completion_time,
            }
            # Prepare the assistant's response message (mimicking the format of the LLM provider message)
            assistant_msg = {
                "role": "assistant",
                "content": [
                    {"type": "text", "text": self.most_recent_completion},
                ],
                "audio": {
                    "id": self.conversation_id,
                    "data": base64.b64encode(self._bot_audio).decode("utf-8"),
                    "expires_at": 1729234747,
                    "transcript": self.most_recent_completion,
                },
            }
            # Add the assistant's response to the conversation history
            self.conversation_history.append(assistant_msg)
            record = RecordPayload(
                all_messages=[
                    *formatted.llm_prompt,
                    assistant_msg,  # Add the assistant's response to the record call
                ],
                session_info=SessionInfo(
                    self.session.session_id, custom_metadata=custom_metadata
                ),
                inputs={},
                prompt_info=formatted.prompt_info,
                call_info=CallInfo.from_prompt_info(formatted.prompt_info, start, end),
                trace_info=trace,
            )
            # Create the recording in Freeplay
            self.fp_client.recordings.create(record)
            # Record output to the trace
            trace.record_output(
                os.getenv("FREEPLAY_PROJECT_ID"),
                self.most_recent_completion,
                # Optionally include call metadata/additional info at the trace level,
                # or conduct evals and log them to Freeplay:
                # metadata={...}
            )
            print(
                f"✅ Recorded interaction #{len(self.conversation_history) // 2} to Freeplay - completion time: {self.total_completion_time / 1000000}s",
                flush=True,
            )
            # Reset only audio and current message data, keep conversation history
            self._reset_recent_messages()
        except Exception as e:
            print(f"❌ Error recording to Freeplay: {e}", flush=True)
            # Still reset audio buffers to prevent accumulation
            # (audio buffers are overwritten in the event handler)
            self._reset_recent_messages()
    async def make_wav_bytes(
        self, pcm: bytes, sample_rate, voice, prepend_silence_secs: int = 1
    ) -> bytes:
        """Convert PCM audio data to WAV format with optional leading silence."""
        if prepend_silence_secs > 0:
            silence_samples = int(
                self.sample_rate
                * self.sample_width
                * self.num_channels
                * prepend_silence_secs
            )
            silence = b"\x00" * silence_samples
            pcm = silence + pcm
        with io.BytesIO() as buf:
            with wave.open(buf, "wb") as wf:
                wf.setnchannels(1)
                wf.setsampwidth(2)
                wf.setframerate(sample_rate)
                wf.writeframes(pcm)
            temp = buf.getvalue()
        return temp
    async def on_push_frame(self, data: FramePushed):
        src = data.source
        dst = data.destination
        frame = data.frame
        direction = data.direction
        timestamp = data.timestamp
        # Create direction arrow
        arrow = "→" if direction == FrameDirection.DOWNSTREAM else "←"
        if isinstance(frame, LLMFullResponseStartFrame):
            print(f"LLMFullResponseFrame: START {src} {arrow} {dst}", flush=True)
        elif isinstance(frame, LLMFullResponseEndFrame):
            print(f"LLMFullResponseFrame: END {src} {arrow} {dst}", flush=True)
        elif isinstance(frame, TranscriptionFrame):
            # Capture the user message if the bot talks first
            if self.most_recent_user_message is None:
                self.most_recent_user_message = frame.text
        elif isinstance(frame, OpenAILLMContextFrame):
            messages = frame.context.messages
            # Extract the user message and completion from the context.
            # NOTE: this replaces the TranscriptionFrame results, as it maps exactly to what the LLM received.
            user_messages = [m for m in messages if m.get("role") == "user"]
            if user_messages:
                self.most_recent_user_message = user_messages[-1].get("content")
            completions = [m for m in messages if m.get("role") == "assistant"]
            if completions:
                self.most_recent_completion = completions[-1].get("content")
            if (
                self.llm_completion_latency
                and self.most_recent_user_message
                and self.most_recent_completion
            ):
                # Reset latency
                self.llm_completion_latency = 0
        # Get relevant latency metrics for the LLM interaction
        if (
            isinstance(frame, OpenAILLMContextFrame)
            and isinstance(src, LLMUserContextAggregator)
            and isinstance(dst, BaseOpenAILLMService)
        ):
            # e.g. frame: OpenAILLMContextFrame#0 :: OpenAIUserContextAggregator#0 → OpenAILLMService#0
            self.start_llm_interaction = timestamp
        elif (
            isinstance(frame, LLMFullResponseStartFrame)
            and isinstance(src, BaseOpenAILLMService)
            and isinstance(dst, TTSService)
        ):
            # e.g. frame: LLMFullResponseStartFrame#0 :: OpenAILLMService#0 → CartesiaTTSService#0
            self.end_llm_interaction = timestamp
            # Update the latency tally
            self.total_completion_time = (
                self.end_llm_interaction - self.start_llm_interaction
            )
            self.llm_completion_latency += self.total_completion_time

    def _calc_llm_completion_time(self):
        """
        Calculate the start and end times for the LLM completion.
        Returns a tuple of (start_time, end_time) in seconds.
        """
        return (
            time.time(),
            time.time() + self.total_completion_time / 1_000_000_000,
        )
Step 3: Configure Your Pipeline with Audio Buffering
Set up your Pipecat pipeline with the FreeplayObserver and audio buffering capabilities:
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.audio.audio_buffer import AudioBufferProcessor

# Create audio buffer for capturing conversation audio
audiobuffer = AudioBufferProcessor(
    sample_rate=8000,
    num_channels=1,
)

# Configure pipeline task with observer
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        audio_in_sample_rate=8000,
        audio_out_sample_rate=8000,
        allow_interruptions=True,
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
    observers=[freeplay_observer],  # Add observer here
)
Step 4: Set Up Audio Capture Callbacks & Configure Pipeline
Configure callbacks to capture audio at the optimal moments:
from helpers.freeplay_observer import FreeplayObserver  # Observer example

freeplay_observer = FreeplayObserver(
    fp_client=fp_client,
    unformatted_prompt=unformatted_prompt,
    environment=os.getenv("FREEPLAY_ENVIRONMENT"),
)

# ....Additional Pipeline Configuration...

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        audio_in_sample_rate=8000,
        audio_out_sample_rate=8000,
        allow_interruptions=True,
        enable_metrics=True,
        enable_usage_metrics=True,  # This is used to track the usage of the LLM
    ),
    observers=[freeplay_observer],  # Use the FreeplayObserver to record the audio to Freeplay
)


# Save audio bytes from the user and store them in freeplay_observer
@audiobuffer.event_handler("on_user_turn_audio_data")
async def on_user_turn_audio_data(buffer, audio, sample_rate, num_channels):
    if audio and not freeplay_observer._bot_audio:
        # Aggregate user audio because this event could fire multiple times
        # before the bot responds
        freeplay_observer._turn_user_audio.extend(audio)
        freeplay_observer._user_audio = await freeplay_observer.make_wav_bytes(
            freeplay_observer._turn_user_audio,
            sample_rate,
            "user",
            prepend_silence_secs=1,
        )
    elif audio and freeplay_observer._bot_audio:
        freeplay_observer._turn_user_audio.extend(audio)
        freeplay_observer._user_audio = await freeplay_observer.make_wav_bytes(
            freeplay_observer._turn_user_audio,
            sample_rate,
            "user",
            prepend_silence_secs=1,
        )
        await freeplay_observer.record_to_freeplay()


# Save audio bytes from the bot and store them in freeplay_observer
@audiobuffer.event_handler("on_bot_turn_audio_data")
async def on_bot_turn_audio_data(buffer, audio, sample_rate, num_channels):
    # This assumes the user always speaks first and would cut off
    # the first turn of the bot
    if audio and not freeplay_observer._user_audio:
        # Aggregate bot audio because this event could fire multiple times
        # before the user responds
        freeplay_observer._bot_audio = await freeplay_observer.make_wav_bytes(
            audio, sample_rate, "bot", prepend_silence_secs=1
        )
    elif audio and freeplay_observer._user_audio:
        freeplay_observer._bot_audio = await freeplay_observer.make_wav_bytes(
            audio, sample_rate, "bot", prepend_silence_secs=1
        )
        await freeplay_observer.record_to_freeplay()
- Add the FreeplayObserver to Your Pipeline - Initialize at the Conversation Level: The FreeplayObserver must be initialized at the conversation level to properly track the entire interaction flow.
- Automatic Frame Processing: As audio frames pass through the observer's on_push_frame method, it automatically updates the observer's variables with both user and bot audio data and metadata.
- Recording with AudioBufferProcessor Callbacks: To determine the optimal timing for recording to Freeplay, we recommend using the AudioBufferProcessor callbacks:
  - on_bot_turn_audio_data - captures when the bot completes its audio response
  - on_user_turn_audio_data - captures when the user finishes speaking
These callbacks provide the most reliable trigger points for logging complete conversation turns.
Alternative: FreeplayProcessor Integration
Step 1: Import Your Prompt from Freeplay & Pass It to the LLM Service
Note: here we get an unformatted prompt from Freeplay and then bind it. This lets us fetch the prompt once instead of making repeated calls to Freeplay to retrieve the LLM prompt, while the binding still allows us to add new variables and information at each turn of the conversation. You can see more here.
from helpers.freeplay_frame import FreeplayProcessor  # Processor example
from helpers.freeplay_observer import FreeplayObserver  # Observer example
from freeplay import Freeplay, SessionInfo

# Freeplay Client
fp_client = Freeplay(
    freeplay_api_key=os.getenv("FREEPLAY_API_KEY"),
    api_base=os.getenv("FREEPLAY_API_BASE"),
)

# Get the unformatted prompt from Freeplay
unformatted_prompt = fp_client.prompts.get(
    project_id=os.getenv("FREEPLAY_PROJECT_ID"),
    template_name=os.getenv("PROMPT_NAME"),
    environment="latest",
)

formatted_prompt = unformatted_prompt.bind(
    variables=<optional vars>,
    history=[],
).format()

# Pass the formatted prompt to the LLM
llm = OpenAILLMService(
    model=formatted_prompt.prompt_info.model,
    tools=formatted_prompt.tool_schema if formatted_prompt.tool_schema else None,
    api_key=os.getenv("OPENAI_API_KEY"),
    **formatted_prompt.prompt_info.model_parameters,
)
Step 2: Create a Freeplay Processor
The processor handles conversation memory, processes key frames, and keeps track of the information to log to Freeplay. It inherits from FrameProcessor in Pipecat. See the full code implementation here.
class FreeplayProcessor(FrameProcessor):
    """Logs LLM interactions and audio to Freeplay with simplified structure."""

    def __init__(
        self,
        fp_client: Freeplay,
        template_name: str,
        session: SessionInfo = None,
        required_information: str = None,
        unformatted_prompt: PromptInfo = None,
    ):
        super().__init__()
        self.fp_client = fp_client
        self.template_name = template_name
        self.conversation_id = self._new_conv_id()
        self.total_completion_time = 0
        self.required_information = required_information
        self.deepgram_latency = 0
        # Audio related properties
        self.sample_width = 2
        self.sample_rate = 8000
        self.num_channels = 1
        self._user_audio = bytearray()
        self._bot_audio = bytearray()
        self.user_speaking = False
        self.bot_speaking = False
        # Freeplay related properties
        self.conversation_history = []
        self.session = session
        self.most_recent_user_message = None
        self.most_recent_completion = None
        self.unformatted_prompt = unformatted_prompt
        self.reset_recent_messages()

    def _new_conv_id(self) -> str:
        """Generate a new conversation ID based on the current timestamp (this represents a customer ID or similar)."""
        return datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

    def reset_recent_messages(self):
        """Reset all temporary message and audio storage."""
        self.most_recent_user_message = None
        self.most_recent_completion = None
        self._user_audio = bytearray()
        self._bot_audio = bytearray()
        self.total_completion_time = 0
        self.deepgram_latency = 0

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        """Process incoming frames and handle Freeplay logging."""
        await super().process_frame(frame, direction)

        # Handle LLM response frames
        if isinstance(frame, (LLMFullResponseStartFrame, LLMFullResponseEndFrame)):
            event = "START" if isinstance(frame, LLMFullResponseStartFrame) else "END"
            print(f"LLMFullResponseFrame: {event}", flush=True)
        # Handle LLM context frame - this is where we log to Freeplay
        elif isinstance(frame, OpenAILLMContextFrame):
            messages = frame.context.messages
            # Extract user message and completion from context
            user_messages = [m for m in messages if m.get("role") == "user"]
            if user_messages:
                self.most_recent_user_message = user_messages[-1].get("content")
            completions = [m for m in messages if m.get("role") == "assistant"]
            if completions:
                self.most_recent_completion = completions[-1].get("content")
            # Log to Freeplay when we have both user input and completion
            if self.most_recent_user_message and self.most_recent_completion:
                self._record_to_freeplay()
        # Handle audio state changes
        elif isinstance(frame, UserStartedSpeakingFrame):
            self.user_speaking = True
        elif isinstance(frame, UserStoppedSpeakingFrame):
            self.user_speaking = False
        elif isinstance(frame, BotStartedSpeakingFrame):
            self.bot_speaking = True
        elif isinstance(frame, BotStoppedSpeakingFrame):
            self.bot_speaking = False
        # Handle audio data
        elif isinstance(frame, InputAudioRawFrame):
            if self.user_speaking:
                self._user_audio.extend(frame.audio)
        elif isinstance(frame, TTSAudioRawFrame):
            if self.bot_speaking:
                self._bot_audio.extend(frame.audio)
        # Handle metrics for LLM completion time
        elif isinstance(frame, MetricsFrame):
            self.metrics = frame.data
            for metric in frame.data:
                if isinstance(metric, ProcessingMetricsData):
                    if "LLMService" in metric.processor:
                        self.total_completion_time = metric.value
                elif isinstance(metric, TTFBMetricsData):
                    if "DeepgramSTTService" in metric.processor:
                        self.deepgram_latency += metric.value

        # Pass frame to next processor
        await self.push_frame(frame, direction)
Note: You must modify the process_frame function in Pipecat's base_llm.py to pass the OpenAILLMContextFrame along; this makes handling in the FreeplayProcessor's process_frame easier:
async def process_frame(self, frame: Frame, direction: FrameDirection):
    await super().process_frame(frame, direction)

    context = None
    if isinstance(frame, OpenAILLMContextFrame):
        context: OpenAILLMContext = frame.context
        await self.push_frame(frame, direction)  # Add this line to pass the frame along
    elif isinstance(frame, LLMMessagesFrame):
        context = OpenAILLMContext.from_messages(frame.messages)
    elif isinstance(frame, VisionImageRawFrame):
        context = OpenAILLMContext()
        context.add_image_frame_message(
            format=frame.format, size=frame.size, image=frame.image, text=frame.text
        )
    elif isinstance(frame, LLMUpdateSettingsFrame):
        await self._update_settings(frame.settings)
    else:
        await self.push_frame(frame, direction)
    ....
Step 3: Add the FreeplayProcessor To Your Pipeline
Initialize your FreeplayProcessor and add it as a step in your pipeline. We recommend adding it after the STT or audiobuffer steps so that all of the information you need is available when you log to Freeplay.
# Pass the Freeplay client to the FreeplayProcessor
freeplay_processor = FreeplayProcessor(
    fp_client=fp_client,
    template_name="voice-assistant",
    session=session,
)

# ....Additional Pipeline Configuration...

pipeline = Pipeline(
    [
        transport.input(),  # Websocket input from client
        stt,  # Speech-To-Text
        context_aggregator.user(),
        llm,  # LLM
        tts,  # Text-To-Speech
        freeplay_processor,  # Freeplay logger (after TTS so it can capture assistant audio)
        transport.output(),  # Websocket output to client
        audiobuffer,  # Used to buffer the audio in the pipeline
        context_aggregator.assistant(),
    ]
)
Step 4: Start Logging Completions
Begin capturing real user interactions in Freeplay. This is handled by the FreeplayProcessor's _record_to_freeplay method, which also adds the audio to the conversation history for proper tracking.
    def _record_to_freeplay(self):
        """Record the current conversation state to Freeplay."""
        # Create a new trace for this interaction
        trace = self.session.create_trace(
            input=self.most_recent_user_message,
            custom_metadata={
                "deepgram_latency": self.deepgram_latency,
            },
        )
        # Add the user message (text + audio) to the conversation history
        self.conversation_history.append(
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": self.most_recent_user_message},
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": base64.b64encode(
                                self._make_wav_bytes(
                                    self._user_audio, prepend_silence_secs=1
                                )
                            ).decode("utf-8"),
                            "format": "wav",
                        },
                    },
                ],
            },
        )
        # Bind the variables to the prompt
        if self.unformatted_prompt:
            formatted = self.unformatted_prompt.bind(
                variables={"required_information": self.required_information},
                history=self.conversation_history,
            ).format()
        else:
            # Get the formatted prompt. Note this adds latency to the pipeline
            formatted = self.fp_client.prompts.get_formatted(
                project_id=os.getenv("FREEPLAY_PROJECT_ID"),
                template_name=self.template_name,
                environment="latest",
                history=self.conversation_history,
                variables={"required_information": self.required_information},
            )
        # Calculate latency for the LLM interaction
        start, end = time.time(), time.time() + self.total_completion_time
        try:
            # Prepare metadata and record payload
            custom_metadata = {
                "conversation_id": str(self.conversation_id),
            }
            # Add the assistant's response to the conversation history
            last_message = {
                "role": "assistant",
                "content": [
                    {"type": "text", "text": self.most_recent_completion},
                ],
                "audio": {
                    "id": self.conversation_id,
                    "data": base64.b64encode(
                        self._make_wav_bytes(self._bot_audio, prepend_silence_secs=1)
                    ).decode("utf-8"),
                    "expires_at": 1729234747,
                    "transcript": self.most_recent_completion,
                },
            }
            self.conversation_history.append(last_message)
            # Create the recording in Freeplay
            self.fp_client.recordings.create(
                RecordPayload(
                    all_messages=[
                        *formatted.llm_prompt,
                        last_message,  # Add the last message to the record call
                    ],
                    session_info=SessionInfo(
                        self.session.session_id, custom_metadata=custom_metadata
                    ),
                    inputs={"required_information": self.required_information},
                    prompt_info=formatted.prompt_info,
                    call_info=CallInfo.from_prompt_info(
                        formatted.prompt_info, start, end
                    ),
                    trace_info=trace,
                )
            )
            # Record output to the trace
            trace.record_output(
                os.getenv("FREEPLAY_PROJECT_ID"),
                self.most_recent_completion,
            )
            print(
                f"Successfully recorded to Freeplay - completion time: {self.total_completion_time}s",
                flush=True,
            )
            self.reset_recent_messages()
        except Exception as e:
            print(f"Error recording to Freeplay: {e}", flush=True)
            self.reset_recent_messages()