Overview
Integrate Freeplay with LangGraph to add observability, prompt management, and evaluation capabilities to your LangGraph applications. This comprehensive guide covers everything from basic setup to advanced agent workflows with state management, streaming, and human-in-the-loop patterns.
Prerequisites
Before you begin, make sure you have:
- A Freeplay account with an active project
- Python 3.10 or higher installed
- Basic familiarity with LangGraph and LangChain
Quick Start with Observability
Installation
Install the Freeplay LangGraph SDK along with your preferred LLM provider. For advanced usage, refer to the SDK documentation on PyPI.
# Install Freeplay SDK
pip install freeplay-langgraph
# Install your LLM provider (choose one or more)
pip install langchain-openai
pip install langchain-anthropic
pip install langchain-google-vertexai
Configuration
Set Up Your Credentials
Configure your Freeplay credentials using environment variables:
export FREEPLAY_API_URL="https://app.freeplay.ai/api"
export FREEPLAY_API_KEY="fp-..."
export FREEPLAY_PROJECT_ID="..."
You can find your API key and Project ID in your Freeplay project settings.
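For local development, you can optionally load these variables from a .env file; a minimal sketch assuming the python-dotenv package is installed (it is not part of the Freeplay SDK):
from dotenv import load_dotenv

# Load FREEPLAY_API_URL, FREEPLAY_API_KEY, and FREEPLAY_PROJECT_ID from a local .env file
load_dotenv()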
Initialize the SDK
Create a FreeplayLangGraph instance in your application:
from freeplay_langgraph import FreeplayLangGraph
# Using environment variables
freeplay = FreeplayLangGraph()
# Or pass credentials directly
freeplay = FreeplayLangGraph(
    freeplay_api_url="https://app.freeplay.ai/api",
    freeplay_api_key="fp-...",
    project_id="proj-...",
)
With this setup, your LangGraph application is now automatically instrumented with OpenTelemetry, sending traces and spans to Freeplay for observability.
Note: We recommend managing your prompts within Freeplay to support a better prompt development lifecycle. Continue following this guide to get your prompts configured for use in LangGraph.
Prompt Management
Freeplay’s integration requires that your prompts are configured in Freeplay; by default, FreeplayLangGraph fetches them from the Freeplay API. To learn more, see our Prompt Management guide here. Once configured, you will need the prompt names to reference them in code.
Managing prompts in Freeplay separates your prompt engineering workflow from your LangGraph application. Instead of hardcoding prompts in your agent code, your team can iterate on prompt templates, test different versions and new models, and deploy changes through Freeplay without modifying or redeploying your LangGraph application. This enables your team to test agent behavior, maintain different prompt versions across environments (development, staging, production), and experiment with variations.
Optional - Prompt Bundling
Once your prompts are saved in Freeplay, you can also use bundled prompts stored locally with your application by providing a custom template resolver:
from pathlib import Path
from freeplay.resources.prompts import FilesystemTemplateResolver
from freeplay_langgraph import FreeplayLangGraph
# Use filesystem-based prompts bundled with your app
freeplay = FreeplayLangGraph(
template_resolver=FilesystemTemplateResolver(Path("bundled_prompts"))
)
This is useful for offline environments, testing, or when you want to version control your prompts alongside your code. See our Prompt Bundling Guide to learn more.
Core Concepts
Freeplay provides two primary ways to work with LangGraph:
create_agent() - For building full LangGraph agents with tool calling, ReAct loops, and state management
invoke() - For simple, stateless LLM invocations when you don’t need agent capabilities
Both methods support the same core features: conversation history, tool calling, structured outputs and running tests. Choose create_agent() when you need the full power of LangGraph’s agent framework, and invoke() for simpler use cases.
Building LangGraph Agents
The create_agent method provides full support for LangGraph’s agent capabilities including the ReAct loop, tool calling, state management, middleware, and streaming.
Basic Agent Creation
Create an agent that uses a Freeplay-hosted prompt with automatic model instantiation. You can pass variables both when creating the agent and when invoking it; both are optional depending on your flow:
from freeplay_langgraph import FreeplayLangGraph
from langchain_core.messages import HumanMessage
freeplay = FreeplayLangGraph()
# Create a basic agent with a prompt stored in Freeplay
agent = freeplay.create_agent(
prompt_name="weather-assistant",
variables={"location": "San Francisco"}, # Optional, enables datasets & testing
environment="production"
)
# Invoke the agent
result = agent.invoke({
    "messages": [HumanMessage(content="What's the weather like today?")],
    "variables": {"location": "Denver"}
})
print(result["messages"][-1].content)
Using create_agent gives you access to LangGraph’s full agent capabilities, including tool calling with the ReAct loop, state persistence, and advanced execution control.
Tool Calling
Bind LangChain tools to your agent for agentic workflows. The agent automatically decides when to call tools:
from langchain_core.tools import tool
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Weather in {city}: Sunny, 72°F"

@tool
def get_forecast(city: str, days: int) -> str:
    """Get the weather forecast for a city."""
    return f"{days}-day forecast for {city}: Mostly sunny"
agent = freeplay.create_agent(
prompt_name="weather-assistant",
variables={"location": "San Francisco"},
tools=[get_weather, get_forecast],
environment="production"
)
result = agent.invoke({
"messages": [HumanMessage(content="What's the weather in SF and the 5-day forecast?")]
})
The agent handles the tool-calling cycle through LangGraph’s ReAct loop, deciding when to use tools and when to respond directly to the user.
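To see the intermediate steps the agent took, you can walk the returned message list; a minimal sketch using standard LangChain message types:
from langchain_core.messages import AIMessage, ToolMessage

# Inspect the tool calls the model requested and the results each tool returned
for message in result["messages"]:
    if isinstance(message, AIMessage) and message.tool_calls:
        for call in message.tool_calls:
            print(f"Tool requested: {call['name']} with args {call['args']}")
    elif isinstance(message, ToolMessage):
        print(f"Tool result: {message.content}")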
Conversation History
Maintain conversation context across multiple turns with conversation history:
from langchain_core.messages import HumanMessage, AIMessage
# Build conversation history
history = [
HumanMessage(content="What's the weather in Paris?"),
AIMessage(content="It's sunny and 22°C in Paris."),
HumanMessage(content="What about in winter?")
]
agent = freeplay.create_agent(
prompt_name="weather-assistant",
variables={"city": "Paris"},
tools=[get_weather],
environment="production"
)
# Pass history in the messages
result = agent.invoke({
"messages": history + [HumanMessage(content="And the average rainfall?")]
})
For persistent conversations across multiple invocations, use state persistence with checkpointers (covered in State Management section).
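As a sketch of that pattern, the snippet below assumes create_agent forwards a checkpointer keyword to LangGraph (check the SDK reference for the exact parameter name); the thread_id config is standard LangGraph:
from langgraph.checkpoint.memory import MemorySaver

# Assumption: create_agent accepts a LangGraph checkpointer for persistent state
agent = freeplay.create_agent(
    prompt_name="weather-assistant",
    tools=[get_weather],
    checkpointer=MemorySaver(),
)

# The thread_id ties multiple invocations to the same persisted conversation
config = {"configurable": {"thread_id": "user-42"}}
agent.invoke({"messages": [HumanMessage(content="What's the weather in Paris?")]}, config=config)
agent.invoke({"messages": [HumanMessage(content="And tomorrow?")]}, config=config)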
Structured Output
Get structured, typed responses from your agents using ToolStrategy or ProviderStrategy:
from pydantic import BaseModel
from langchain.agents.structured_output import ToolStrategy
class WeatherReport(BaseModel):
    city: str
    temperature: float
    conditions: str
    humidity: int
agent = freeplay.create_agent(
prompt_name="weather-assistant",
variables={"format": "detailed"},
tools=[get_weather_data],
response_format=ToolStrategy(WeatherReport)
)
result = agent.invoke({
"messages": [HumanMessage(content="Get weather for New York City")]
})
# Access strongly-typed structured output
weather_report = result["structured_response"]
print(f"{weather_report.city}: {weather_report.temperature}°F")
print(f"Conditions: {weather_report.conditions}, Humidity: {weather_report.humidity}%")
Structured output ensures your agent returns data in a predictable format, making it easier to integrate with downstream systems, databases, or UIs.
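For example, the Pydantic model can be serialized directly for downstream storage or API responses; a minimal sketch assuming Pydantic v2:
# Convert the typed result for downstream use (Pydantic v2 methods)
record = weather_report.model_dump()        # plain dict, e.g. for a database row
payload = weather_report.model_dump_json()  # JSON string, e.g. for an API response
print(record["city"], payload)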
Accessing the Underlying Graph
The agent returned by create_agent wraps a compiled LangGraph graph. Call unwrap() when you need direct access to graph APIs such as get_state:
from typing import cast
from langgraph.graph.state import CompiledStateGraph
agent = freeplay.create_agent(...)
# Option 1: Direct unwrap (works at runtime)
state = agent.unwrap().get_state(config)
# Option 2: Cast for full type hints
compiled = cast(CompiledStateGraph, agent.unwrap())
state = compiled.get_state(config) # ✅ Full IDE autocomplete
Simple Prompt Invocations
For simpler use cases that don’t require the full agent loop, use the invoke method. This is ideal for one-off completions, quick classifications, or any scenario where you don’t need agent state management or the ReAct loop.
Basic Invocation
Call a Freeplay-hosted prompt with automatic model instantiation:
from freeplay_langgraph import FreeplayLangGraph
freeplay = FreeplayLangGraph()
# Invoke a prompt - model is automatically created based on Freeplay's config
response = freeplay.invoke(
prompt_name="sentiment-analyzer",
variables={"text": "This product exceeded my expectations!"},
environment="production"
)
print(response.content)
Using invoke gives you quick access to Freeplay-managed prompts without the overhead of agent state or tool calling. This is perfect for classification tasks, content generation, or any stateless LLM operation.
Tool Calling
Bind LangChain tools for basic tool calling without the full agent loop:
from langchain_core.tools import tool
@tool
def calculate_discount(price: float, discount_percent: float) -> float:
    """Calculate the final price after applying a discount."""
    return price * (1 - discount_percent / 100)

@tool
def check_inventory(product_id: str) -> int:
    """Check inventory levels for a product."""
    return 42  # Mock inventory count
response = freeplay.invoke(
prompt_name="pricing-assistant",
variables={"product": "laptop", "base_price": 1200},
tools=[calculate_discount, check_inventory]
)
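Because invoke does not run the ReAct loop, any tool calls come back on the response for you to execute yourself; a minimal sketch assuming the response is a standard LangChain AIMessage:
# Dispatch any requested tool calls manually (no agent loop runs them for you)
tools_by_name = {"calculate_discount": calculate_discount, "check_inventory": check_inventory}
for call in getattr(response, "tool_calls", []) or []:
    tool_result = tools_by_name[call["name"]].invoke(call["args"])
    print(f"{call['name']} -> {tool_result}")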
Conversation History
Maintain conversation context across multiple turns:
from langchain_core.messages import HumanMessage, AIMessage
# Build conversation history
history = [
HumanMessage(content="What's the weather in Paris?"),
AIMessage(content="It's sunny and 22°C in Paris."),
HumanMessage(content="What about in winter?")
]
# The prompt has full context of the conversation
response = freeplay.invoke(
prompt_name="weather-assistant",
variables={"city": "Paris"},
history=history
)
print(response.content)
By passing conversation history, your prompts can maintain context across multiple turns without needing full agent state management.
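To continue the conversation, append the latest exchange to the history before the next call; a minimal sketch assuming the returned response is an AIMessage:
# Carry the reply forward so the next turn has full context
history.append(response)
history.append(HumanMessage(content="And the average rainfall?"))
follow_up = freeplay.invoke(
    prompt_name="weather-assistant",
    variables={"city": "Paris"},
    history=history
)
print(follow_up.content)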
Test Execution Tracking
Track test runs for evaluation workflows by pulling test cases from Freeplay and executing them with automatic tracking. By associating invocations with test runs and test cases, you can analyze performance across your test suite, identify regressions, and measure the impact of prompt changes in Freeplay’s evaluation dashboard. See more about running end-to-end test runs here.
Creating Test Runs
import os
from freeplay_langgraph import FreeplayLangGraph
from langchain_core.messages import HumanMessage
freeplay = FreeplayLangGraph()
# Create a test run from a dataset
test_run = freeplay.client.test_runs.create(
project_id=os.getenv("FREEPLAY_PROJECT_ID"),
testlist="name of the dataset",
name="name your test run",
)
print(f"Created test run: {test_run.id}")
Executing Test Cases with Simple Invocations
For simple prompt invocations, use the test tracking parameters directly:
# Execute each test case
for test_case in test_run.test_cases:
    response = freeplay.invoke(
        prompt_name="my-prompt",
        variables=test_case.variables,
        test_run_id=test_run.id,
        test_case_id=test_case.id
    )
    print(f"Test case {test_case.id}: {response.content}")
Executing Test Cases with Agents
For LangGraph agents, pass test tracking metadata via config to reuse the agent efficiently:
from langchain_core.messages import HumanMessage
# Create agent once (no test tracking at creation)
agent = freeplay.create_agent(
prompt_name="my-prompt",
variables={"input": "prompt input"},
tools=[get_weather],
)
# Execute each test case with metadata override
for test_case in test_run.trace_test_cases:
    result = agent.invoke(
        {"messages": [HumanMessage(content=test_case.input)]},
        config={
            "metadata": {
                "freeplay.test_run_id": test_run.id,
                "freeplay.test_case_id": test_case.id
            }
        }
    )
    print(f"Test case {test_case.id}: {result['messages'][-1].content}")
Using Custom Models
Provide your own pre-configured LangChain model for more control:
from langchain_openai import ChatOpenAI
# Configure your own model with custom parameters
model = ChatOpenAI(
model="gpt-4",
temperature=0.7,
max_tokens=1000
)
response = freeplay.invoke(
prompt_name="content-generator",
variables={"topic": "sustainable energy"},
model=model
)
Async Support
All methods in the Freeplay SDK support async/await for better performance in async applications:
Async Agent Invocation
# Async agent creation and invocation
agent = freeplay.create_agent(
prompt_name="assistant",
variables={"role": "helpful"},
tools=[search_knowledge_base]
)
result = await agent.ainvoke({
"messages": [HumanMessage(content="Help me find information")]
})
Async Simple Invocations
# Async invocation
response = await freeplay.ainvoke(
prompt_name="sentiment-analyzer",
variables={"text": "Great product!"}
)
# Async streaming
async for chunk in freeplay.astream(
    prompt_name="content-generator",
    variables={"topic": "machine learning"}
):
    print(chunk.content, end="", flush=True)
Async State Management
# Async state inspection
state = await agent.unwrap().aget_state(config)
# Async state updates
await agent.unwrap().aupdate_state(config, {"approval": "granted"})
Using async methods improves throughput and reduces latency in applications that handle multiple concurrent requests, such as web servers or API endpoints.
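For example, concurrent requests can be fanned out with asyncio.gather; a minimal sketch reusing the ainvoke call shown above:
import asyncio

async def classify_batch(texts):
    # Each invocation runs concurrently and is traced independently
    return await asyncio.gather(*[
        freeplay.ainvoke(prompt_name="sentiment-analyzer", variables={"text": text})
        for text in texts
    ])

responses = asyncio.run(classify_batch(["Great product!", "Not what I expected."]))
for r in responses:
    print(r.content)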
Automatic Observability
Once initialized, the Freeplay SDK automatically instruments your LangGraph application with OpenTelemetry. This means every LangChain and LangGraph operation is traced and sent to Freeplay without any additional code.
What Gets Tracked
Freeplay automatically captures:
- Prompt invocations: Template, variables, and generated content
- Model calls: Provider, model name, tokens used, latency
- Tool executions: Which tools were called and their results
- Agent flows: Multi-step reasoning and decision paths
- Conversation flows: Multi-turn interactions and state transitions
- Errors and exceptions: Failed invocations with stack traces
- Metadata: Test run IDs, test case IDs, environment names, and custom tags
All metadata is injected automatically through LangChain’s RunnableBindingBase pattern, ensuring comprehensive observability without manual instrumentation.
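Custom tags can be attached the same way the test tracking keys are passed above, through LangChain’s standard config metadata. A minimal sketch where customer_id is a hypothetical application-specific key, not a documented Freeplay field:
from langchain_core.messages import HumanMessage

# Attach custom metadata via LangChain's config; the freeplay.* test tracking keys use the same mechanism.
# customer_id is a hypothetical application-specific tag, not a documented Freeplay field.
result = agent.invoke(
    {"messages": [HumanMessage(content="What's the weather in Denver?")]},
    config={"metadata": {"customer_id": "cust-123"}}
)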
Viewing Traces
You can view all of this data in the Freeplay dashboard, making it easy to:
- Debug issues and understand failure patterns
- Optimize performance and reduce latency
- Understand how your application behaves in production
- Track token usage and costs across environments
- Measure impact of prompt changes over time
Supported LLM Providers
Freeplay’s LangGraph SDK supports automatic model instantiation for multiple providers. Install the corresponding LangChain integration package for your provider:
OpenAI
pip install langchain-openai
Anthropic
pip install langchain-anthropic
Vertex AI (Google)
pip install langchain-google-vertexai
The SDK automatically detects which provider your Freeplay prompt is configured to use and instantiates the appropriate model with the correct parameters.