LangGraph Integration Guide

Integrate Freeplay with LangGraph to add observability, prompt management, and evaluation capabilities to your LangGraph applications. This comprehensive guide covers everything from basic setup to advanced agent workflows with state management, streaming, and human-in-the-loop patterns.

Prerequisites

Before you begin, make sure you have:

  • A Freeplay account with an active project
  • Python 3.10 or higher installed
  • Basic familiarity with LangGraph and LangChain

Quick Start with Observability

Installation

Install the Freeplay LangGraph SDK along with your preferred LLM provider. For advanced usage, refer to the package documentation on PyPI.

# Install Freeplay SDK
pip install freeplay-langgraph

# Install your LLM provider (choose one or more)
pip install langchain-openai
pip install langchain-anthropic
pip install langchain-google-vertexai

Configuration

Set Up Your Credentials

Configure your Freeplay credentials using environment variables:

export FREEPLAY_API_URL="https://app.freeplay.ai/api"
export FREEPLAY_API_KEY="fp-..."
export FREEPLAY_PROJECT_ID="..."

You can find your API key and Project ID in your Freeplay project settings.

Initialize the SDK

Create a FreeplayLangGraph instance in your application:

from freeplay_langgraph import FreeplayLangGraph

# Using environment variables
freeplay = FreeplayLangGraph()

# Or pass credentials directly
freeplay = FreeplayLangGraph(
    freeplay_api_url="https://app.freeplay.ai/api",
    freeplay_api_key="fp_...",
    project_id="proj_...",
)

With this setup, your LangGraph application is now automatically instrumented with OpenTelemetry, sending traces and spans to Freeplay for observability.
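
For example, once the instance exists, even a plain LangChain model call is traced. The snippet below is a minimal sketch (the model name is illustrative and assumes langchain-openai is installed):

from langchain_openai import ChatOpenAI
from freeplay_langgraph import FreeplayLangGraph

freeplay = FreeplayLangGraph()  # instrumentation is set up here

# An ordinary LangChain call - no Freeplay-specific code on this line.
# The resulting trace should appear in your Freeplay project automatically.
llm = ChatOpenAI(model="gpt-4o-mini")
print(llm.invoke("Say hello to Freeplay").content)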

📘 Note: We recommend managing your prompts in Freeplay to support a better prompt development lifecycle. Continue through this guide to get your prompts configured for use with LangGraph.

Prompt Management

Freeplay's integration requires that your prompts are configured in Freeplay. By default, FreeplayLangGraph fetches prompts from the Freeplay API, so every prompt you reference must already exist in your Freeplay project. To learn more, see our Prompt Management guide here. Once your prompts are configured, you will need their names to reference them in code.

Managing prompts in Freeplay separates your prompt engineering workflow from your LangGraph application. Instead of hardcoding prompts in your agent code, your team can iterate on prompt templates, test different versions and new models, and deploy changes through Freeplay without modifying or redeploying your LangGraph application. This lets your team test agent behavior, maintain different prompt versions across environments (development, staging, production), and experiment with variations.
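
For example, the same agent code can pick up whichever prompt version is deployed to a given environment, using the environment parameter on the create_agent and invoke methods introduced below:

# Resolve the prompt version deployed to each Freeplay environment.
# No code changes are needed when the prompt itself is updated in Freeplay.
dev_agent = freeplay.create_agent(
    prompt_name="weather-assistant",
    variables={"location": "San Francisco"},
    environment="development",
)

prod_agent = freeplay.create_agent(
    prompt_name="weather-assistant",
    variables={"location": "San Francisco"},
    environment="production",
)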

Optional - Prompt Bundling

Once your prompts are saved in Freeplay, you can also bundle them locally with your application by providing a custom template resolver:

from pathlib import Path
from freeplay.resources.prompts import FilesystemTemplateResolver
from freeplay_langgraph import FreeplayLangGraph

# Use filesystem-based prompts bundled with your app
freeplay = FreeplayLangGraph(
    template_resolver=FilesystemTemplateResolver(Path("bundled_prompts"))
)

This is useful for offline environments, testing, or when you want to version control your prompts alongside your code. See our Prompt Bundling Guide to learn more.

Core Concepts

Freeplay provides two primary ways to work with LangGraph:

  1. create_agent() - For building full LangGraph agents with tool calling, ReAct loops, and state management
  2. invoke() - For simple, stateless LLM invocations when you don't need agent capabilities

Both methods support the same core features: conversation history, tool calling, structured outputs, and test run tracking. Choose create_agent() when you need the full power of LangGraph's agent framework, and invoke() for simpler use cases.

Building LangGraph Agents

The create_agent method provides full support for LangGraph's agent capabilities including the ReAct loop, tool calling, state management, middleware, and streaming.

Basic Agent Creation

Create an agent that uses a Freeplay-hosted prompt with automatic model instantiation:

from freeplay_langgraph import FreeplayLangGraph
from langchain_core.messages import HumanMessage

freeplay = FreeplayLangGraph()

# Create a basic agent with Freeplay prompt
agent = freeplay.create_agent(
    prompt_name="weather-assistant",
    variables={"location": "San Francisco"},
    environment="production"
)

# Invoke the agent
result = agent.invoke({
    "messages": [HumanMessage(content="What's the weather like today?")]
})

print(result["messages"][-1].content)

Using create_agent gives you access to LangGraph's full agent capabilities, including tool calling with the ReAct loop, state persistence, and advanced execution control.
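
If you want to observe intermediate steps as they happen, one option is to stream from the underlying compiled graph. A sketch, assuming the unwrap() accessor shown later in this guide exposes LangGraph's stream() method:

# Stream graph updates (one entry per executed node) as the agent runs.
for update in agent.unwrap().stream(
    {"messages": [HumanMessage(content="What's the weather like today?")]},
    stream_mode="updates",
):
    print(update)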

Adding Tools

Bind LangChain tools to your agent for agentic workflows. The agent automatically decides when to call tools:

from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Weather in {city}: Sunny, 72°F"

@tool
def get_forecast(city: str, days: int) -> str:
    """Get the weather forecast for a city."""
    return f"{days}-day forecast for {city}: Mostly sunny"

agent = freeplay.create_agent(
    prompt_name="weather-assistant",
    variables={"location": "San Francisco"},
    tools=[get_weather, get_forecast],
    environment="production"
)

result = agent.invoke({
    "messages": [HumanMessage(content="What's the weather in SF and the 5-day forecast?")]
})

The agent handles the tool-calling cycle through LangGraph's ReAct loop, deciding when to use tools and when to respond directly to the user.

Conversation History

Maintain conversation context across multiple turns with conversation history:

from langchain_core.messages import HumanMessage, AIMessage

# Build conversation history
history = [
    HumanMessage(content="What's the weather in Paris?"),
    AIMessage(content="It's sunny and 22°C in Paris."),
    HumanMessage(content="What about in winter?")
]

agent = freeplay.create_agent(
    prompt_name="weather-assistant",
    variables={"city": "Paris"},
    tools=[get_weather],
    environment="production"
)

# Pass history in the messages
result = agent.invoke({
    "messages": history + [HumanMessage(content="And the average rainfall?")]
})

For persistent conversations across multiple invocations, use state persistence with checkpointers (covered in State Management section).
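
A minimal sketch of that pattern, assuming create_agent forwards a checkpointer and a per-thread config to LangGraph (see the State Management section for the supported options):

from langgraph.checkpoint.memory import MemorySaver

# Assumed parameter: checkpointer support surfaced through create_agent.
agent = freeplay.create_agent(
    prompt_name="weather-assistant",
    variables={"city": "Paris"},
    tools=[get_weather],
    checkpointer=MemorySaver(),
)

config = {"configurable": {"thread_id": "conversation-1"}}

agent.invoke({"messages": [HumanMessage(content="What's the weather in Paris?")]}, config)
# The second turn resumes from the checkpointed state for the same thread_id.
agent.invoke({"messages": [HumanMessage(content="And in winter?")]}, config)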

Structured Output

Get structured, typed responses from your agents using ToolStrategy or ProviderStrategy:

from pydantic import BaseModel
from langchain.agents.structured_output import ToolStrategy

class WeatherReport(BaseModel):
    city: str
    temperature: float
    conditions: str
    humidity: int

agent = freeplay.create_agent(
    prompt_name="weather-assistant",
    variables={"format": "detailed"},
    tools=[get_weather],  # see the Adding Tools example above
    response_format=ToolStrategy(WeatherReport)
)

result = agent.invoke({
    "messages": [HumanMessage(content="Get weather for New York City")]
})

# Access strongly-typed structured output
weather_report = result["structured_response"]
print(f"{weather_report.city}: {weather_report.temperature}°F")
print(f"Conditions: {weather_report.conditions}, Humidity: {weather_report.humidity}%")

Structured output ensures your agent returns data in a predictable format, making it easier to integrate with downstream systems, databases, or UIs.
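
Because structured_response is a validated Pydantic model, handing it to downstream systems is straightforward; for example (Pydantic v2 shown):

record = weather_report.model_dump()        # plain dict, e.g. for a database insert
payload = weather_report.model_dump_json()  # JSON string, e.g. for an API response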

Running Tests via the SDK

Track test runs and test cases for systematic evaluation:

agent = freeplay.create_agent(
    prompt_name="customer-support-agent",
    variables={"company": "Acme Corp"},
    tools=[check_account_status],
    test_run_id="test_run_123", # Pass the test run info
    test_case_id="test_case_456",
    environment="staging"
)

result = agent.invoke({
    "messages": [HumanMessage(content="Check my account status")]
})

By associating invocations with test runs and test cases, you can analyze performance across your test suite, identify regressions, and measure the impact of prompt changes in Freeplay's evaluation dashboard. See more about running end-to-end test runs here.

Accessing the Underlying Graph

create_agent returns a wrapper around LangGraph's compiled graph. When you need direct access to LangGraph APIs such as get_state, call unwrap() to get the underlying CompiledStateGraph (the config argument below is a standard LangGraph RunnableConfig):

from typing import cast
from langgraph.graph.state import CompiledStateGraph

agent = freeplay.create_agent(...)

# Option 1: Direct unwrap (works at runtime)
state = agent.unwrap().get_state(config)

# Option 2: Cast for full type hints
compiled = cast(CompiledStateGraph, agent.unwrap())
state = compiled.get_state(config)  # ✅ Full IDE autocomplete

Automatic Observability

Once initialized, the Freeplay SDK automatically instruments your LangGraph application with OpenTelemetry. This means every LangChain and LangGraph operation is traced and sent to Freeplay without any additional code.

What Gets Tracked

Freeplay automatically captures:

  • Prompt invocations: Template, variables, and generated content
  • Model calls: Provider, model name, tokens used, latency
  • Tool executions: Which tools were called and their results
  • Agent flows: Multi-step reasoning and decision paths
  • Conversation flows: Multi-turn interactions and state transitions
  • Errors and exceptions: Failed invocations with stack traces
  • Metadata: Test run IDs, test case IDs, environment names, and custom tags

All metadata is injected automatically through LangChain's RunnableBindingBase pattern, ensuring comprehensive observability without manual instrumentation.

Viewing Traces

You can view all of this data in the Freeplay dashboard, making it easy to:

  • Debug issues and understand failure patterns
  • Optimize performance and reduce latency
  • Understand how your application behaves in production
  • Track token usage and costs across environments
  • Measure impact of prompt changes over time

Simple Prompt Invocations

For simpler use cases that don't require the full agent loop, use the invoke method. This is ideal for one-off completions, quick classifications, or any scenario where you don't need agent state management or the ReAct loop.

Basic Invocation

Call a Freeplay-hosted prompt with automatic model instantiation:

from freeplay_langgraph import FreeplayLangGraph

freeplay = FreeplayLangGraph()

# Invoke a prompt - model is automatically created based on Freeplay's config
response = freeplay.invoke(
    prompt_name="sentiment-analyzer",
    variables={"text": "This product exceeded my expectations!"},
    environment="production"
)

print(response.content)

Using invoke gives you quick access to Freeplay-managed prompts without the overhead of agent state or tool calling. This is perfect for classification tasks, content generation, or any stateless LLM operation.

Adding Tools

Bind LangChain tools for basic tool calling without the full agent loop:

from langchain_core.tools import tool

@tool
def calculate_discount(price: float, discount_percent: float) -> float:
    """Calculate the final price after applying a discount."""
    return price * (1 - discount_percent / 100)

@tool
def check_inventory(product_id: str) -> int:
    """Check inventory levels for a product."""
    return 42  # Mock inventory count

response = freeplay.invoke(
    prompt_name="pricing-assistant",
    variables={"product": "laptop", "base_price": 1200},
    tools=[calculate_discount, check_inventory]
)
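
Without the agent loop, the model may answer with tool calls instead of text, and executing them is up to you. A sketch, assuming invoke returns the underlying LangChain message (as the response.content usage elsewhere in this guide suggests):

# tool_calls is populated when the model chose to call one of the bound tools.
for tool_call in response.tool_calls:
    if tool_call["name"] == "calculate_discount":
        result = calculate_discount.invoke(tool_call["args"])
        print(f"Discounted price: {result}")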

Conversation History

Maintain conversation context across multiple turns:

from langchain_core.messages import HumanMessage, AIMessage

# Build conversation history
history = [
    HumanMessage(content="What's the weather in Paris?"),
    AIMessage(content="It's sunny and 22°C in Paris."),
    HumanMessage(content="What about in winter?")
]

# The prompt has full context of the conversation
response = freeplay.invoke(
    prompt_name="weather-assistant",
    variables={"city": "Paris"},
    history=history
)

print(response.content)

By passing conversation history, your prompts can maintain context across multiple turns without needing full agent state management.

Test Execution Tracking

Track test runs for systematic evaluation:

# Track simple invocations for evaluation
response = freeplay.invoke(
    prompt_name="sentiment-classifier",
    variables={"text": "I love this product!"},
    test_run_id="test_run_789",
    test_case_id="test_case_012",
    environment="staging"
)

Using Custom Models

Provide your own pre-configured LangChain model for more control:

from langchain_openai import ChatOpenAI

# Configure your own model with custom parameters
model = ChatOpenAI(
    model="gpt-4",
    temperature=0.7,
    max_tokens=1000
)

response = freeplay.invoke(
    prompt_name="content-generator",
    variables={"topic": "sustainable energy"},
    model=model
)

Async Support

All methods in the Freeplay SDK support async/await for better performance in async applications:

Async Agent Invocation

# Create the agent, then invoke it asynchronously
agent = freeplay.create_agent(
    prompt_name="assistant",
    variables={"role": "helpful"},
    tools=[search_knowledge_base]
)

result = await agent.ainvoke({
    "messages": [HumanMessage(content="Help me find information")]
})

Async Simple Invocations

# Async invocation
response = await freeplay.ainvoke(
    prompt_name="sentiment-analyzer",
    variables={"text": "Great product!"}
)

# Async streaming
async for chunk in freeplay.astream(
    prompt_name="content-generator",
    variables={"topic": "machine learning"}
):
    print(chunk.content, end="", flush=True)

Async State Management

# Async state inspection
state = await agent.unwrap().aget_state(config)

# Async state updates
await agent.unwrap().aupdate_state(config, {"approval": "granted"})

Using async methods improves throughput and reduces latency in applications that handle multiple concurrent requests, such as web servers or API endpoints.
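
For example, a short sketch of fanning out several independent invocations concurrently with asyncio:

import asyncio

async def classify_batch(texts: list[str]):
    # Run the Freeplay-managed prompt once per input, concurrently.
    return await asyncio.gather(*(
        freeplay.ainvoke(
            prompt_name="sentiment-analyzer",
            variables={"text": text},
        )
        for text in texts
    ))

responses = asyncio.run(classify_batch(["Great product!", "Not what I expected."]))
print([r.content for r in responses])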

Tracking Test Runs and Evaluations

Freeplay helps you systematically evaluate your LangGraph applications by tracking test runs and test cases. This is essential for regression testing and measuring improvements as you iterate.

Test Execution Tracking with Agents

from freeplay_langgraph import FreeplayLangGraph

freeplay = FreeplayLangGraph()

# Track agent invocations as part of a test run
agent = freeplay.create_agent(
    prompt_name="customer-support-agent",
    variables={"company": "Acme Corp"},
    tools=[check_account_status],
    test_run_id="test_run_123",
    test_case_id="test_case_456",
    environment="staging"
)

result = agent.invoke({
    "messages": [HumanMessage(content="Check my account status")]
})

Test Execution Tracking with Simple Invocations

# Track simple invocations for evaluation
response = freeplay.invoke(
    prompt_name="sentiment-classifier",
    variables={"text": "I love this product!"},
    test_run_id="test_run_789",
    test_case_id="test_case_012",
    environment="staging"
)

By associating invocations with test runs and test cases, you can analyze performance across your test suite, identify regressions, and measure the impact of prompt changes in Freeplay's evaluation dashboard.

Supported LLM Providers

Freeplay's LangGraph SDK supports automatic model instantiation for multiple providers. Install the corresponding LangChain integration package for your provider:

OpenAI

pip install langchain-openai

Anthropic

pip install langchain-anthropic

Vertex AI (Google)

pip install langchain-google-vertexai

The SDK automatically detects which provider your Freeplay prompt is configured to use and instantiates the appropriate model with the correct parameters.
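
If you need to bypass automatic instantiation for a particular call, you can still pass a pre-configured model from any of these providers, as described in Using Custom Models above. For example, with Anthropic (model name illustrative):

from langchain_anthropic import ChatAnthropic

response = freeplay.invoke(
    prompt_name="content-generator",
    variables={"topic": "sustainable energy"},
    model=ChatAnthropic(model="claude-3-5-sonnet-latest"),
)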