Overview
Integrate Freeplay with LangGraph to add observability, prompt management, and evaluation capabilities to your LangGraph applications. This comprehensive guide covers everything from basic setup to advanced agent workflows with state management, streaming, and human-in-the-loop patterns.
Prerequisites
Before you begin, make sure you have:
- A Freeplay account with an active project
- Python 3.10 or higher installed
- Basic familiarity with LangGraph and LangChain
Quick Start with Observability
Installation
Install the Freeplay LangGraph SDK along with your preferred LLM provider. For advanced usage, refer to the SDK documentation on PyPI.
# Install Freeplay SDK
pip install freeplay-langgraph
# Install your LLM provider (choose one or more)
pip install langchain-openai
pip install langchain-anthropic
pip install langchain-google-vertexai
Configuration
Set Up Your Credentials
Configure your Freeplay credentials using environment variables:
export FREEPLAY_API_URL="https://app.freeplay.ai/api"
export FREEPLAY_API_KEY="fp-..."
export FREEPLAY_PROJECT_ID="..."
You can find your API key and Project ID in your Freeplay project settings.
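For local development, you can optionally load these variables from a .env file; a minimal sketch assuming the python-dotenv package is installed (it is not part of the Freeplay SDK):
from dotenv import load_dotenv

# Load FREEPLAY_API_URL, FREEPLAY_API_KEY, and FREEPLAY_PROJECT_ID from a local .env file
load_dotenv()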
Initialize the SDK
Create a FreeplayLangGraph instance in your application:
from freeplay_langgraph import FreeplayLangGraph
# Using environment variables
freeplay = FreeplayLangGraph()
# Or pass credentials directly
freeplay = FreeplayLangGraph(
    freeplay_api_url="https://app.freeplay.ai/api",
    freeplay_api_key="fp-...",
    project_id="proj-...",
)
With this setup, your LangGraph application is now automatically instrumented with OpenTelemetry, sending traces and spans to Freeplay for observability.
Note: We recommend managing your prompts within Freeplay to support a better prompt development lifecycle. Continue following this guide to get your prompts configured for use in LangGraph.
Prompt Management
Freeplay’s integration requires that your prompts are configured in Freeplay; by default, FreeplayLangGraph fetches them from the Freeplay API. To learn more, see our Prompt Management guide here. Once configured, you will need the prompt names to reference them in code.
Managing prompts in Freeplay separates your prompt engineering workflow from your LangGraph application. Instead of hardcoding prompts in your agent code, your team can iterate on prompt templates, test different versions and new models, and deploy changes through Freeplay without modifying or redeploying your LangGraph application. This enables your team to test agent behavior, maintain different prompt versions across environments (development, staging, production), and experiment with variations.
Optional - Prompt Bundling
Once your prompts are saved in Freeplay, you can also use bundled prompts stored locally with your application by providing a custom template resolver:
from pathlib import Path
from freeplay.resources.prompts import FilesystemTemplateResolver
from freeplay_langgraph import FreeplayLangGraph
# Use filesystem-based prompts bundled with your app
freeplay = FreeplayLangGraph(
template_resolver=FilesystemTemplateResolver(Path("bundled_prompts"))
)
This is useful for offline environments, testing, or when you want to version control your prompts alongside your code. See our Prompt Bundling Guide to learn more.
Core Concepts
Freeplay provides two primary ways to work with LangGraph:
create_agent() - For building full LangGraph agents with tool calling, ReAct loops, and state management
invoke() - For simple, stateless LLM invocations when you don’t need agent capabilities
Both methods support the same core features: conversation history, tool calling, structured outputs and running tests. Choose create_agent() when you need the full power of LangGraph’s agent framework, and invoke() for simpler use cases.
Building LangGraph Agents
The create_agent method provides full support for LangGraph’s agent capabilities including the ReAct loop, tool calling, state management, middleware, and streaming.
Basic Agent Creation
Create an agent that uses a Freeplay-hosted prompt with automatic model instantiation. You can pass variables both when creating the agent and when invoking it; both are optional depending on your flow:
from freeplay_langgraph import FreeplayLangGraph
from langchain_core.messages import HumanMessage
freeplay = FreeplayLangGraph()
# Create a basic agent with a prompt stored in Freeplay
agent = freeplay.create_agent(
prompt_name="weather-assistant",
variables={"location": "San Francisco"}, # Optional, enables datasets & testing
environment="production"
)
# Invoke the agent
result = agent.invoke({
    "messages": [HumanMessage(content="What's the weather like today?")],
    "variables": {"location": "Denver"}
})
print(result["messages"][-1].content)
Using create_agent gives you access to LangGraph’s full agent capabilities, including tool calling with the ReAct loop, state persistence, and advanced execution control.
Tool Calling
Bind LangChain tools to your agent for agentic workflows. The agent automatically decides when to call tools:
from langchain_core.tools import tool
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Weather in {city}: Sunny, 72°F"

@tool
def get_forecast(city: str, days: int) -> str:
    """Get the weather forecast for a city."""
    return f"{days}-day forecast for {city}: Mostly sunny"
agent = freeplay.create_agent(
prompt_name="weather-assistant",
variables={"location": "San Francisco"},
tools=[get_weather, get_forecast],
environment="production"
)
result = agent.invoke({
"messages": [HumanMessage(content="What's the weather in SF and the 5-day forecast?")]
})
The agent handles the tool-calling cycle through LangGraph’s ReAct loop, deciding when to use tools and when to respond directly to the user.
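To see the intermediate steps the agent took, you can walk the returned message list; a minimal sketch using standard LangChain message types:
from langchain_core.messages import AIMessage, ToolMessage

# Inspect the tool calls the model requested and the results each tool returned
for message in result["messages"]:
    if isinstance(message, AIMessage) and message.tool_calls:
        for call in message.tool_calls:
            print(f"Tool requested: {call['name']} with args {call['args']}")
    elif isinstance(message, ToolMessage):
        print(f"Tool result: {message.content}")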
Conversation History
Maintain conversation context across multiple turns with conversation history:
from langchain_core.messages import HumanMessage, AIMessage
# Build conversation history
history = [
HumanMessage(content="What's the weather in Paris?"),
AIMessage(content="It's sunny and 22°C in Paris."),
HumanMessage(content="What about in winter?")
]
agent = freeplay.create_agent(
prompt_name="weather-assistant",
variables={"city": "Paris"},
tools=[get_weather],
environment="production"
)
# Pass history in the messages
result = agent.invoke({
"messages": history + [HumanMessage(content="And the average rainfall?")]
})
For persistent conversations across multiple invocations, use state persistence with checkpointers (covered in State Management section).
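As a sketch of that pattern, the snippet below assumes create_agent forwards a checkpointer keyword to LangGraph (check the SDK reference for the exact parameter name); the thread_id config is standard LangGraph:
from langgraph.checkpoint.memory import MemorySaver

# Assumption: create_agent accepts a LangGraph checkpointer for persistent state
agent = freeplay.create_agent(
    prompt_name="weather-assistant",
    tools=[get_weather],
    checkpointer=MemorySaver(),
)

# The thread_id ties multiple invocations to the same persisted conversation
config = {"configurable": {"thread_id": "user-42"}}
agent.invoke({"messages": [HumanMessage(content="What's the weather in Paris?")]}, config=config)
agent.invoke({"messages": [HumanMessage(content="And tomorrow?")]}, config=config)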
Structured Output
Get structured, typed responses from your agents using ToolStrategy or ProviderStrategy:
from pydantic import BaseModel
from langchain.agents.structured_output import ToolStrategy
class WeatherReport(BaseModel):
    city: str
    temperature: float
    conditions: str
    humidity: int
agent = freeplay.create_agent(
prompt_name="weather-assistant",
variables={"format": "detailed"},
tools=[get_weather_data],
response_format=ToolStrategy(WeatherReport)
)
result = agent.invoke({
"messages": [HumanMessage(content="Get weather for New York City")]
})
# Access strongly-typed structured output
weather_report = result["structured_response"]
print(f"{weather_report.city}: {weather_report.temperature}°F")
print(f"Conditions: {weather_report.conditions}, Humidity: {weather_report.humidity}%")
Structured output ensures your agent returns data in a predictable format, making it easier to integrate with downstream systems, databases, or UIs.
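For example, the Pydantic model can be serialized directly for downstream storage or API responses; a minimal sketch assuming Pydantic v2:
# Convert the typed result for downstream use (Pydantic v2 methods)
record = weather_report.model_dump()        # plain dict, e.g. for a database row
payload = weather_report.model_dump_json()  # JSON string, e.g. for an API response
print(record["city"], payload)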
Accessing the Underlying Graph
The agent returned by create_agent wraps a compiled LangGraph graph. Call unwrap() when you need direct access to graph APIs such as get_state:
from typing import cast
from langgraph.graph.state import CompiledStateGraph
agent = freeplay.create_agent(...)
# Option 1: Direct unwrap (works at runtime)
state = agent.unwrap().get_state(config)
# Option 2: Cast for full type hints
compiled = cast(CompiledStateGraph, agent.unwrap())
state = compiled.get_state(config) # ✅ Full IDE autocomplete
Simple Prompt Invocations
For simpler use cases that don’t require the full agent loop, use the invoke method. This is ideal for one-off completions, quick classifications, or any scenario where you don’t need agent state management or the ReAct loop.
Basic Invocation
Call a Freeplay-hosted prompt with automatic model instantiation:
from freeplay_langgraph import FreeplayLangGraph
freeplay = FreeplayLangGraph()
# Invoke a prompt - model is automatically created based on Freeplay's config
response = freeplay.invoke(
prompt_name="sentiment-analyzer",
variables={"text": "This product exceeded my expectations!"},
environment="production"
)
print(response.content)
Using invoke gives you quick access to Freeplay-managed prompts without the overhead of agent state or tool calling. This is perfect for classification tasks, content generation, or any stateless LLM operation.
Tool Calling
Bind LangChain tools for basic tool calling without the full agent loop:
from langchain_core.tools import tool
@tool
def calculate_discount(price: float, discount_percent: float) -> float:
    """Calculate the final price after applying a discount."""
    return price * (1 - discount_percent / 100)

@tool
def check_inventory(product_id: str) -> int:
    """Check inventory levels for a product."""
    return 42  # Mock inventory count
response = freeplay.invoke(
prompt_name="pricing-assistant",
variables={"product": "laptop", "base_price": 1200},
tools=[calculate_discount, check_inventory]
)
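Because invoke does not run the ReAct loop, any tool calls come back on the response for you to execute yourself; a minimal sketch assuming the response is a standard LangChain AIMessage:
# Dispatch any requested tool calls manually (no agent loop runs them for you)
tools_by_name = {"calculate_discount": calculate_discount, "check_inventory": check_inventory}
for call in getattr(response, "tool_calls", []) or []:
    tool_result = tools_by_name[call["name"]].invoke(call["args"])
    print(f"{call['name']} -> {tool_result}")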
Conversation History
Maintain conversation context across multiple turns:
from langchain_core.messages import HumanMessage, AIMessage
# Build conversation history
history = [
HumanMessage(content="What's the weather in Paris?"),
AIMessage(content="It's sunny and 22°C in Paris."),
HumanMessage(content="What about in winter?")
]
# The prompt has full context of the conversation
response = freeplay.invoke(
prompt_name="weather-assistant",
variables={"city": "Paris"},
history=history
)
print(response.content)
By passing conversation history, your prompts can maintain context across multiple turns without needing full agent state management.
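To continue the conversation, append the latest exchange to the history before the next call; a minimal sketch assuming the returned response is an AIMessage:
# Carry the reply forward so the next turn has full context
history.append(response)
history.append(HumanMessage(content="And the average rainfall?"))
follow_up = freeplay.invoke(
    prompt_name="weather-assistant",
    variables={"city": "Paris"},
    history=history
)
print(follow_up.content)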
Test Execution Tracking
Track test runs for evaluation workflows by pulling test cases from Freeplay and executing them with automatic tracking. By associating invocations with test runs and test cases, you can analyze performance across your test suite, identify regressions, and measure the impact of prompt changes in Freeplay’s evaluation dashboard. See more about running end-to-end test runs here.
Creating Test Runs
import os
from freeplay_langgraph import FreeplayLangGraph
from langchain_core.messages import HumanMessage
freeplay = FreeplayLangGraph()
# Create a test run from a dataset
test_run = freeplay.client.test_runs.create(
project_id=os.getenv("FREEPLAY_PROJECT_ID"),
testlist="name of the dataset",
name="name your test run",
)
print(f"Created test run: {test_run.id}")
Executing Test Cases with Simple Invocations
For simple prompt invocations, use the test tracking parameters directly:
# Execute each test case
for test_case in test_run.test_cases:
    response = freeplay.invoke(
        prompt_name="my-prompt",
        variables=test_case.variables,
        test_run_id=test_run.id,
        test_case_id=test_case.id
    )
    print(f"Test case {test_case.id}: {response.content}")
Executing Test Cases with Agents
For LangGraph agents, pass test tracking metadata via config to reuse the agent efficiently:
from langchain_core.messages import HumanMessage
# Create agent once (no test tracking at creation)
agent = freeplay.create_agent(
prompt_name="my-prompt",
variables={"input": "prompt input"},
tools=[get_weather],
)
# Execute each test case with metadata override
for test_case in test_run.trace_test_cases:
    result = agent.invoke(
        {"messages": [HumanMessage(content=test_case.input)]},
        config={
            "metadata": {
                "freeplay.test_run_id": test_run.id,
                "freeplay.test_case_id": test_case.id
            }
        }
    )
    print(f"Test case {test_case.id}: {result['messages'][-1].content}")
Using Custom Models
Provide your own pre-configured LangChain model for more control:
from langchain_openai import ChatOpenAI
# Configure your own model with custom parameters
model = ChatOpenAI(
model="gpt-4",
temperature=0.7,
max_tokens=1000
)
response = freeplay.invoke(
prompt_name="content-generator",
variables={"topic": "sustainable energy"},
model=model
)
Async Support
All methods in the Freeplay SDK support async/await for better performance in async applications:
Async Agent Invocation
# Async agent creation and invocation
agent = freeplay.create_agent(
prompt_name="assistant",
variables={"role": "helpful"},
tools=[search_knowledge_base]
)
result = await agent.ainvoke({
"messages": [HumanMessage(content="Help me find information")]
})
Async Simple Invocations
# Async invocation
response = await freeplay.ainvoke(
prompt_name="sentiment-analyzer",
variables={"text": "Great product!"}
)
# Async streaming
async for chunk in freeplay.astream(
    prompt_name="content-generator",
    variables={"topic": "machine learning"}
):
    print(chunk.content, end="", flush=True)
Async State Management
# Async state inspection
state = await agent.unwrap().aget_state(config)
# Async state updates
await agent.unwrap().aupdate_state(config, {"approval": "granted"})
Using async methods improves throughput and reduces latency in applications that handle multiple concurrent requests, such as web servers or API endpoints.
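For example, concurrent requests can be fanned out with asyncio.gather; a minimal sketch reusing the ainvoke call shown above:
import asyncio

async def classify_batch(texts):
    # Each invocation runs concurrently and is traced independently
    return await asyncio.gather(*[
        freeplay.ainvoke(prompt_name="sentiment-analyzer", variables={"text": text})
        for text in texts
    ])

responses = asyncio.run(classify_batch(["Great product!", "Not what I expected."]))
for r in responses:
    print(r.content)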
Automatic Observability
Once initialized, the Freeplay SDK automatically instruments your LangGraph application with OpenTelemetry. This means every LangChain and LangGraph operation is traced and sent to Freeplay without any additional code.
What Gets Tracked
Freeplay automatically captures:
- Prompt invocations: Template, variables, and generated content
- Model calls: Provider, model name, tokens used, latency
- Tool executions: Which tools were called and their results
- Agent flows: Multi-step reasoning and decision paths
- Conversation flows: Multi-turn interactions and state transitions
- Errors and exceptions: Failed invocations with stack traces
- Metadata: Test run IDs, test case IDs, environment names, and custom tags
All metadata is injected automatically through LangChain’s RunnableBindingBase pattern, ensuring comprehensive observability without manual instrumentation.
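Custom tags can be attached the same way the test tracking keys are passed above, through LangChain’s standard config metadata. A minimal sketch where customer_id is a hypothetical application-specific key, not a documented Freeplay field:
from langchain_core.messages import HumanMessage

# Attach custom metadata via LangChain's config; the freeplay.* test tracking keys use the same mechanism.
# customer_id is a hypothetical application-specific tag, not a documented Freeplay field.
result = agent.invoke(
    {"messages": [HumanMessage(content="What's the weather in Denver?")]},
    config={"metadata": {"customer_id": "cust-123"}}
)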
Viewing Traces
You can view all of this data in the Freeplay dashboard, making it easy to:
- Debug issues and understand failure patterns
- Optimize performance and reduce latency
- Understand how your application behaves in production
- Track token usage and costs across environments
- Measure impact of prompt changes over time
Supported LLM Providers
Freeplay’s LangGraph SDK supports automatic model instantiation for multiple providers. Install the corresponding LangChain integration package for your provider:
OpenAI
pip install langchain-openai
Anthropic
pip install langchain-anthropic
Vertex AI (Google)
pip install langchain-google-vertexai
The SDK automatically detects which provider your Freeplay prompt is configured to use and instantiates the appropriate model with the correct parameters.