Skip to content

Latency Tracking

Track execution duration for tool calls to identify slow tools and monitor performance.


Overview

Latency tracking measures how long each tool takes to execute. This enables:

  • Performance monitoring - Identify slow tools affecting user experience
  • SLA compliance - Ensure tools meet response time requirements
  • Optimization targets - Focus engineering effort on slow tools
  • Trend analysis - Detect performance degradation over time

Use the @stat.track decorator for automatic latency tracking:

from mcp.server import Server
from mcpstat import MCPStat

app = Server("my-server")
stat = MCPStat("my-server")

@app.call_tool()
@stat.track  # ← Automatic latency tracking!
async def handle_tool(name: str, arguments: dict):
    return await my_logic(arguments)

That's it! The decorator automatically:

  • Measures execution time
  • Records call count
  • Tracks success/failure
  • Never crashes your code

Context Manager Alternative

For more control, use the tracking context manager:

async def handle_tool(name: str, arguments: dict):
    async with stat.tracking(name, "tool"):
        result = await my_logic(arguments)
        return result

Manual Recording (Advanced)

If you need to pass additional data (tokens, response size), use record() directly:

import time

@app.call_tool()
async def handle_tool(name: str, arguments: dict):
    start = time.perf_counter()
    result = await my_logic(arguments)
    duration_ms = int((time.perf_counter() - start) * 1000)

    await stat.record(
        name, "tool",
        duration_ms=duration_ms,
        response_chars=len(str(result))
    )
    return result

Latency Statistics

get_stats() includes comprehensive latency information:

stats = await stat.get_stats()

Response Structure

{
    "latency_summary": {
        "total_duration_ms": 15000,    # Sum across all tools
        "has_latency_data": True       # True if any duration recorded
    },
    "stats": [
        {
            "name": "my_tool",
            "call_count": 10,
            "total_duration_ms": 5000,  # Total time spent
            "min_duration_ms": 100,     # Fastest call
            "max_duration_ms": 1200,    # Slowest call
            "avg_latency_ms": 500,      # Average per call
            ...
        }
    ]
}

Latency Fields

Field Description
total_duration_ms Cumulative execution time in milliseconds
min_duration_ms Fastest recorded execution (NULL if no data)
max_duration_ms Slowest recorded execution (NULL if no data)
avg_latency_ms Average duration per call

Use Cases

Identifying Slow Tools

Find tools with the highest average latency:

stats = await stat.get_stats()

# Sort by average latency
by_latency = sorted(
    [s for s in stats["stats"] if s["avg_latency_ms"] > 0],
    key=lambda s: s["avg_latency_ms"],
    reverse=True
)

print("Slowest tools:")
for tool in by_latency[:5]:
    print(f"  {tool['name']}: {tool['avg_latency_ms']}ms avg")

Performance Alerts

Set up warnings for slow tools:

stats = await stat.get_stats()
SLA_THRESHOLD_MS = 1000  # 1 second

for tool in stats["stats"]:
    if tool["max_duration_ms"] and tool["max_duration_ms"] > SLA_THRESHOLD_MS:
        print(f"⚠️ {tool['name']}: max latency {tool['max_duration_ms']}ms exceeds SLA")

Total Time Analysis

Track total server time spent in tool execution:

stats = await stat.get_stats()
summary = stats["latency_summary"]

print(f"Total execution time: {summary['total_duration_ms'] / 1000:.1f}s")

# Per-tool breakdown
for tool in stats["stats"]:
    if tool["total_duration_ms"] > 0:
        pct = tool["total_duration_ms"] / summary["total_duration_ms"] * 100
        print(f"  {tool['name']}: {pct:.1f}% of total time")

Variance Analysis

Identify tools with inconsistent performance:

for tool in stats["stats"]:
    if tool["min_duration_ms"] and tool["max_duration_ms"]:
        variance = tool["max_duration_ms"] - tool["min_duration_ms"]
        if variance > 500:  # More than 500ms variance
            print(f"⚠️ {tool['name']}: high variance ({tool['min_duration_ms']}-{tool['max_duration_ms']}ms)")

Combined Tracking

Track both tokens and latency for comprehensive analytics:

import time

@app.call_tool()
async def handle_tool(name: str, arguments: dict):
    start = time.perf_counter()
    result = await my_logic(arguments)
    duration_ms = int((time.perf_counter() - start) * 1000)

    await stat.record(
        name, "tool",
        response_chars=len(str(result)),
        duration_ms=duration_ms
    )
    return result

Database Schema

Latency tracking adds these columns to mcpstat_usage:

Column Type Description
total_duration_ms INTEGER Cumulative execution time
min_duration_ms INTEGER Minimum recorded duration (nullable)
max_duration_ms INTEGER Maximum recorded duration (nullable)

Migration

Since v0.2.2

Latency tracking columns were added in version 0.2.2. Existing databases are automatically migrated to include these columns. All existing data is preserved:

  • total_duration_ms defaults to 0
  • min_duration_ms and max_duration_ms default to NULL

Best Practices

1. Measure Full Execution Time

Include all processing time, not just external calls:

start = time.perf_counter()
# Include validation, processing, serialization
result = await full_tool_logic(arguments)
duration_ms = int((time.perf_counter() - start) * 1000)

2. Use perf_counter for Accuracy

time.perf_counter() provides the highest resolution timer:

# Good - high resolution, monotonic
start = time.perf_counter()

# Avoid - lower resolution, can jump
# start = time.time()

3. Set Performance Budgets

Define acceptable latency thresholds per tool type:

BUDGETS = {
    "cache_lookup": 50,    # 50ms for cached data
    "api_call": 2000,      # 2s for external APIs
    "computation": 500,    # 500ms for local processing
}

Track latency over time to detect degradation:

# Log periodic summaries
stats = await stat.get_stats()
for tool in stats["stats"]:
    if tool["call_count"] > 100:  # Enough data
        print(f"{tool['name']}: {tool['avg_latency_ms']}ms avg over {tool['call_count']} calls")