Token Tracking

Track response sizes and estimate token usage for cost analysis.


Overview

MCP servers don't have direct access to LLM token counts: usage is reported to the client, not the server. mcpstat therefore supports two approaches:

  1. Server-side estimation - Estimate tokens from response character count
  2. Client-side injection - Report actual tokens from LLM API responses

Basic Usage (Estimation)

Track response sizes for automatic token estimation:

@app.call_tool()
async def handle_tool(name: str, arguments: dict):
    result = await my_logic(arguments)

    # Record with response size for token estimation
    await stat.record(
        name, "tool",
        response_chars=len(str(result))
    )
    return result

mcpstat estimates tokens using ~3.5 characters per token (conservative for mixed content).
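
The arithmetic is simply the character count divided by 3.5. A minimal sketch (the exact rounding mcpstat applies is an implementation detail, but round() reproduces the sample numbers on this page, e.g. 8000 chars -> 2286 tokens):

def estimate_tokens(response_chars: int) -> int:
    # ~3.5 characters per token; the rounding mode is an assumption
    return round(response_chars / 3.5)

estimate_tokens(8000)  # 2286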


Actual Token Tracking

If you have access to actual token counts from your LLM provider:

Method 1: Record with Tokens

await stat.record(
    name, "tool",
    input_tokens=100,
    output_tokens=250
)

Method 2: Deferred Reporting

# Record the call first
await stat.record("my_tool", "tool")

# Later, when tokens are available
response = await anthropic.messages.create(...)
await stat.report_tokens(
    "my_tool",
    response.usage.input_tokens,
    response.usage.output_tokens
)
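
This keeps server-side recording and client-side token injection decoupled: the call is recorded when it happens, and the actual counts are attached later, from wherever the LLM response is visible.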

Token Statistics

get_stats() includes comprehensive token information:

stats = await stat.get_stats()

Response Structure

{
    "token_summary": {
        "total_input_tokens": 5000,      # Sum across all tools
        "total_output_tokens": 12000,    # Sum across all tools
        "total_estimated_tokens": 3500,  # From response_chars
        "has_actual_tokens": True        # True if any actual tokens recorded
    },
    "stats": [
        {
            "name": "my_tool",
            "call_count": 10,
            "total_input_tokens": 1000,
            "total_output_tokens": 2500,
            "total_response_chars": 8000,
            "estimated_tokens": 2286,
            "avg_tokens_per_call": 350,   # (input + output) / calls
            ...
        }
    ]
}

Token Fields

Field                  Description
total_input_tokens     Cumulative input tokens (if tracked)
total_output_tokens    Cumulative output tokens (if tracked)
total_response_chars   Cumulative response characters
estimated_tokens       Tokens estimated from response size
avg_tokens_per_call    Average tokens per invocation

Estimation vs. Actual

mcpstat prioritizes actual tokens over estimates:

# Priority for avg_tokens_per_call:
if total_input_tokens + total_output_tokens > 0:
    # Actual tokens were recorded: use them
    avg = (total_input_tokens + total_output_tokens) / call_count
else:
    # Fall back to the character-based estimate
    avg = estimated_tokens / call_count

Use Cases

Cost Analysis

Track token usage to estimate API costs:

stats = await stat.get_stats()
summary = stats["token_summary"]

total_tokens = summary["total_input_tokens"] + summary["total_output_tokens"]
estimated_cost = total_tokens * 0.00001  # Example rate
print(f"Estimated cost: ${estimated_cost:.4f}")

Identifying Token-Heavy Tools

Find tools that consume the most tokens:

stats = await stat.get_stats()

# Sort by total tokens
by_tokens = sorted(
    stats["stats"],
    key=lambda s: s["total_input_tokens"] + s["total_output_tokens"],
    reverse=True
)

for tool in by_tokens[:5]:
    total = tool["total_input_tokens"] + tool["total_output_tokens"]
    print(f"{tool['name']}: {total} tokens")

Optimization Recommendations

stats = await stat.get_stats()

for tool in stats["stats"]:
    avg = tool["avg_tokens_per_call"]
    if avg > 1000:
        print(f"⚠️ {tool['name']}: {avg} avg tokens/call - consider optimization")

Database Schema

Token tracking adds these columns to mcpstat_usage:

Column                 Type     Description
total_input_tokens     INTEGER  Cumulative input tokens
total_output_tokens    INTEGER  Cumulative output tokens
total_response_chars   INTEGER  Cumulative response characters
estimated_tokens       INTEGER  Tokens estimated from response size
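
For ad-hoc inspection you can query these columns directly. A sketch, assuming the store is a SQLite file and that mcpstat_usage also has a name column; both the path and the name column are assumptions for illustration, so check your own configuration:

import sqlite3

conn = sqlite3.connect("mcpstat.db")  # hypothetical path
rows = conn.execute(
    """
    SELECT name, total_input_tokens, total_output_tokens, estimated_tokens
    FROM mcpstat_usage
    ORDER BY total_input_tokens + total_output_tokens DESC
    """
).fetchall()
for name, in_tok, out_tok, est in rows:
    print(f"{name}: {in_tok + out_tok} actual, {est} estimated")
conn.close()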

Migration

Since v0.2.1

Token tracking columns were added in version 0.2.1. Existing databases are automatically migrated to include these columns. All existing data is preserved, and new columns default to 0.


Best Practices

1. Track Response Sizes

Even without actual tokens, tracking response sizes provides useful estimates:

import json

await stat.record(
    name, "tool",
    response_chars=len(json.dumps(result))
)

2. Use Deferred Reporting for Accuracy

When actual tokens are available, use report_tokens():

# In your client code
response = await client.messages.create(...)
await stat.report_tokens(
    tool_name,
    response.usage.input_tokens,
    response.usage.output_tokens
)

3. Monitor High-Token Tools

Regularly check for tools with high average token usage:

stats = await stat.get_stats()

for tool in stats["stats"]:
    if tool["avg_tokens_per_call"] > 500:
        print(f"Review: {tool['name']}")