Token Tracking

Track response sizes and estimate token usage for cost analysis.


Overview

MCP servers don't have direct access to LLM token counts: usage is reported to the client, not the server. mcpstat therefore supports two approaches:

  1. Server-side estimation - Estimate tokens from response character count
  2. Client-side injection - Report actual tokens from LLM API responses

Basic Usage (Estimation)

Track response sizes for automatic token estimation:

@app.call_tool()
async def handle_tool(name: str, arguments: dict):
    result = await my_logic(arguments)

    # Record with response size for token estimation
    await stat.record(
        name, "tool",
        response_chars=len(str(result))
    )
    return result

mcpstat estimates tokens using ~3.5 characters per token (conservative for mixed content).
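
The arithmetic is simply the character count divided by 3.5. A minimal sketch (the exact rounding mcpstat applies is an implementation detail, but round() reproduces the sample numbers on this page, e.g. 8000 chars -> 2286 tokens):

def estimate_tokens(response_chars: int) -> int:
    # ~3.5 characters per token; the rounding mode is an assumption
    return round(response_chars / 3.5)

estimate_tokens(8000)  # 2286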


Actual Token Tracking

If you have access to actual token counts from your LLM provider:

Method 1: Record with Tokens

await stat.record(
    name, "tool",
    input_tokens=100,
    output_tokens=250
)

Method 2: Deferred Reporting

# Record the call first
await stat.record("my_tool", "tool")

# Later, when tokens are available
response = await anthropic.messages.create(...)
await stat.report_tokens(
    "my_tool",
    response.usage.input_tokens,
    response.usage.output_tokens
)
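
This keeps server-side recording and client-side token injection decoupled: the call is recorded when it happens, and the actual counts are attached later, from wherever the LLM response is visible.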

Token Statistics

get_stats() includes comprehensive token information:

stats = await stat.get_stats()

Response Structure

{
    "token_summary": {
        "total_input_tokens": 5000,      # Sum across all tools
        "total_output_tokens": 12000,    # Sum across all tools
        "total_estimated_tokens": 3500,  # From response_chars
        "has_actual_tokens": True        # True if any actual tokens recorded
    },
    "stats": [
        {
            "name": "my_tool",
            "call_count": 10,
            "total_input_tokens": 1000,
            "total_output_tokens": 2500,
            "total_response_chars": 8000,
            "estimated_tokens": 2286,
            "avg_tokens_per_call": 350,   # (input + output) / calls
            ...
        }
    ]
}

Token Fields

Field                  Description
total_input_tokens     Cumulative input tokens (if tracked)
total_output_tokens    Cumulative output tokens (if tracked)
total_response_chars   Cumulative response characters
estimated_tokens       Tokens estimated from response size
avg_tokens_per_call    Average tokens per invocation

Estimation vs. Actual

mcpstat prioritizes actual tokens over estimates:

# Priority for avg_tokens_per_call:
if total_input_tokens + total_output_tokens > 0:
    # Actual tokens were recorded: use them
    avg = (total_input_tokens + total_output_tokens) / call_count
else:
    # Fall back to the character-based estimate
    avg = estimated_tokens / call_count

Use Cases

Cost Analysis

Track token usage to estimate API costs:

stats = await stat.get_stats()
summary = stats["token_summary"]

total_tokens = summary["total_input_tokens"] + summary["total_output_tokens"]
estimated_cost = total_tokens * 0.00001  # Example rate
print(f"Estimated cost: ${estimated_cost:.4f}")

Identifying Token-Heavy Tools

Find tools that consume the most tokens:

stats = await stat.get_stats()

# Sort by total tokens
by_tokens = sorted(
    stats["stats"],
    key=lambda s: s["total_input_tokens"] + s["total_output_tokens"],
    reverse=True
)

for tool in by_tokens[:5]:
    total = tool["total_input_tokens"] + tool["total_output_tokens"]
    print(f"{tool['name']}: {total} tokens")

Optimization Recommendations

stats = await stat.get_stats()

for tool in stats["stats"]:
    avg = tool["avg_tokens_per_call"]
    if avg > 1000:
        print(f"⚠️ {tool['name']}: {avg} avg tokens/call - consider optimization")

Database Schema

Token tracking adds these columns to mcpstat_usage:

Column                 Type     Description
total_input_tokens     INTEGER  Cumulative input tokens
total_output_tokens    INTEGER  Cumulative output tokens
total_response_chars   INTEGER  Cumulative response characters
estimated_tokens       INTEGER  Tokens estimated from response size
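
For ad-hoc inspection you can query these columns directly. A sketch, assuming the store is a SQLite file and that mcpstat_usage also has a name column; both the path and the name column are assumptions for illustration, so check your own configuration:

import sqlite3

conn = sqlite3.connect("mcpstat.db")  # hypothetical path
rows = conn.execute(
    """
    SELECT name, total_input_tokens, total_output_tokens, estimated_tokens
    FROM mcpstat_usage
    ORDER BY total_input_tokens + total_output_tokens DESC
    """
).fetchall()
for name, in_tok, out_tok, est in rows:
    print(f"{name}: {in_tok + out_tok} actual, {est} estimated")
conn.close()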

Migration

Since v0.2.1

Token tracking columns were added in version 0.2.1. Existing databases are automatically migrated to include these columns. All existing data is preserved, and new columns default to 0.


Best Practices

1. Track Response Sizes

Even without actual tokens, tracking response sizes provides useful estimates:

import json

await stat.record(
    name, "tool",
    response_chars=len(json.dumps(result))
)

2. Use Deferred Reporting for Accuracy

When actual tokens are available, use report_tokens():

# In your client code
response = await client.messages.create(...)
await stat.report_tokens(
    tool_name,
    response.usage.input_tokens,
    response.usage.output_tokens
)

3. Monitor High-Token Tools

Regularly check for tools with high average token usage:

stats = await stat.get_stats()

for tool in stats["stats"]:
    if tool["avg_tokens_per_call"] > 500:
        print(f"Review: {tool['name']}")