Token Tracking¶
Track response sizes and estimate token usage for cost analysis.
Overview¶
MCP servers don't have direct access to LLM token counts: token usage is reported to the client, not the server. mcpstat therefore supports two approaches:
- Server-side estimation - Estimate tokens from response character count
- Client-side injection - Report actual tokens from LLM API responses
Basic Usage (Estimation)¶
Track response sizes for automatic token estimation:
@app.call_tool()
async def handle_tool(name: str, arguments: dict):
    result = await my_logic(arguments)

    # Record with response size for token estimation
    await stat.record(
        name, "tool",
        response_chars=len(str(result))
    )
    return result
mcpstat estimates tokens using ~3.5 characters per token (conservative for mixed content).
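For example, an 8,000-character response works out to roughly 8,000 / 3.5 ≈ 2,286 estimated tokens, which is the figure shown in the sample stats below.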
Actual Token Tracking¶
If you have access to actual token counts from your LLM provider, you can record them in one of two ways:
Method 1: Record with Tokens¶
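A minimal sketch of this method, assuming stat.record() accepts input_tokens and output_tokens keyword arguments (check your mcpstat version's signature). The response object here is an Anthropic-style result with a usage field:

response = await anthropic.messages.create(...)

# input_tokens/output_tokens are assumed keyword arguments; verify against your version
await stat.record(
    "my_tool", "tool",
    input_tokens=response.usage.input_tokens,
    output_tokens=response.usage.output_tokens
)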
Method 2: Deferred Reporting¶
# Record the call first
await stat.record("my_tool", "tool")

# Later, when tokens are available
response = await anthropic.messages.create(...)
await stat.report_tokens(
    "my_tool",
    response.usage.input_tokens,
    response.usage.output_tokens
)
Token Statistics¶
get_stats() includes comprehensive token information:
Response Structure¶
{
    "token_summary": {
        "total_input_tokens": 5000,       # Sum across all tools
        "total_output_tokens": 12000,     # Sum across all tools
        "total_estimated_tokens": 3500,   # From response_chars
        "has_actual_tokens": True         # True if any actual tokens recorded
    },
    "stats": [
        {
            "name": "my_tool",
            "call_count": 10,
            "total_input_tokens": 1000,
            "total_output_tokens": 2500,
            "total_response_chars": 8000,
            "estimated_tokens": 2286,
            "avg_tokens_per_call": 350,   # (input + output) / calls
            ...
        }
    ]
}
Token Fields¶
| Field | Description |
|---|---|
| total_input_tokens | Cumulative input tokens (if tracked) |
| total_output_tokens | Cumulative output tokens (if tracked) |
| total_response_chars | Cumulative response characters |
| estimated_tokens | Tokens estimated from response size |
| avg_tokens_per_call | Average tokens per invocation |
Estimation vs. Actual¶
mcpstat prioritizes actual tokens over estimates:
# Priority for avg_tokens_per_call:
if total_input_tokens + total_output_tokens > 0:
    avg = (input + output) / call_count   # Use actual
else:
    avg = estimated_tokens / call_count   # Fall back to estimate
Use Cases¶
Cost Analysis¶
Track token usage to estimate API costs:
stats = await stat.get_stats()
summary = stats["token_summary"]
total_tokens = summary["total_input_tokens"] + summary["total_output_tokens"]
estimated_cost = total_tokens * 0.00001 # Example rate
print(f"Estimated cost: ${estimated_cost:.4f}")
Identifying Token-Heavy Tools¶
Find tools that consume the most tokens:
stats = await stat.get_stats()

# Sort by total tokens
by_tokens = sorted(
    stats["stats"],
    key=lambda s: s["total_input_tokens"] + s["total_output_tokens"],
    reverse=True
)

for tool in by_tokens[:5]:
    total = tool["total_input_tokens"] + tool["total_output_tokens"]
    print(f"{tool['name']}: {total} tokens")
Optimization Recommendations¶
stats = await stat.get_stats()

for tool in stats["stats"]:
    avg = tool["avg_tokens_per_call"]
    if avg > 1000:
        print(f"⚠️ {tool['name']}: {avg} avg tokens/call - consider optimization")
Database Schema¶
Token tracking adds these columns to mcpstat_usage:
| Column | Type | Description |
|---|---|---|
| total_input_tokens | INTEGER | Cumulative input tokens |
| total_output_tokens | INTEGER | Cumulative output tokens |
| total_response_chars | INTEGER | Cumulative response characters |
| estimated_tokens | INTEGER | Tokens estimated from response size |
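To inspect these columns directly, you can query the table with any SQL client. The sketch below assumes a SQLite backend stored at mcpstat.db and a name column on mcpstat_usage (both assumptions; adjust for your actual storage configuration):

import sqlite3

# Assumes a SQLite file at ./mcpstat.db - adjust path/engine for your setup
conn = sqlite3.connect("mcpstat.db")
rows = conn.execute(
    "SELECT name, total_input_tokens, total_output_tokens, "
    "total_response_chars, estimated_tokens "
    "FROM mcpstat_usage "
    "ORDER BY total_input_tokens + total_output_tokens DESC"
).fetchall()
for row in rows:
    print(row)
conn.close()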
Migration¶
Since v0.2.1
Token tracking columns were added in version 0.2.1. Existing databases are automatically migrated to include these columns. All existing data is preserved, and new columns default to 0.
Best Practices¶
1. Track Response Sizes¶
Even without actual tokens, tracking response sizes provides useful estimates:
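For example, the same pattern shown under Basic Usage:

result = await my_logic(arguments)

# response_chars enables server-side estimation even without actual token counts
await stat.record(
    name, "tool",
    response_chars=len(str(result))
)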
2. Use Deferred Reporting for Accuracy¶
When actual tokens are available, use report_tokens():
# In your client code
response = await client.messages.create(...)

await stat.report_tokens(
    tool_name,
    response.usage.input_tokens,
    response.usage.output_tokens
)
3. Monitor High-Token Tools¶
Regularly check for tools with high average token usage:
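For example, using the avg_tokens_per_call field from get_stats():

stats = await stat.get_stats()

heavy = [t for t in stats["stats"] if t["avg_tokens_per_call"] > 1000]
for tool in heavy:
    print(f"{tool['name']}: {tool['avg_tokens_per_call']} avg tokens/call")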