Models & Pricing¶
Understanding how Routstr calculates costs is essential for managing your API usage efficiently. This guide explains the pricing models and how to configure them.
Pricing Models¶
Routstr supports three pricing models:
1. Fixed Pricing¶
Simple per-request charging:
FIXED_PRICING=true
FIXED_COST_PER_REQUEST=10 # 10 sats per request
Best for:
- Uniform API usage
- Simple applications
- Predictable costs
2. Token-Based Pricing¶
Charge based on actual token usage:
FIXED_PRICING=false # use model pricing
FIXED_COST_PER_REQUEST=1 # optional base fee
FIXED_PER_1K_INPUT_TOKENS=5 # optional override
FIXED_PER_1K_OUTPUT_TOKENS=15 # optional override
Best for:
- Varied request sizes
- Fair usage billing
- Cost optimization
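As a rough illustration of how these settings could combine, the sketch below assumes all three values are denominated in sats and that the base fee is charged once per request. Neither assumption is confirmed by this guide, and `token_based_cost` is a hypothetical helper, not a Routstr function.

```python
def token_based_cost(input_tokens: int, output_tokens: int,
                     base_fee: float = 1,        # FIXED_COST_PER_REQUEST (assumed sats)
                     per_1k_input: float = 5,    # FIXED_PER_1K_INPUT_TOKENS (assumed sats)
                     per_1k_output: float = 15   # FIXED_PER_1K_OUTPUT_TOKENS (assumed sats)
                     ) -> float:
    """Hypothetical helper: base fee plus per-1K token charges."""
    input_cost = input_tokens / 1000 * per_1k_input
    output_cost = output_tokens / 1000 * per_1k_output
    return base_fee + input_cost + output_cost


# 2,000 input tokens and 1,000 output tokens with the settings above
print(token_based_cost(2000, 1000))
```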
3. Model-Based Pricing¶
Dynamic pricing based on model costs:
FIXED_PRICING=false
EXCHANGE_FEE=1.005 # 0.5% exchange fee
UPSTREAM_PROVIDER_FEE=1.05 # 5% provider fee
Best for:
- Multiple models
- Market-based pricing
- Automatic updates
Model Configuration¶
Default Models¶
Routstr includes pricing for popular models:
Model | Input ($/1K tokens) | Output ($/1K tokens) | Context | Notes |
---|---|---|---|---|
gpt-3.5-turbo | $0.0015 | $0.002 | 16K | Fast, economical |
gpt-4 | $0.03 | $0.06 | 8K | Advanced reasoning |
gpt-4-turbo | $0.01 | $0.03 | 128K | Large context |
claude-3-opus | $0.015 | $0.075 | 200K | Best quality |
claude-3-sonnet | $0.003 | $0.015 | 200K | Balanced |
llama-2-70b | $0.0007 | $0.0009 | 4K | Open source |
Custom Models File¶
Create models.json to override defaults:
{
"models": [
{
"id": "gpt-4-vision",
"name": "GPT-4 Vision",
"pricing": {
"prompt": "0.00003",
"completion": "0.00006",
"request": "0",
"image": "0.00255"
},
"context_length": 128000,
"supports_vision": true
},
{
"id": "custom-model",
"name": "My Custom Model",
"pricing": {
"prompt": "0.001",
"completion": "0.002",
"request": "0.0001"
},
"context_length": 8192
}
]
}
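The pricing values in this file are strings rather than numbers, so one reasonable way to read them without floating-point rounding is to parse them as `Decimal`. The sketch below does that; `load_pricing` is a hypothetical helper and the inline sample simply mirrors the second entry above.

```python
import json
from decimal import Decimal


def load_pricing(models_json: str) -> dict:
    """Map each model id to its pricing entries, parsed as Decimal."""
    data = json.loads(models_json)
    return {
        model["id"]: {key: Decimal(value) for key, value in model["pricing"].items()}
        for model in data["models"]
    }


SAMPLE = """
{"models": [{"id": "custom-model", "name": "My Custom Model",
  "pricing": {"prompt": "0.001", "completion": "0.002", "request": "0.0001"},
  "context_length": 8192}]}
"""

pricing = load_pricing(SAMPLE)
print(pricing["custom-model"]["prompt"])
```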
Auto-updating Models¶
Fetch latest models from OpenRouter:
# Update models from API
python scripts/models_meta.py
# Or manually
curl https://openrouter.ai/api/v1/models > models.json
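If the downloaded payload needs reshaping before use, a small conversion step might look like the sketch below. It assumes the OpenRouter response wraps its list under a top-level `data` key and that Routstr expects the `models` key shown in the custom file above; check both against your actual files. `openrouter_to_models_file` is a hypothetical helper.

```python
import json


def openrouter_to_models_file(payload: dict) -> dict:
    """Keep only the fields used in the models.json example (assumed schema)."""
    models = []
    for entry in payload.get("data", []):
        models.append({
            "id": entry["id"],
            "name": entry.get("name", entry["id"]),
            "pricing": entry.get("pricing", {}),
            "context_length": entry.get("context_length"),
        })
    return {"models": models}


# Hand-written sample in the assumed response shape, not real API output
sample_payload = {"data": [{"id": "example/model", "name": "Example Model",
                            "pricing": {"prompt": "0.00003", "completion": "0.00006"},
                            "context_length": 8192}]}
converted = openrouter_to_models_file(sample_payload)
print(json.dumps(converted, indent=2))
```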
Cost Calculation¶
Understanding the Formula¶
Base Cost (USD) = (Input Tokens / 1,000 × Input Rate per 1K) + (Output Tokens / 1,000 × Output Rate per 1K) + Request Fee
Bitcoin Price = Current BTC/USD rate (e.g., $50,000)
Sats Cost = (Base Cost / Bitcoin Price) × 100,000,000
Final Cost = Sats Cost × Exchange Fee × Provider Fee
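The formula can be sketched in a few lines of Python. This is an illustration of the arithmetic only, assuming rates are quoted in USD per 1K tokens as in the examples in this guide; `cost_in_sats` is a hypothetical name, not a Routstr function.

```python
SATS_PER_BTC = 100_000_000


def cost_in_sats(input_tokens, output_tokens, input_rate_per_1k, output_rate_per_1k,
                 btc_usd, request_fee_usd=0.0, exchange_fee=1.005, provider_fee=1.05):
    """Return (sats before fees, sats after fees) for one request."""
    base_usd = (input_tokens / 1000 * input_rate_per_1k
                + output_tokens / 1000 * output_rate_per_1k
                + request_fee_usd)
    sats = base_usd / btc_usd * SATS_PER_BTC
    return sats, sats * exchange_fee * provider_fee


# gpt-3.5-turbo rates, 50 input and 150 output tokens, BTC at $50,000
before_fees, after_fees = cost_in_sats(50, 150, 0.0015, 0.002, btc_usd=50_000)
print(round(before_fees, 2), round(after_fees, 2))
```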
Example Calculations¶
Example 1: Simple Chat (gpt-3.5-turbo)
Input: 50 tokens
Output: 150 tokens
Model rates: $0.0015/1K input, $0.002/1K output
USD Cost = (50/1000 × 0.0015) + (150/1000 × 0.002)
= $0.000075 + $0.0003
= $0.000375
At $50,000/BTC: 0.75 sats
With 5.5% total fees: 0.79 sats
Example 2: Large Context (gpt-4)
Input: 2,000 tokens
Output: 500 tokens
Model rates: $0.03/1K input, $0.06/1K output
USD Cost = (2000/1000 × 0.03) + (500/1000 × 0.06)
= $0.06 + $0.03
= $0.09
At $50,000/BTC: 180 sats
With 5.5% total fees: 190 sats
Example 3: Image Generation (dall-e-3)
Model: dall-e-3
Size: 1024x1024
Quality: standard
Cost: $0.04 per image
At $50,000/BTC: 80 sats
With 5.5% fees: 84 sats
Fee Structure¶
Exchange Fee¶
Covers Bitcoin/USD conversion costs:
EXCHANGE_FEE=1.005 # 0.5% default
Factors:
- Exchange rate volatility
- Conversion costs
- Price update frequency
Provider Fee¶
Node operator's margin:
UPSTREAM_PROVIDER_FEE=1.05 # 5% default
Covers:
- Infrastructure costs
- Maintenance
- Support
- Profit margin
Calculating Total Fees¶
Total Multiplier = EXCHANGE_FEE × UPSTREAM_PROVIDER_FEE
Example: 1.005 × 1.05 = 1.05525 (5.525% total)
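To check the multiplier for a given configuration, a short script along these lines may help. It assumes the two settings are plain decimal strings and falls back to the defaults quoted in this guide; `total_fee_multiplier` is a hypothetical helper.

```python
import os


def total_fee_multiplier(env=os.environ) -> float:
    """Multiply the two fee settings, using this guide's defaults if unset."""
    exchange = float(env.get("EXCHANGE_FEE", "1.005"))
    provider = float(env.get("UPSTREAM_PROVIDER_FEE", "1.05"))
    return exchange * provider


multiplier = total_fee_multiplier({"EXCHANGE_FEE": "1.005", "UPSTREAM_PROVIDER_FEE": "1.05"})
print(f"{multiplier:.5f} ({(multiplier - 1) * 100:.3f}% total)")
```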
Special Pricing¶
Image Models¶
Image generation uses per-image pricing:
Model | Size | Quality | Price |
---|---|---|---|
dall-e-2 | 256x256 | - | $0.016 |
dall-e-2 | 512x512 | - | $0.018 |
dall-e-2 | 1024x1024 | - | $0.02 |
dall-e-3 | 1024x1024 | standard | $0.04 |
dall-e-3 | 1024x1024 | hd | $0.08 |
dall-e-3 | 1024x1792 | standard | $0.08 |
dall-e-3 | 1024x1792 | hd | $0.12 |
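Per-image pricing amounts to a table lookup followed by the usual USD-to-sats conversion. The sketch below encodes the table as a dictionary; the BTC price and fee multiplier defaults are the example values used elsewhere in this guide, and `image_cost_sats` is a hypothetical helper.

```python
# (model, size, quality) -> USD per image, copied from the table above
IMAGE_PRICES_USD = {
    ("dall-e-2", "256x256", None): 0.016,
    ("dall-e-2", "512x512", None): 0.018,
    ("dall-e-2", "1024x1024", None): 0.02,
    ("dall-e-3", "1024x1024", "standard"): 0.04,
    ("dall-e-3", "1024x1024", "hd"): 0.08,
    ("dall-e-3", "1024x1792", "standard"): 0.08,
    ("dall-e-3", "1024x1792", "hd"): 0.12,
}


def image_cost_sats(model, size, quality=None, n=1,
                    btc_usd=50_000, fee_multiplier=1.05525) -> float:
    """Cost in sats for n images; raises KeyError for unknown combinations."""
    usd = IMAGE_PRICES_USD[(model, size, quality)] * n
    return usd / btc_usd * 100_000_000 * fee_multiplier


print(round(image_cost_sats("dall-e-3", "1024x1024", "standard")))
```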
Audio Models¶
Audio pricing by duration (transcription and translation) or by character count (speech):
Model | Type | Price |
---|---|---|
whisper-1 | Transcription | $0.006/minute |
whisper-1 | Translation | $0.006/minute |
tts-1 | Text-to-speech | $0.015/1K chars |
tts-1-hd | HD speech | $0.03/1K chars |
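The two billing units can be sketched as follows. This assumes charges scale linearly with duration or character count, with no rounding to whole minutes or blocks of characters, which this guide does not specify; both function names are hypothetical.

```python
def transcription_cost_usd(seconds: float, per_minute: float = 0.006) -> float:
    """whisper-1 style pricing: USD per minute of audio (assumed pro-rated)."""
    return seconds / 60 * per_minute


def tts_cost_usd(text: str, per_1k_chars: float = 0.015) -> float:
    """tts-1 style pricing: USD per 1K characters of input text."""
    return len(text) / 1000 * per_1k_chars


print(transcription_cost_usd(90))   # 90 seconds of audio
print(tts_cost_usd("a" * 2000))     # 2,000 characters of text
```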
Embedding Models¶
Lower costs for embeddings:
Model | Price/1K tokens |
---|---|
text-embedding-3-small | $0.00002 |
text-embedding-3-large | $0.00013 |
text-embedding-ada-002 | $0.0001 |
Monitoring Costs¶
Per-Request Tracking¶
Each API response includes usage data:
{
"usage": {
"prompt_tokens": 50,
"completion_tokens": 150,
"total_tokens": 200
},
"x-routstr-cost": {
"sats": 0.79,
"usd": 0.000375,
"breakdown": {
"prompt_cost": 0.15,
"completion_cost": 0.60,
"fees": 0.04
}
}
}
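One way to use this data is to accumulate it per model on the client side. The sketch below assumes the cost object appears in the response body under the `x-routstr-cost` key as shown above; `CostTracker` is a hypothetical class and the recorded values are made up for illustration.

```python
from collections import defaultdict


class CostTracker:
    """Accumulate reported sats per model from response bodies."""

    def __init__(self):
        self.sats_by_model = defaultdict(float)

    def record(self, model: str, response_body: dict) -> None:
        # Responses without cost data contribute nothing
        cost = response_body.get("x-routstr-cost", {})
        self.sats_by_model[model] += cost.get("sats", 0)

    def total(self) -> float:
        return sum(self.sats_by_model.values())


tracker = CostTracker()
tracker.record("gpt-4", {"x-routstr-cost": {"sats": 2}})
tracker.record("gpt-4", {"x-routstr-cost": {"sats": 3}})
tracker.record("gpt-3.5-turbo", {})  # no cost data in this body
print(dict(tracker.sats_by_model), tracker.total())
```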
Daily Summaries¶
View daily summaries in the admin dashboard:
- Total requests
- Token usage by model
- Cost distribution
- Trending patterns
Cost Alerts¶
Set up notifications:
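# Example monitoring script
# get_balance() and send_alert() are placeholders for your own helpers
DAILY_LIMIT = 10_000  # sats; example threshold, choose your own

def check_daily_spend(api_key):
    balance_start = get_balance(api_key, "00:00")  # balance at start of day
    balance_now = get_balance(api_key)
    spent = balance_start - balance_now
    if spent > DAILY_LIMIT:
        send_alert(f"Daily spend exceeded: {spent} sats")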
Optimization Strategies¶
Model Selection¶
Choose the right model for each task:
Task | Recommended Model | Why |
---|---|---|
Simple Q&A | gpt-3.5-turbo | Fast, cheap, sufficient |
Code generation | gpt-4 | Better reasoning |
Summarization | claude-3-haiku | Good balance |
Creative writing | claude-3-opus | Best quality |
Embeddings | text-embedding-3-small | Optimized for vectors |
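The table can be turned into a small routing helper so that application code asks for a task rather than hard-coding a model. The task keys and the fallback choice below are illustrative assumptions, and `pick_model` is a hypothetical name.

```python
# Task label -> recommended model, following the table above
TASK_MODELS = {
    "simple_qa": "gpt-3.5-turbo",
    "code_generation": "gpt-4",
    "summarization": "claude-3-haiku",
    "creative_writing": "claude-3-opus",
    "embeddings": "text-embedding-3-small",
}


def pick_model(task: str, default: str = "gpt-3.5-turbo") -> str:
    """Return the recommended model, falling back to the cheapest chat model."""
    return TASK_MODELS.get(task, default)


print(pick_model("code_generation"))
print(pick_model("unlisted_task"))
```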
Prompt Engineering¶
Reduce costs with efficient prompts:
# Expensive
prompt = """
You are an AI assistant. Your task is to help users.
Please provide detailed, comprehensive answers.
Now, answer this question: What is 2+2?
"""
# Economical
prompt = "Calculate: 2+2"
Caching Strategies¶
Implement smart caching:
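from functools import lru_cache

# `client` is assumed to be an already-configured OpenAI-compatible client

# Cache embedding results (identical input text is only billed once)
@lru_cache(maxsize=1000)
def get_embedding(text):
    return client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )

# Cache common responses
COMMON_RESPONSES = {
    "greeting": "Hello! How can I help you?",
    "goodbye": "Goodbye! Have a great day!"
}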
Batch Processing¶
Process multiple items efficiently:
# Instead of multiple calls
for item in items:
    response = client.chat.completions.create(...)

# Use single call with formatted prompt
prompt = "\n".join(f"{i+1}. {item}" for i, item in enumerate(items))
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": f"Process these items:\n{prompt}"}]
)
Custom Pricing Rules¶
Time-Based Pricing¶
Implement off-peak discounts:
from datetime import datetime

def calculate_multiplier():
    hour = datetime.now().hour  # local server time
    if 2 <= hour < 6:  # 2 AM - 6 AM
        return 0.8  # 20% discount
    elif 18 <= hour < 22:  # 6 PM - 10 PM
        return 1.2  # 20% premium
    return 1.0
Model-Specific Rules¶
Custom pricing logic:
def adjust_model_price(model, base_price):
    # Premium for latest models
    if "turbo" in model or "latest" in model:
        return base_price * 1.1
    # Discount for older models
    if "legacy" in model:
        return base_price * 0.8
    return base_price
Pricing Transparency¶
Public Pricing Page¶
Display current rates:
<!-- Available at /pricing -->
<table>
<tr>
<th>Model</th>
<th>Input (sats/1K)</th>
<th>Output (sats/1K)</th>
</tr>
<!-- Dynamically generated from models.json -->
</table>
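The dynamically generated rows could be produced along these lines. The sketch assumes the `prompt` and `completion` values in models.json are USD per token, which this guide does not state explicitly, and converts them to sats per 1K tokens at a given BTC price; both function names are hypothetical.

```python
from html import escape

SATS_PER_BTC = 100_000_000


def usd_per_token_to_sats_per_1k(usd_per_token: str, btc_usd: float) -> float:
    """Convert a per-token USD price string to sats per 1K tokens."""
    return float(usd_per_token) * 1000 / btc_usd * SATS_PER_BTC


def pricing_rows(models: list, btc_usd: float) -> str:
    """Render one <tr> per model for the pricing table."""
    rows = []
    for model in models:
        pricing = model["pricing"]
        rows.append(
            "<tr><td>{}</td><td>{:.1f}</td><td>{:.1f}</td></tr>".format(
                escape(model["id"]),
                usd_per_token_to_sats_per_1k(pricing["prompt"], btc_usd),
                usd_per_token_to_sats_per_1k(pricing["completion"], btc_usd),
            )
        )
    return "\n".join(rows)


models = [{"id": "gpt-4-vision",
           "pricing": {"prompt": "0.00003", "completion": "0.00006"}}]
rows = pricing_rows(models, btc_usd=50_000)
print(rows)
```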
Cost Estimation API¶
Provide cost estimates:
POST /v1/estimate
{
"model": "gpt-4",
"prompt_tokens": 500,
"max_tokens": 200
}
Response:
{
"estimated_cost_sats": 57,
"breakdown": {
"prompt": 30,
"completion": 24,
"fees": 3
}
}
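A client could build this request with the standard library as sketched below. The base URL is a placeholder, and any authentication your node requires is omitted because this guide does not describe it; `build_estimate_request` is a hypothetical helper that only constructs the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://your-routstr-node.example"  # placeholder; use your node's URL


def build_estimate_request(model: str, prompt_tokens: int, max_tokens: int,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to the estimate endpoint."""
    body = json.dumps({
        "model": model,
        "prompt_tokens": prompt_tokens,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/estimate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_estimate_request("gpt-4", 500, 200)
print(request.get_method(), request.full_url)
```

Sending it is then a matter of passing `request` to `urllib.request.urlopen` and decoding the JSON body of the response.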
Troubleshooting¶
Pricing Mismatches¶
Issue: Costs don't match expectations
- Check current BTC/USD rate
- Verify fee settings
- Review model configuration
Issue: Models not found
- Update models.json
- Check model ID spelling
- Verify upstream support
Fee Calculations¶
Issue: Fees seem too high
- Review EXCHANGE_FEE setting
- Check UPSTREAM_PROVIDER_FEE
- Calculate total multiplier
Next Steps¶
- API Reference - Technical details
- Custom Pricing - Advanced configuration
- Contributing - Help improve Routstr