Overview

Token management is crucial for optimizing performance and controlling costs when using AI models in NikCLI. This guide covers strategies for efficient token usage, cost optimization, and monitoring across different AI providers.

Usage Optimization

Optimize token usage with smart context management and caching

Cost Control

Monitor and control costs across multiple AI providers

Performance Tuning

Balance quality and efficiency for optimal performance

Analytics & Monitoring

Track usage patterns and identify optimization opportunities

Token Fundamentals

Understanding Token Costs

  • Provider Comparison
  • Token Usage Patterns
  • Token Estimation
| Provider  | Model             | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
| --------- | ----------------- | --------------------- | ---------------------- | -------------- |
| Anthropic | Claude 3.5 Sonnet | $3.00                 | $15.00                 | 200K           |
| Anthropic | Claude 3 Haiku    | $0.25                 | $1.25                  | 200K           |
| OpenAI    | GPT-4 Turbo       | $10.00                | $30.00                 | 128K           |
| OpenAI    | GPT-3.5 Turbo     | $0.50                 | $1.50                  | 16K            |
| Google    | Gemini Pro        | $0.50                 | $1.50                  | 32K            |
Cost Optimization Strategy:
  • Use Claude 3 Haiku for simple tasks (~92% cheaper than Claude 3.5 Sonnet)
  • Use Claude 3.5 Sonnet for complex reasoning
  • Leverage larger context windows to reduce round trips
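As a back-of-the-envelope check against the pricing table, the cost of a request can be estimated from its token counts. This TypeScript sketch uses the common ~4-characters-per-token heuristic for English text (actual tokenization varies by model); the helper names are illustrative, not a NikCLI API:

```typescript
// Rough token estimate: ~4 characters per token is a common heuristic
// for English text; real tokenizers will differ per model.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Per-million-token prices (USD) from the table above.
const pricing: Record<string, { input: number; output: number }> = {
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "claude-3-haiku": { input: 0.25, output: 1.25 },
};

// Cost of one request given input/output token counts.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = pricing[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```

For example, a 1M-input-token request against Sonnet costs $3.00, while the same request against Haiku costs $0.25 — the gap that motivates the model-selection rules below.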

Token Optimization Configuration

# Configure automatic model selection based on task complexity
/config set ai.smart-model-selection true
/config set ai.model-selection-strategy cost-optimized

# Define model selection rules
/config set ai.selection-rules '{
  "simple-tasks": "claude-3-haiku",
  "code-generation": "claude-3-5-sonnet", 
  "complex-reasoning": "claude-3-5-sonnet",
  "quick-questions": "claude-3-haiku"
}'

# Set cost thresholds
/config set ai.cost-threshold.warning 0.10
/config set ai.cost-threshold.max 1.00
# Optimize context window usage
/config set context.max-tokens 8000
/config set context.optimization-strategy smart-pruning
/config set context.relevance-threshold 0.7

# Configure context compression
/config set context.compression.enabled true
/config set context.compression.algorithm semantic
/config set context.compression.ratio 0.3

# Set up context caching
/config set cache.context.enabled true
/config set cache.context.ttl 3600  # 1 hour
/config set cache.similarity-threshold 0.85
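The ai.selection-rules mapping above can be mirrored in code. This sketch (the `selectModel` helper is illustrative, not a NikCLI API) routes a classified task to a model and falls back to the cheapest option:

```typescript
// Mirror of the ai.selection-rules JSON configured above.
const selectionRules: Record<string, string> = {
  "simple-tasks": "claude-3-haiku",
  "code-generation": "claude-3-5-sonnet",
  "complex-reasoning": "claude-3-5-sonnet",
  "quick-questions": "claude-3-haiku",
};

// Pick a model for a classified task; unknown task types default
// to the cheapest model rather than the most capable one.
function selectModel(taskType: string): string {
  return selectionRules[taskType] ?? "claude-3-haiku";
}
```

Defaulting to the cheap model keeps an unrecognized task type from silently running on the most expensive tier.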

Context Optimization Strategies

Smart Context Management

  • Context Pruning
  • Dynamic Context Loading
  • Context Compression
# Enable intelligent context pruning
/context optimize --strategy relevance-based
/context prune --keep-essential --remove-redundant

# Analyze current context efficiency  
/context analyze --show-relevance --show-tokens

# Output example:
Context Analysis:
┌──────────────────────┬────────┬───────────┬─────────────┐
│ File                 │ Tokens │ Relevance │ Keep/Remove │
├──────────────────────┼────────┼───────────┼─────────────┤
│ src/App.tsx          │ 450    │ 0.95      │ Keep        │
│ src/utils/helpers.ts │ 320    │ 0.85      │ Keep        │
│ package.json         │ 180    │ 0.40      │ Remove      │
│ README.md            │ 280    │ 0.30      │ Remove      │
└──────────────────────┴────────┴───────────┴─────────────┘

Optimization Result: 460 tokens saved (37% reduction)
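The pruning step amounts to a relevance filter over the context files. A minimal TypeScript sketch (types and names are illustrative), using the files from the analysis above:

```typescript
interface ContextFile {
  path: string;
  tokens: number;
  relevance: number;
}

// Keep files at or above the relevance threshold; report tokens saved
// by dropping the rest.
function pruneContext(files: ContextFile[], threshold = 0.7) {
  const kept = files.filter((f) => f.relevance >= threshold);
  const removed = files.filter((f) => f.relevance < threshold);
  const saved = removed.reduce((sum, f) => sum + f.tokens, 0);
  return { kept, saved };
}

const files: ContextFile[] = [
  { path: "src/App.tsx", tokens: 450, relevance: 0.95 },
  { path: "src/utils/helpers.ts", tokens: 320, relevance: 0.85 },
  { path: "package.json", tokens: 180, relevance: 0.4 },
  { path: "README.md", tokens: 280, relevance: 0.3 },
];
```

At the configured 0.7 threshold, package.json and README.md are dropped and their token counts become the savings.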

Context Caching

// Context caching configuration
interface ContextCacheConfig {
  enabled: boolean;
  ttl: number;                    // Time to live in seconds
  maxSize: number;                // Maximum cache entries
  similarityThreshold: number;    // Minimum similarity for cache hit
  compressionEnabled: boolean;    // Compress cached contexts
  persistToDisk: boolean;         // Persist cache across sessions
}

const cacheConfig: ContextCacheConfig = {
  enabled: true,
  ttl: 3600,                     // 1 hour
  maxSize: 1000,                 // 1000 cached contexts
  similarityThreshold: 0.85,     // 85% similarity for reuse
  compressionEnabled: true,
  persistToDisk: true
};
Cache Hit Examples:
# First request - cache miss
/agent react-expert "create login component" 
# Context: 2,500 tokens, Cost: $0.038

# Similar request - cache hit
/agent react-expert "create signup component"
# Context: 150 tokens (cached), Cost: $0.003 (90% savings)
# View cache performance
/cache stats --context --detailed

# Output:
Context Cache Performance (Last 7 Days):
- Cache Hits: 847 (73.2%)
- Cache Misses: 310 (26.8%)
- Average Token Savings: 1,850 per hit
- Total Tokens Saved: 1,567,950
- Cost Savings: $47.04
- Cache Size: 156MB (compressed)

# Cache optimization suggestions
/cache optimize --suggestions --auto-tune
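The similarity-threshold lookup behind those cache hits can be sketched as follows. Jaccard word overlap stands in for whatever similarity metric NikCLI actually uses (likely embedding-based), and all names are illustrative:

```typescript
// Stand-in similarity: Jaccard overlap of word sets. A real system
// would compare embeddings; this only demonstrates the threshold logic.
function similarity(a: string, b: string): number {
  const wa = new Set(a.toLowerCase().split(/\s+/));
  const wb = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...wa].filter((w) => wb.has(w)).length;
  const union = new Set([...wa, ...wb]).size;
  return union === 0 ? 0 : inter / union;
}

interface CacheEntry {
  prompt: string;
  context: string;
}

// Return the best cached context whose prompt is similar enough, else null.
function lookup(cache: CacheEntry[], prompt: string, threshold = 0.85): CacheEntry | null {
  let best: CacheEntry | null = null;
  let bestScore = 0;
  for (const e of cache) {
    const s = similarity(e.prompt, prompt);
    if (s >= threshold && s > bestScore) {
      best = e;
      bestScore = s;
    }
  }
  return best;
}
```

Lowering the threshold raises the hit rate but risks reusing a context that no longer fits the request, which is why 0.85 is a reasonable default.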

Cost Monitoring and Budgeting

Usage Tracking

  • Real-Time Monitoring
  • Budget Management
  • Cost Optimization Recommendations
# Enable real-time cost monitoring
/config set monitoring.cost-tracking.enabled true
/config set monitoring.cost-tracking.real-time true
/config set monitoring.cost-tracking.alerts true

# Set up cost alerts
/alerts create cost-warning --threshold 10.00 --period daily
/alerts create cost-limit --threshold 50.00 --period monthly --action pause

# Monitor current usage
/stats cost --live --breakdown-by-model

# Real-time output:
Current Session Costs:
- Claude 3.5 Sonnet: $2.45 (18 requests)
- Claude 3 Haiku: $0.12 (45 requests)
- Total Session: $2.57
- Daily Total: $8.43 / $10.00 (84% of budget)
- Monthly Total: $127.85 / $200.00 (64% of budget)
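The budget percentages in that output reduce to a simple ratio check against a warning threshold. A minimal sketch (illustrative names, not a NikCLI API):

```typescript
interface Budget {
  spent: number; // USD consumed so far in the period
  limit: number; // USD budget for the period
}

// Percent of budget consumed, plus whether the warning threshold is crossed.
function budgetStatus(b: Budget, warnAt = 0.8) {
  const fraction = b.spent / b.limit;
  return { percent: Math.round(fraction * 100), warn: fraction >= warnAt };
}
```

With the figures above, the daily budget ($8.43 of $10.00) sits at 84% and trips an 80% warning, while the monthly budget ($127.85 of $200.00) sits at 64% and does not.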

Cost Analysis and Reporting

# Generate comprehensive cost report
/report cost --period monthly --detailed --export pdf

# Report includes:
# - Usage by model and provider
# - Cost trends and projections
# - Top consuming agents and tasks
# - Optimization opportunities
# - Comparative analysis with previous periods

# Custom report queries
/report cost --filter "agent:react-expert" --time-range "last-week"
/report cost --group-by "project" --show-trends
/report cost --compare-periods "this-month:last-month"
# Allocate costs to projects/teams
/cost allocate --project e-commerce --percentage 40
/cost allocate --team frontend --based-on-usage
/cost allocate --department engineering --overhead 0.15

# View allocation breakdown
/cost allocation --summary --export csv

# Output:
Cost Allocation Summary (October 2024):
┌─────────────┬──────────────┬───────────┬────────┐
│ Project     │ Direct Costs │ Allocated │ Total  │
├─────────────┼──────────────┼───────────┼────────┤
│ E-commerce  │ $48.50       │ $12.75    │ $61.25 │
│ Mobile App  │ $32.20       │ $8.45     │ $40.65 │
│ API Gateway │ $19.80       │ $5.20     │ $25.00 │
│ Unallocated │ $15.30       │ $4.02     │ $19.32 │
└─────────────┴──────────────┴───────────┴────────┘
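One plausible reading of the --overhead flag is that shared costs are distributed to projects in proportion to their direct spend. This sketch assumes that interpretation (it is not confirmed NikCLI behavior, and the helper name is hypothetical):

```typescript
// Assumed model: split a shared overhead amount across projects
// proportionally to each project's direct cost.
function allocateOverhead(
  direct: Record<string, number>,
  overhead: number
): Record<string, number> {
  const total = Object.values(direct).reduce((a, b) => a + b, 0);
  const out: Record<string, number> = {};
  for (const [project, cost] of Object.entries(direct)) {
    out[project] = +(cost + overhead * (cost / total)).toFixed(2);
  }
  return out;
}
```

Proportional allocation keeps the totals column equal to direct costs plus overhead, matching the row arithmetic in the summary above.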

Advanced Token Optimization

Response Optimization

  • Output Control
  • Streaming Optimization
  • Multi-Turn Optimization
# Control response length and format
/config set ai.response.max-tokens 2000
/config set ai.response.prefer-concise true
/config set ai.response.avoid-repetition true

# Configure output formatting
/config set ai.response.format-code true
/config set ai.response.include-explanations selective
/config set ai.response.verbosity medium

# Task-specific optimizations
/agent react-expert "create component" --concise --code-only
/agent universal-agent "explain concept" --detailed --educational

Batch Processing

# Enable intelligent request batching
/config set batching.enabled true
/config set batching.max-batch-size 5
/config set batching.wait-time 2000  # 2 seconds

# Configure batching strategies
/config set batching.strategy similarity-based
/config set batching.similarity-threshold 0.7
/config set batching.max-wait-time 5000

# Example: Batching similar requests
/agent react-expert "create button component"
/agent react-expert "create input component"  
/agent react-expert "create modal component"
# → Batched into single request: "create button, input, and modal components"
# Token savings: ~40% compared to individual requests
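Greedy similarity-based batching, as described above, can be sketched like this; word-overlap similarity is a stand-in for the real metric, and all names are illustrative:

```typescript
// Stand-in similarity: fraction of shared words between two requests.
function sim(a: string, b: string): number {
  const wa = new Set(a.toLowerCase().split(/\s+/));
  const wb = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...wa].filter((w) => wb.has(w)).length;
  return inter / new Set([...wa, ...wb]).size;
}

// Greedy batching: each request joins the first existing batch whose
// lead request is similar enough, otherwise it starts a new batch.
function batchRequests(requests: string[], threshold = 0.5): string[][] {
  const batches: string[][] = [];
  for (const r of requests) {
    const b = batches.find((g) => sim(g[0], r) >= threshold);
    if (b) b.push(r);
    else batches.push([r]);
  }
  return batches;
}
```

With the three component requests above, all three share "create … component" and collapse into one batch, while an unrelated request would start a batch of its own.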
# Configure parallel processing for independent tasks
/config set parallel.enabled true
/config set parallel.max-concurrent 3
/config set parallel.load-balancing true

# Parallel execution with shared context
/parallel "backend-agent,react-expert" "build authentication system" \
       --shared-context "user-requirements.md" \
       --optimize-tokens

# Token optimization in parallel execution:
# - Shared context loaded once (instead of per agent)
# - Results cached for cross-agent reference
# - Duplicate processing eliminated

Provider-Specific Optimizations

Anthropic Claude Optimization

  • Claude-Specific Features
  • Context Window Usage
# Optimize for Claude's strengths
/config set anthropic.use-xml-formatting true
/config set anthropic.structured-output true
/config set anthropic.thinking-blocks false  # Reduce token usage

# Claude cost optimization
/config set anthropic.prefer-haiku-for-simple true
/config set anthropic.sonnet-threshold-complexity 0.7
/config set anthropic.context-optimization claude-specific

OpenAI GPT Optimization

# Optimize for GPT models
/config set openai.use-function-calling true
/config set openai.structured-output true  
/config set openai.json-mode-when-applicable true

# Model selection optimization
/config set openai.use-35-turbo-for-simple true
/config set openai.gpt4-threshold-complexity 0.8

Token Analytics and Insights

Advanced Analytics

  • Usage Pattern Analysis
  • Predictive Analytics
# Analyze usage patterns for optimization
/analytics tokens --patterns --time-range 90d

# Pattern Analysis Results:
Usage Patterns Identified:

1. Peak Usage: 2-4 PM EST (45% of daily usage)
   - Mostly code generation and review tasks
   - Recommendation: Pre-cache common contexts

2. Recurring Tasks: Component creation (35% of requests)
   - Similar patterns with slight variations
   - Recommendation: Create template-based optimization

3. Context Inefficiency: Large context windows with low relevance
   - Average relevance: 0.42 (target: 0.70+)
   - Recommendation: Enable smart context pruning

4. Model Misallocation: Complex models for simple tasks
   - 23% of Haiku-suitable tasks using Sonnet
   - Potential savings: $28/month

ROI Analysis

# Calculate return on investment
/analytics roi --calculate-value --time-saved --quality-improvement

# ROI Analysis Results:
Development Productivity Analysis (Last Quarter):

Time Savings:
- Code generation: 240 hours saved
- Code review: 120 hours saved  
- Documentation: 80 hours saved
- Debugging assistance: 160 hours saved
- Total: 600 hours saved

Cost Analysis:
- NikCLI token costs: $427.50
- Developer time value: $75/hour average
- Time value saved: $45,000

ROI Calculation:
- Investment: $427.50
- Return: $45,000
- ROI: 10,525% or 105x return

Quality Improvements:
- Code quality score: +23%
- Bug reduction: -34%
- Test coverage: +18%
- Documentation completeness: +45%
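The 105x figure is simply the value returned divided by the tokens spent; a one-line sketch of the arithmetic (illustrative helper name):

```typescript
// Return multiple: value generated per dollar of token spend.
function roiMultiple(cost: number, value: number): number {
  return value / cost;
}
```

With the quarter's numbers, $45,000 of time value against $427.50 of token spend rounds to a 105x return.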

Best Practices

Token Efficiency Guidelines

Context Optimization

Keep context focused and relevant to the task
# Good: Focused context
/context set "src/components/auth/" --include "*.tsx,*.ts"

# Avoid: Over-broad context  
/context set "." --recursive --all-files

Model Selection

Choose the right model for each task complexity
# Simple tasks → Haiku
/model claude-3-haiku
"Fix this typo in the comment"

# Complex reasoning → Sonnet
/model claude-3-5-sonnet  
"Design architecture for microservices system"

Caching Strategy

Implement intelligent caching for repeated patterns
# Enable context and response caching
/config set cache.context.enabled true
/config set cache.responses.enabled true
/config set cache.similarity-threshold 0.85

Batch Processing

Group similar tasks for efficient processing
# Batch similar component requests
/agent react-expert "create button, input, and select components with consistent styling"

Monitoring Best Practices

  • Continuous Monitoring
  • Regular Reviews
# Set up comprehensive monitoring
/monitor enable --cost-tracking --usage-patterns --optimization-opportunities

# Configure alerts for various thresholds
/alerts create --type cost-spike --threshold 200% --period 1h
/alerts create --type unusual-usage --threshold 3-sigma
/alerts create --type optimization-opportunity --min-savings 10.00

Next Steps

Start with basic optimizations like context pruning and smart model selection, then gradually implement advanced features like caching and batch processing. Monitor your token usage regularly to identify optimization opportunities.