Overview

Token management is crucial for optimizing performance and controlling costs when using AI models in NikCLI. This guide covers strategies for efficient token usage, cost optimization, and monitoring across different AI providers.

Usage Optimization

Optimize token usage with smart context management and caching

Cost Control

Monitor and control costs across multiple AI providers

Performance Tuning

Balance quality and efficiency for optimal performance

Analytics & Monitoring

Track usage patterns and identify optimization opportunities

Token Fundamentals

Understanding Token Costs

Provider  | Model             | Input (per 1M tokens) | Output (per 1M tokens) | Context Window
----------|-------------------|-----------------------|------------------------|---------------
Anthropic | Claude 3.5 Sonnet | $3.00                 | $15.00                 | 200K
Anthropic | Claude 3 Haiku    | $0.25                 | $1.25                  | 200K
OpenAI    | GPT-4 Turbo       | $10.00                | $30.00                 | 128K
OpenAI    | GPT-3.5 Turbo     | $0.50                 | $1.50                  | 16K
Google    | Gemini Pro        | $0.50                 | $1.50                  | 32K
Cost Optimization Strategy:
  • Use Claude 3 Haiku for simple tasks (over 90% cheaper per token than Sonnet)
  • Use Claude 3.5 Sonnet for complex reasoning
  • Leverage larger context windows to reduce round trips
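Using the per-1M-token prices in the table above, the cost of a single request is straightforward to estimate. A minimal sketch (the `PRICES` dict and function name are illustrative, not part of NikCLI):

```python
# Estimate the dollar cost of one request from token counts and the
# per-1M-token prices listed above.
PRICES = {  # model -> (input $/1M tokens, output $/1M tokens)
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-3.5-turbo": (0.50, 1.50),
    "gemini-pro": (0.50, 1.50),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For example, a request with 10K input tokens and 1K output tokens costs about $0.045 on Claude 3.5 Sonnet versus $0.00375 on Claude 3 Haiku, which is where the large savings from model selection come from.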

Token Optimization Configuration

Context Optimization Strategies

Smart Context Management

# Enable intelligent context pruning
/context optimize --strategy relevance-based
/context prune --keep-essential --remove-redundant

# Analyze current context efficiency  
/context analyze --show-relevance --show-tokens

# Output example:
Context Analysis:
┌──────────────────────┬─────────┬────────────┬─────────────┐
│ File                 │ Tokens  │ Relevance  │ Keep/Remove │
├──────────────────────┼─────────┼────────────┼─────────────┤
│ src/App.tsx          │ 450     │ 0.95       │ Keep        │
│ src/utils/helpers.ts │ 320     │ 0.85       │ Keep        │
│ package.json         │ 180     │ 0.40       │ Remove      │
│ README.md            │ 280     │ 0.30       │ Remove      │
└──────────────────────┴─────────┴────────────┴─────────────┘

Optimization Result: 460 tokens saved (37% reduction)
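The keep/remove decision above is relevance-based pruning: files scoring below a threshold are dropped and their tokens reclaimed. A minimal sketch of that logic, using the figures from the analysis table (the function and threshold are illustrative, not NikCLI internals):

```python
# Relevance-based context pruning: keep files whose relevance meets a
# threshold, drop the rest, and report how many tokens were saved.
def prune_context(files: dict[str, tuple[int, float]], threshold: float = 0.7):
    """files maps path -> (token_count, relevance). Returns (kept, tokens_saved)."""
    kept = {path: tokens for path, (tokens, rel) in files.items() if rel >= threshold}
    saved = sum(tokens for tokens, rel in files.values() if rel < threshold)
    return kept, saved

# Data from the context analysis above
context = {
    "src/App.tsx": (450, 0.95),
    "src/utils/helpers.ts": (320, 0.85),
    "package.json": (180, 0.40),
    "README.md": (280, 0.30),
}
kept, saved = prune_context(context)  # saves 460 of 1230 tokens (~37%)
```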

Context Caching

Cost Monitoring and Budgeting

Usage Tracking

# Enable real-time cost monitoring
/config set monitoring.cost-tracking.enabled true
/config set monitoring.cost-tracking.real-time true
/config set monitoring.cost-tracking.alerts true

# Set up cost alerts
/alerts create cost-warning --threshold 10.00 --period daily
/alerts create cost-limit --threshold 50.00 --period monthly --action pause

# Monitor current usage
/stats cost --live --breakdown-by-model

# Real-time output:
Current Session Costs:
- Claude 3.5 Sonnet: $2.45 (18 requests)
- Claude 3 Haiku: $0.12 (45 requests)
- Total Session: $2.57
- Daily Total: $8.43 / $10.00 (84% of budget)
- Monthly Total: $127.85 / $200.00 (64% of budget)
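The alert and pause thresholds configured above amount to comparing spend against a budget and escalating as the ratio climbs. A minimal sketch of that check (function name and the 80% warning ratio are illustrative assumptions):

```python
# Budget tracking with warning/limit thresholds, mirroring the daily
# warning and monthly pause alerts configured above.
def budget_status(spent: float, budget: float, warn_ratio: float = 0.8) -> str:
    """Return 'ok', 'warning', or 'exceeded' for a spend vs. its budget."""
    ratio = spent / budget
    if ratio >= 1.0:
        return "exceeded"   # e.g. pause further requests
    if ratio >= warn_ratio:
        return "warning"    # e.g. fire a cost alert
    return "ok"
```

With the figures shown above, the daily budget ($8.43 of $10.00, 84%) is in the warning band while the monthly budget ($127.85 of $200.00, 64%) is still ok.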

Cost Analysis and Reporting

Advanced Token Optimization

Response Optimization

# Control response length and format
/config set ai.response.max-tokens 2000
/config set ai.response.prefer-concise true
/config set ai.response.avoid-repetition true

# Configure output formatting
/config set ai.response.format-code true
/config set ai.response.include-explanations selective
/config set ai.response.verbosity medium

# Task-specific optimizations
/agent react-expert "create component" --concise --code-only
/agent universal-agent "explain concept" --detailed --educational

Batch Processing

Provider-Specific Optimizations

Anthropic Claude Optimization

# Optimize for Claude's strengths
/config set anthropic.use-xml-formatting true
/config set anthropic.structured-output true
/config set anthropic.thinking-blocks false  # Reduce token usage

# Claude cost optimization
/config set anthropic.prefer-haiku-for-simple true
/config set anthropic.sonnet-threshold-complexity 0.7
/config set anthropic.context-optimization claude-specific
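The `prefer-haiku-for-simple` and `sonnet-threshold-complexity` settings above describe threshold-based model routing: score a task's complexity, then escalate to the stronger model only when the score crosses the threshold. A sketch of that routing rule (the complexity score itself is assumed to come from elsewhere; this is not NikCLI's actual router):

```python
# Route a task to a model by estimated complexity, mirroring the
# sonnet-threshold-complexity setting of 0.7 shown above.
def pick_model(complexity: float, threshold: float = 0.7) -> str:
    """Prefer Haiku for simple tasks; escalate to Sonnet at or above the threshold."""
    return "claude-3-5-sonnet" if complexity >= threshold else "claude-3-haiku"
```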

OpenAI GPT Optimization

Token Analytics and Insights

Advanced Analytics

# Analyze usage patterns for optimization
/analytics tokens --patterns --time-range 90d

# Pattern Analysis Results:
Usage Patterns Identified:

1. Peak Usage: 2-4 PM EST (45% of daily usage)
   - Mostly code generation and review tasks
   - Recommendation: Pre-cache common contexts

2. Recurring Tasks: Component creation (35% of requests)
   - Similar patterns with slight variations
   - Recommendation: Create template-based optimization

3. Context Inefficiency: Large context windows with low relevance
   - Average relevance: 0.42 (target: 0.70+)
   - Recommendation: Enable smart context pruning

4. Model Misallocation: Complex models for simple tasks
   - 23% of Haiku-suitable tasks using Sonnet
   - Potential savings: $28/month
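The savings estimate for pattern 4 is just the per-request price difference between Sonnet and Haiku multiplied by the misallocated request volume. A sketch, using the pricing from the table earlier; the example volumes are illustrative assumptions, not measured NikCLI data:

```python
# Estimate monthly savings from re-routing Haiku-suitable requests
# that currently run on Sonnet. Prices are $/1M tokens from the
# pricing table above; request volume and token sizes are assumed.
def reroute_savings(requests_per_month: int, avg_in: int, avg_out: int) -> float:
    sonnet = (avg_in * 3.00 + avg_out * 15.00) / 1_000_000
    haiku = (avg_in * 0.25 + avg_out * 1.25) / 1_000_000
    return requests_per_month * (sonnet - haiku)
```

Under assumed volumes of 1,000 misallocated requests per month at 5K input / 1K output tokens each, this yields $27.50/month, in the same range as the $28 figure above.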

ROI Analysis

Best Practices

Token Efficiency Guidelines

Context Optimization

Keep context focused and relevant to the task
# Good: Focused context
/context set "src/components/auth/" --include "*.tsx,*.ts"

# Avoid: Over-broad context  
/context set "." --recursive --all-files

Model Selection

Choose the right model for each task complexity
# Simple tasks → Haiku
/model claude-3-haiku
"Fix this typo in the comment"

# Complex reasoning → Sonnet
/model claude-3-5-sonnet  
"Design architecture for microservices system"

Caching Strategy

Implement intelligent caching for repeated patterns
# Enable context and response caching
/config set cache.context.enabled true
/config set cache.responses.enabled true
/config set cache.similarity-threshold 0.85
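The `similarity-threshold` setting above implies a cache that returns a stored response when a new prompt is similar enough to a cached one. A minimal sketch of that idea, using Jaccard word overlap as a deliberately simple stand-in for whatever similarity metric the real cache uses (class and function names are illustrative):

```python
# Response cache keyed by prompt similarity: reuse a cached response
# when a new prompt is close enough to one seen before, so no tokens
# are spent on the repeated request.
def similarity(a: str, b: str) -> float:
    """Jaccard word overlap between two prompts, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

class ResponseCache:
    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.entries: list[tuple[str, str]] = []  # (prompt, response)

    def get(self, prompt: str):
        """Return a cached response if a stored prompt is similar enough."""
        for cached_prompt, response in self.entries:
            if similarity(prompt, cached_prompt) >= self.threshold:
                return response  # cache hit: zero token cost
        return None

    def put(self, prompt: str, response: str):
        self.entries.append((prompt, response))
```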

Batch Processing

Group similar tasks for efficient processing
# Batch similar component requests
/agent react-expert "create button, input, and select components with consistent styling"

Monitoring Best Practices

# Set up comprehensive monitoring
/monitor enable --cost-tracking --usage-patterns --optimization-opportunities

# Configure alerts for various thresholds
/alerts create --type cost-spike --threshold 200% --period 1h
/alerts create --type unusual-usage --threshold 3-sigma
/alerts create --type optimization-opportunity --min-savings 10.00
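The `3-sigma` unusual-usage alert above is a standard anomaly check: flag a day's usage when it falls more than three standard deviations from the recent mean. A sketch of that rule (function name is illustrative):

```python
# 3-sigma anomaly check for daily token usage, mirroring the
# unusual-usage alert configured above.
from statistics import mean, stdev

def is_unusual(history: list[float], today: float, sigmas: float = 3.0) -> bool:
    """True when today's usage deviates from the recent mean by > sigmas * stdev."""
    mu, sd = mean(history), stdev(history)
    return sd > 0 and abs(today - mu) > sigmas * sd
```

For a history of roughly 100K tokens/day, a 500K-token day trips the alert while ordinary fluctuation does not.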

Next Steps

Start with basic optimizations like context pruning and smart model selection, then gradually implement advanced features like caching and batch processing. Monitor your token usage regularly to identify optimization opportunities.