Overview

Token management is crucial for optimizing performance and controlling costs when using AI models in NikCLI. This guide covers strategies for efficient token usage, cost optimization, and monitoring across different AI providers.

Usage Optimization

Optimize token usage with smart context management and caching

Cost Control

Monitor and control costs across multiple AI providers

Performance Tuning

Balance quality and efficiency for optimal performance

Analytics & Monitoring

Track usage patterns and identify optimization opportunities

Token Fundamentals

Understanding Token Costs

  • Provider Comparison
  • Token Usage Patterns
  • Token Estimation
| Provider  | Model             | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
| --------- | ----------------- | --------------------- | ---------------------- | -------------- |
| Anthropic | Claude 3.5 Sonnet | $3.00                 | $15.00                 | 200K           |
| Anthropic | Claude 3 Haiku    | $0.25                 | $1.25                  | 200K           |
| OpenAI    | GPT-4 Turbo       | $10.00                | $30.00                 | 128K           |
| OpenAI    | GPT-3.5 Turbo     | $0.50                 | $1.50                  | 16K            |
| Google    | Gemini Pro        | $0.50                 | $1.50                  | 32K            |
Cost Optimization Strategy:
  • Use Claude 3 Haiku for simple tasks (~92% cheaper than Claude 3.5 Sonnet)
  • Use Claude 3.5 Sonnet for complex reasoning
  • Leverage larger context windows to reduce round trips
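As a back-of-the-envelope check against the pricing table, the cost of a request can be estimated from its token counts. This TypeScript sketch uses the common ~4-characters-per-token heuristic for English text (actual tokenization varies by model); the helper names are illustrative, not a NikCLI API:

```typescript
// Rough token estimate: ~4 characters per token is a common heuristic
// for English text; real tokenizers will differ per model.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Per-million-token prices (USD) from the table above.
const pricing: Record<string, { input: number; output: number }> = {
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "claude-3-haiku": { input: 0.25, output: 1.25 },
};

// Cost of one request given input/output token counts.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = pricing[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```

For example, a 1M-input-token request against Sonnet costs $3.00, while the same request against Haiku costs $0.25 — the gap that motivates the model-selection rules below.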

Token Optimization Configuration

# Configure automatic model selection based on task complexity
/config set ai.smart-model-selection true
/config set ai.model-selection-strategy cost-optimized

# Define model selection rules
/config set ai.selection-rules '{
  "simple-tasks": "claude-3-haiku",
  "code-generation": "claude-3-5-sonnet", 
  "complex-reasoning": "claude-3-5-sonnet",
  "quick-questions": "claude-3-haiku"
}'

# Set cost thresholds
/config set ai.cost-threshold.warning 0.10
/config set ai.cost-threshold.max 1.00
# Optimize context window usage
/config set context.max-tokens 8000
/config set context.optimization-strategy smart-pruning
/config set context.relevance-threshold 0.7

# Configure context compression
/config set context.compression.enabled true
/config set context.compression.algorithm semantic
/config set context.compression.ratio 0.3

# Set up context caching
/config set cache.context.enabled true
/config set cache.context.ttl 3600  # 1 hour
/config set cache.similarity-threshold 0.85
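The ai.selection-rules mapping above can be mirrored in code. This sketch (the `selectModel` helper is illustrative, not a NikCLI API) routes a classified task to a model and falls back to the cheapest option:

```typescript
// Mirror of the ai.selection-rules JSON configured above.
const selectionRules: Record<string, string> = {
  "simple-tasks": "claude-3-haiku",
  "code-generation": "claude-3-5-sonnet",
  "complex-reasoning": "claude-3-5-sonnet",
  "quick-questions": "claude-3-haiku",
};

// Pick a model for a classified task; unknown task types default
// to the cheapest model rather than the most capable one.
function selectModel(taskType: string): string {
  return selectionRules[taskType] ?? "claude-3-haiku";
}
```

Defaulting to the cheap model keeps an unrecognized task type from silently running on the most expensive tier.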

Context Optimization Strategies

Smart Context Management

  • Context Pruning
  • Dynamic Context Loading
  • Context Compression
# Enable intelligent context pruning
/context optimize --strategy relevance-based
/context prune --keep-essential --remove-redundant

# Analyze current context efficiency  
/context analyze --show-relevance --show-tokens

# Output example:
Context Analysis:
┌──────────────────────┬────────┬───────────┬─────────────┐
│ File                 │ Tokens │ Relevance │ Keep/Remove │
├──────────────────────┼────────┼───────────┼─────────────┤
│ src/App.tsx          │ 450    │ 0.95      │ Keep        │
│ src/utils/helpers.ts │ 320    │ 0.85      │ Keep        │
│ package.json         │ 180    │ 0.40      │ Remove      │
│ README.md            │ 280    │ 0.30      │ Remove      │
└──────────────────────┴────────┴───────────┴─────────────┘

Optimization Result: 460 tokens saved (37% reduction)
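The pruning step amounts to a relevance filter over the context files. A minimal TypeScript sketch (types and names are illustrative), using the files from the analysis above:

```typescript
interface ContextFile {
  path: string;
  tokens: number;
  relevance: number;
}

// Keep files at or above the relevance threshold; report tokens saved
// by dropping the rest.
function pruneContext(files: ContextFile[], threshold = 0.7) {
  const kept = files.filter((f) => f.relevance >= threshold);
  const removed = files.filter((f) => f.relevance < threshold);
  const saved = removed.reduce((sum, f) => sum + f.tokens, 0);
  return { kept, saved };
}

const files: ContextFile[] = [
  { path: "src/App.tsx", tokens: 450, relevance: 0.95 },
  { path: "src/utils/helpers.ts", tokens: 320, relevance: 0.85 },
  { path: "package.json", tokens: 180, relevance: 0.4 },
  { path: "README.md", tokens: 280, relevance: 0.3 },
];
```

At the configured 0.7 threshold, package.json and README.md are dropped and their token counts become the savings.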

Context Caching

// Context caching configuration
interface ContextCacheConfig {
  enabled: boolean;
  ttl: number;                    // Time to live in seconds
  maxSize: number;                // Maximum cache entries
  similarityThreshold: number;    // Minimum similarity for cache hit
  compressionEnabled: boolean;    // Compress cached contexts
  persistToDisk: boolean;         // Persist cache across sessions
}

const cacheConfig: ContextCacheConfig = {
  enabled: true,
  ttl: 3600,                     // 1 hour
  maxSize: 1000,                 // 1000 cached contexts
  similarityThreshold: 0.85,     // 85% similarity for reuse
  compressionEnabled: true,
  persistToDisk: true
};
Cache Hit Examples:
# First request - cache miss
/agent react-expert "create login component" 
# Context: 2,500 tokens, Cost: $0.038

# Similar request - cache hit
/agent react-expert "create signup component"
# Context: 150 tokens (cached), Cost: $0.003 (90% savings)
# View cache performance
/cache stats --context --detailed

# Output:
Context Cache Performance (Last 7 Days):
- Cache Hits: 847 (73.2%)
- Cache Misses: 310 (26.8%)
- Average Token Savings: 1,850 per hit
- Total Tokens Saved: 1,567,950
- Cost Savings: $47.04
- Cache Size: 156MB (compressed)

# Cache optimization suggestions
/cache optimize --suggestions --auto-tune
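The similarity-threshold lookup behind those cache hits can be sketched as follows. Jaccard word overlap stands in for whatever similarity metric NikCLI actually uses (likely embedding-based), and all names are illustrative:

```typescript
// Stand-in similarity: Jaccard overlap of word sets. A real system
// would compare embeddings; this only demonstrates the threshold logic.
function similarity(a: string, b: string): number {
  const wa = new Set(a.toLowerCase().split(/\s+/));
  const wb = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...wa].filter((w) => wb.has(w)).length;
  const union = new Set([...wa, ...wb]).size;
  return union === 0 ? 0 : inter / union;
}

interface CacheEntry {
  prompt: string;
  context: string;
}

// Return the best cached context whose prompt is similar enough, else null.
function lookup(cache: CacheEntry[], prompt: string, threshold = 0.85): CacheEntry | null {
  let best: CacheEntry | null = null;
  let bestScore = 0;
  for (const e of cache) {
    const s = similarity(e.prompt, prompt);
    if (s >= threshold && s > bestScore) {
      best = e;
      bestScore = s;
    }
  }
  return best;
}
```

Lowering the threshold raises the hit rate but risks reusing a context that no longer fits the request, which is why 0.85 is a reasonable default.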

Cost Monitoring and Budgeting

Usage Tracking

  • Real-Time Monitoring
  • Budget Management
  • Cost Optimization Recommendations
# Enable real-time cost monitoring
/config set monitoring.cost-tracking.enabled true
/config set monitoring.cost-tracking.real-time true
/config set monitoring.cost-tracking.alerts true

# Set up cost alerts
/alerts create cost-warning --threshold 10.00 --period daily
/alerts create cost-limit --threshold 50.00 --period monthly --action pause

# Monitor current usage
/stats cost --live --breakdown-by-model

# Real-time output:
Current Session Costs:
- Claude 3.5 Sonnet: $2.45 (18 requests)
- Claude 3 Haiku: $0.12 (45 requests)
- Total Session: $2.57
- Daily Total: $8.43 / $10.00 (84% of budget)
- Monthly Total: $127.85 / $200.00 (64% of budget)
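The budget percentages in that output reduce to a simple ratio check against a warning threshold. A minimal sketch (illustrative names, not a NikCLI API):

```typescript
interface Budget {
  spent: number; // USD consumed so far in the period
  limit: number; // USD budget for the period
}

// Percent of budget consumed, plus whether the warning threshold is crossed.
function budgetStatus(b: Budget, warnAt = 0.8) {
  const fraction = b.spent / b.limit;
  return { percent: Math.round(fraction * 100), warn: fraction >= warnAt };
}
```

With the figures above, the daily budget ($8.43 of $10.00) sits at 84% and trips an 80% warning, while the monthly budget ($127.85 of $200.00) sits at 64% and does not.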

Cost Analysis and Reporting

# Generate comprehensive cost report
/report cost --period monthly --detailed --export pdf

# Report includes:
# - Usage by model and provider
# - Cost trends and projections
# - Top consuming agents and tasks
# - Optimization opportunities
# - Comparative analysis with previous periods

# Custom report queries
/report cost --filter "agent:react-expert" --time-range "last-week"
/report cost --group-by "project" --show-trends
/report cost --compare-periods "this-month:last-month"
# Allocate costs to projects/teams
/cost allocate --project e-commerce --percentage 40
/cost allocate --team frontend --based-on-usage
/cost allocate --department engineering --overhead 0.15

# View allocation breakdown
/cost allocation --summary --export csv

# Output:
Cost Allocation Summary (October 2024):
┌─────────────┬──────────────┬───────────┬────────┐
│ Project     │ Direct Costs │ Allocated │ Total  │
├─────────────┼──────────────┼───────────┼────────┤
│ E-commerce  │ $48.50       │ $12.75    │ $61.25 │
│ Mobile App  │ $32.20       │ $8.45     │ $40.65 │
│ API Gateway │ $19.80       │ $5.20     │ $25.00 │
│ Unallocated │ $15.30       │ $4.02     │ $19.32 │
└─────────────┴──────────────┴───────────┴────────┘
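One plausible reading of the --overhead flag is that shared costs are distributed to projects in proportion to their direct spend. This sketch assumes that interpretation (it is not confirmed NikCLI behavior, and the helper name is hypothetical):

```typescript
// Assumed model: split a shared overhead amount across projects
// proportionally to each project's direct cost.
function allocateOverhead(
  direct: Record<string, number>,
  overhead: number
): Record<string, number> {
  const total = Object.values(direct).reduce((a, b) => a + b, 0);
  const out: Record<string, number> = {};
  for (const [project, cost] of Object.entries(direct)) {
    out[project] = +(cost + overhead * (cost / total)).toFixed(2);
  }
  return out;
}
```

Proportional allocation keeps the totals column equal to direct costs plus overhead, matching the row arithmetic in the summary above.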

Advanced Token Optimization

Response Optimization

  • Output Control
  • Streaming Optimization
  • Multi-Turn Optimization
# Control response length and format
/config set ai.response.max-tokens 2000
/config set ai.response.prefer-concise true
/config set ai.response.avoid-repetition true

# Configure output formatting
/config set ai.response.format-code true
/config set ai.response.include-explanations selective
/config set ai.response.verbosity medium

# Task-specific optimizations
/agent react-expert "create component" --concise --code-only
/agent universal-agent "explain concept" --detailed --educational

Batch Processing

# Enable intelligent request batching
/config set batching.enabled true
/config set batching.max-batch-size 5
/config set batching.wait-time 2000  # 2 seconds

# Configure batching strategies
/config set batching.strategy similarity-based
/config set batching.similarity-threshold 0.7
/config set batching.max-wait-time 5000

# Example: Batching similar requests
/agent react-expert "create button component"
/agent react-expert "create input component"  
/agent react-expert "create modal component"
# → Batched into single request: "create button, input, and modal components"
# Token savings: ~40% compared to individual requests
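Greedy similarity-based batching, as described above, can be sketched like this; word-overlap similarity is a stand-in for the real metric, and all names are illustrative:

```typescript
// Stand-in similarity: fraction of shared words between two requests.
function sim(a: string, b: string): number {
  const wa = new Set(a.toLowerCase().split(/\s+/));
  const wb = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...wa].filter((w) => wb.has(w)).length;
  return inter / new Set([...wa, ...wb]).size;
}

// Greedy batching: each request joins the first existing batch whose
// lead request is similar enough, otherwise it starts a new batch.
function batchRequests(requests: string[], threshold = 0.5): string[][] {
  const batches: string[][] = [];
  for (const r of requests) {
    const b = batches.find((g) => sim(g[0], r) >= threshold);
    if (b) b.push(r);
    else batches.push([r]);
  }
  return batches;
}
```

With the three component requests above, all three share "create … component" and collapse into one batch, while an unrelated request would start a batch of its own.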
# Configure parallel processing for independent tasks
/config set parallel.enabled true
/config set parallel.max-concurrent 3
/config set parallel.load-balancing true

# Parallel execution with shared context
/parallel "backend-agent,react-expert" "build authentication system" \
       --shared-context "user-requirements.md" \
       --optimize-tokens

# Token optimization in parallel execution:
# - Shared context loaded once (instead of per agent)
# - Results cached for cross-agent reference
# - Duplicate processing eliminated

Provider-Specific Optimizations

Anthropic Claude Optimization

  • Claude-Specific Features
  • Context Window Usage
# Optimize for Claude's strengths
/config set anthropic.use-xml-formatting true
/config set anthropic.structured-output true
/config set anthropic.thinking-blocks false  # Reduce token usage

# Claude cost optimization
/config set anthropic.prefer-haiku-for-simple true
/config set anthropic.sonnet-threshold-complexity 0.7
/config set anthropic.context-optimization claude-specific

OpenAI GPT Optimization

# Optimize for GPT models
/config set openai.use-function-calling true
/config set openai.structured-output true  
/config set openai.json-mode-when-applicable true

# Model selection optimization
/config set openai.use-35-turbo-for-simple true
/config set openai.gpt4-threshold-complexity 0.8

Token Analytics and Insights

Advanced Analytics

  • Usage Pattern Analysis
  • Predictive Analytics
# Analyze usage patterns for optimization
/analytics tokens --patterns --time-range 90d

# Pattern Analysis Results:
Usage Patterns Identified:

1. Peak Usage: 2-4 PM EST (45% of daily usage)
   - Mostly code generation and review tasks
   - Recommendation: Pre-cache common contexts

2. Recurring Tasks: Component creation (35% of requests)
   - Similar patterns with slight variations
   - Recommendation: Create template-based optimization

3. Context Inefficiency: Large context windows with low relevance
   - Average relevance: 0.42 (target: 0.70+)
   - Recommendation: Enable smart context pruning

4. Model Misallocation: Complex models for simple tasks
   - 23% of Haiku-suitable tasks using Sonnet
   - Potential savings: $28/month

ROI Analysis

# Calculate return on investment
/analytics roi --calculate-value --time-saved --quality-improvement

# ROI Analysis Results:
Development Productivity Analysis (Last Quarter):

Time Savings:
- Code generation: 240 hours saved
- Code review: 120 hours saved  
- Documentation: 80 hours saved
- Debugging assistance: 160 hours saved
- Total: 600 hours saved

Cost Analysis:
- NikCLI token costs: $427.50
- Developer time value: $75/hour average
- Time value saved: $45,000

ROI Calculation:
- Investment: $427.50
- Return: $45,000
- ROI: 10,525% or 105x return

Quality Improvements:
- Code quality score: +23%
- Bug reduction: -34%
- Test coverage: +18%
- Documentation completeness: +45%
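The 105x figure is simply the value returned divided by the tokens spent; a one-line sketch of the arithmetic (illustrative helper name):

```typescript
// Return multiple: value generated per dollar of token spend.
function roiMultiple(cost: number, value: number): number {
  return value / cost;
}
```

With the quarter's numbers, $45,000 of time value against $427.50 of token spend rounds to a 105x return.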

Best Practices

Token Efficiency Guidelines

Context Optimization

Keep context focused and relevant to the task
# Good: Focused context
/context set "src/components/auth/" --include "*.tsx,*.ts"

# Avoid: Over-broad context  
/context set "." --recursive --all-files

Model Selection

Choose the right model for each task complexity
# Simple tasks → Haiku
/model claude-3-haiku
"Fix this typo in the comment"

# Complex reasoning → Sonnet
/model claude-3-5-sonnet  
"Design architecture for microservices system"

Caching Strategy

Implement intelligent caching for repeated patterns
# Enable context and response caching
/config set cache.context.enabled true
/config set cache.responses.enabled true
/config set cache.similarity-threshold 0.85

Batch Processing

Group similar tasks for efficient processing
# Batch similar component requests
/agent react-expert "create button, input, and select components with consistent styling"

Monitoring Best Practices

  • Continuous Monitoring
  • Regular Reviews
# Set up comprehensive monitoring
/monitor enable --cost-tracking --usage-patterns --optimization-opportunities

# Configure alerts for various thresholds
/alerts create --type cost-spike --threshold 200% --period 1h
/alerts create --type unusual-usage --threshold 3-sigma
/alerts create --type optimization-opportunity --min-savings 10.00

Next Steps

Start with basic optimizations like context pruning and smart model selection, then gradually implement advanced features like caching and batch processing. Monitor your token usage regularly to identify optimization opportunities.