Token management is crucial for optimizing performance and controlling costs when using AI models in NikCLI. This guide covers strategies for efficient token usage, cost optimization, and monitoring across different AI providers.
Usage Optimization
Optimize token usage with smart context management and caching
Cost Control
Monitor and control costs across multiple AI providers
Performance Tuning
Balance quality and efficiency for optimal performance
Analytics & Monitoring
Track usage patterns and identify optimization opportunities
```bash
# Configure automatic model selection based on task complexity
/config set ai.smart-model-selection true
/config set ai.model-selection-strategy cost-optimized

# Define model selection rules
/config set ai.selection-rules '{
  "simple-tasks": "claude-3-haiku",
  "code-generation": "claude-3-5-sonnet",
  "complex-reasoning": "claude-3-5-sonnet",
  "quick-questions": "claude-3-haiku"
}'

# Set cost thresholds
/config set ai.cost-threshold.warning 0.10
/config set ai.cost-threshold.max 1.00
```
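The routing logic behind these settings can be sketched as follows. This is an illustrative Python sketch, not NikCLI's implementation: the rule table mirrors the `ai.selection-rules` config above, but the `complexity` score and the fallback heuristic are hypothetical.

```python
# Hypothetical cost-optimized model routing: known task types map directly
# to a model; unknown tasks fall back to a complexity-score threshold.
SELECTION_RULES = {
    "simple-tasks": "claude-3-haiku",
    "code-generation": "claude-3-5-sonnet",
    "complex-reasoning": "claude-3-5-sonnet",
    "quick-questions": "claude-3-haiku",
}

def pick_model(task_type: str, complexity: float, threshold: float = 0.7) -> str:
    """Route simple work to a cheap model and complex work to a stronger one."""
    if task_type in SELECTION_RULES:
        return SELECTION_RULES[task_type]
    # Fallback: compare an estimated complexity score (0..1) to the threshold
    return "claude-3-5-sonnet" if complexity >= threshold else "claude-3-haiku"
```

The key idea is that most requests are simple enough for a cheap model, so defaulting to it and escalating only past a complexity threshold keeps average cost low.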
Context Management
```bash
# Optimize context window usage
/config set context.max-tokens 8000
/config set context.optimization-strategy smart-pruning
/config set context.relevance-threshold 0.7

# Configure context compression
/config set context.compression.enabled true
/config set context.compression.algorithm semantic
/config set context.compression.ratio 0.3

# Set up context caching
/config set cache.context.enabled true
/config set cache.context.ttl 3600  # 1 hour
/config set cache.similarity-threshold 0.85
```
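The similarity-threshold cache works roughly as sketched below: a cached context is reused only when its similarity to the new request meets the configured threshold (0.85 here), and entries expire after the TTL. This is a hedged illustration; the word-set Jaccard similarity is a stand-in for whatever semantic measure the cache actually uses.

```python
import time

class ContextCache:
    """Illustrative TTL + similarity-threshold cache (not NikCLI's internals)."""

    def __init__(self, ttl: float = 3600, threshold: float = 0.85):
        self.ttl = ttl
        self.threshold = threshold
        self.entries = []  # list of (timestamp, token_set, cached_context)

    @staticmethod
    def _similarity(a: set, b: set) -> float:
        # Jaccard similarity as a simple stand-in for semantic similarity
        return len(a & b) / len(a | b) if a | b else 0.0

    def get(self, query: str):
        tokens = set(query.lower().split())
        now = time.time()
        # Drop expired entries, then look for a sufficiently similar one
        self.entries = [e for e in self.entries if now - e[0] < self.ttl]
        for _, cached_tokens, context in self.entries:
            if self._similarity(tokens, cached_tokens) >= self.threshold:
                return context
        return None

    def put(self, query: str, context: str):
        self.entries.append((time.time(), set(query.lower().split()), context))
```

A high threshold like 0.85 trades hit rate for safety: near-duplicate requests reuse the cached context, while loosely related ones still trigger a fresh context build.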
```bash
# Control response length and format
/config set ai.response.max-tokens 2000
/config set ai.response.prefer-concise true
/config set ai.response.avoid-repetition true

# Configure output formatting
/config set ai.response.format-code true
/config set ai.response.include-explanations selective
/config set ai.response.verbosity medium

# Task-specific optimizations
/agent react-expert "create component" --concise --code-only
/agent universal-agent "explain concept" --detailed --educational
```
```bash
# Optimize for Claude's strengths
/config set anthropic.use-xml-formatting true
/config set anthropic.structured-output true
/config set anthropic.thinking-blocks false  # Reduce token usage

# Claude cost optimization
/config set anthropic.prefer-haiku-for-simple true
/config set anthropic.sonnet-threshold-complexity 0.7
/config set anthropic.context-optimization claude-specific
```
```bash
# Optimize for GPT models
/config set openai.use-function-calling true
/config set openai.structured-output true
/config set openai.json-mode-when-applicable true

# Model selection optimization
/config set openai.use-35-turbo-for-simple true
/config set openai.gpt4-threshold-complexity 0.8
```
```bash
# Good: focused context scoped to relevant files
/context set "src/components/auth/" --include "*.tsx,*.ts"

# Avoid: over-broad context that inflates token usage
/context set "." --recursive --all-files
```
Model Selection
Choose the right model for each task complexity
```bash
# Simple tasks → Haiku
/model claude-3-haiku
"Fix this typo in the comment"

# Complex reasoning → Sonnet
/model claude-3-5-sonnet
"Design architecture for microservices system"
```
Caching Strategy
Implement intelligent caching for repeated patterns
```bash
# Enable context and response caching
/config set cache.context.enabled true
/config set cache.responses.enabled true
/config set cache.similarity-threshold 0.85
```
Batch Processing
Group similar tasks for efficient processing
```bash
# Batch similar component requests into one prompt
/agent react-expert "create button, input, and select components with consistent styling"
```
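The saving from batching comes from sending the shared context (system prompt, project files) once rather than once per request. The arithmetic below illustrates this; the token counts are hypothetical placeholders, not measured values.

```python
# Compare total tokens for three separate requests vs one batched request.
def tokens_separate(context_tokens: int, per_task_tokens: list[int]) -> int:
    # Each request re-sends the full shared context
    return sum(context_tokens + t for t in per_task_tokens)

def tokens_batched(context_tokens: int, per_task_tokens: list[int]) -> int:
    # The shared context is sent only once
    return context_tokens + sum(per_task_tokens)

tasks = [300, 250, 280]  # hypothetical button, input, select requests
print(tokens_separate(4000, tasks))  # → 12830
print(tokens_batched(4000, tasks))   # → 4830
```

With a 4,000-token shared context, batching the three component requests cuts input tokens by roughly 60% in this example, and the savings grow with context size.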
Start with basic optimizations like context pruning and smart model selection, then gradually implement advanced features like caching and batch processing. Monitor your token usage regularly to identify optimization opportunities.