Workspace Indexing
NikCLI’s workspace indexing system analyzes your codebase to build a searchable knowledge base. It combines file filtering, language detection, importance scoring, and vector embeddings so that relevant context can be retrieved efficiently.
How It Works
1. File Discovery & Filtering
The system scans your workspace and applies intelligent filtering:
import { WorkspaceContextManager } from '@nicomatt69/nikcli' ;
const workspace = new WorkspaceContextManager ( process . cwd ());
// Automatic filtering applied:
// ✓ Respects .gitignore
// ✓ Excludes node_modules, dist, build
// ✓ Filters by file size (default: 1MB limit)
// ✓ Detects binary files
// ✓ Applies custom rules
await workspace . refreshWorkspaceIndex ();
Default Exclusions :
const excludedDirectories = [
'node_modules' ,
'dist' ,
'build' ,
'.next' ,
'.cache' ,
'.git' ,
'coverage' ,
'__pycache__'
];
const excludedExtensions = [
'.jpg' , '.jpeg' , '.png' , '.gif' , '.svg' ,
'.pdf' , '.zip' , '.tar' , '.gz' ,
'.mp4' , '.avi' , '.mov' ,
'.exe' , '.dll' , '.so'
];
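As a rough illustration of how these defaults might combine, the sketch below checks a path against the directory list, the extension list, and the 1MB size limit. `shouldIndex` and its parameters are hypothetical names for this example, not part of the NikCLI API, and the real filter also handles .gitignore, binary detection, and custom rules.

```typescript
// Hypothetical helper that mirrors the default exclusions listed above.
const EXCLUDED_DIRECTORIES = new Set([
  'node_modules', 'dist', 'build', '.next', '.cache', '.git', 'coverage', '__pycache__',
]);
const EXCLUDED_EXTENSIONS = new Set([
  '.jpg', '.jpeg', '.png', '.gif', '.svg',
  '.pdf', '.zip', '.tar', '.gz',
  '.mp4', '.avi', '.mov',
  '.exe', '.dll', '.so',
]);
const MAX_FILE_SIZE_BYTES = 1024 * 1024; // 1MB default

function shouldIndex(relativePath: string, sizeBytes: number): boolean {
  const segments = relativePath.split('/');
  const fileName = segments[segments.length - 1];
  // Reject the file if any parent directory is on the exclusion list.
  if (segments.slice(0, -1).some((dir) => EXCLUDED_DIRECTORIES.has(dir))) return false;
  // Compare the final extension (text from the last dot) case-insensitively.
  const dot = fileName.lastIndexOf('.');
  const ext = dot > 0 ? fileName.slice(dot).toLowerCase() : '';
  if (EXCLUDED_EXTENSIONS.has(ext)) return false;
  // Finally, enforce the per-file size limit.
  return sizeBytes <= MAX_FILE_SIZE_BYTES;
}
```

Checking directories first means a large dependency tree is rejected on its path alone, without looking at extensions or sizes.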
2. Language & Framework Detection
Automatic detection of languages and frameworks:
// Detected from file extensions
const languageMap = {
'.ts' : 'typescript' ,
'.tsx' : 'typescript' ,
'.js' : 'javascript' ,
'.jsx' : 'javascript' ,
'.py' : 'python' ,
'.go' : 'go' ,
'.rs' : 'rust' ,
'.java' : 'java' ,
// ... 40+ languages supported
};
// Framework detection from package.json
const frameworks = {
'next' : 'Next.js' ,
'react' : 'React' ,
'vue' : 'Vue.js' ,
'express' : 'Express' ,
'fastify' : 'Fastify' ,
// ... many more
};
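The two lookups above could be applied roughly as follows. This is a minimal sketch: `detectLanguage`, `detectFrameworks`, and the `PackageJson` shape are illustrative names rather than NikCLI exports, and only the entries shown above are included.

```typescript
const LANGUAGE_BY_EXTENSION: Record<string, string> = {
  '.ts': 'typescript', '.tsx': 'typescript',
  '.js': 'javascript', '.jsx': 'javascript',
  '.py': 'python', '.go': 'go', '.rs': 'rust', '.java': 'java',
};

const FRAMEWORK_BY_PACKAGE: Record<string, string> = {
  next: 'Next.js', react: 'React', vue: 'Vue.js', express: 'Express', fastify: 'Fastify',
};

// Only the fields this sketch reads from package.json.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

function detectLanguage(filePath: string): string {
  const dot = filePath.lastIndexOf('.');
  const ext = dot >= 0 ? filePath.slice(dot).toLowerCase() : '';
  return LANGUAGE_BY_EXTENSION[ext] ?? 'unknown';
}

function detectFrameworks(pkg: PackageJson): string[] {
  // Merge runtime and dev dependencies, then keep the known framework packages.
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  return Object.keys(FRAMEWORK_BY_PACKAGE)
    .filter((name) => name in deps)
    .map((name) => FRAMEWORK_BY_PACKAGE[name]);
}
```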
3. File Analysis
Each file is analyzed to extract:
interface FileContext {
path : string ;
content : string ;
size : number ;
modified : Date ;
language : string ;
importance : number ; // 0-100 score
// Extracted metadata
summary ?: string ;
dependencies ?: string []; // import statements
exports ?: string []; // exported symbols
functions ?: string []; // function names
classes ?: string []; // class names
types ?: string []; // type/interface names
tags ?: string []; // categorization tags
// Performance optimization
hash ?: string ; // Content hash for change detection
embedding ?: number []; // Vector embedding
lastAnalyzed ?: Date ;
}
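To make the `dependencies` and `exports` fields more concrete, here is a deliberately naive, regex-based sketch of how they might be populated for a TypeScript or JavaScript file. `extractMetadata` and both patterns are assumptions made for illustration; NikCLI's actual extraction may use a parser and will cover more syntax.

```typescript
interface ExtractedMetadata {
  dependencies: string[]; // module specifiers from import statements
  exports: string[];      // names of exported declarations
}

// Matches `import ... from 'x'` and side-effect imports such as `import 'x'`.
const IMPORT_PATTERN = /\bimport\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/g;
// Matches named declarations such as `export function foo` or `export default class Bar`.
const EXPORT_PATTERN =
  /\bexport\s+(?:default\s+)?(?:async\s+)?(?:function\*?|class|const|let|var|interface|type|enum)\s+([A-Za-z_$][\w$]*)/g;

function collectMatches(pattern: RegExp, source: string): string[] {
  const found: string[] = [];
  pattern.lastIndex = 0; // global regexes keep state between calls
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    found.push(match[1]);
  }
  return found;
}

function extractMetadata(source: string): ExtractedMetadata {
  return {
    dependencies: collectMatches(IMPORT_PATTERN, source),
    exports: collectMatches(EXPORT_PATTERN, source),
  };
}

const sample = [
  "import { readFile } from 'node:fs/promises';",
  "import './polyfills';",
  'export function loadConfig() {}',
  'export const DEFAULT_PORT = 3000;',
  'export default class Server {}',
].join('\n');

const metadata = extractMetadata(sample);
```

Regexes like these will also match text inside comments and strings, which is one reason a production indexer would more likely rely on a real parser.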
4. Importance Scoring
Files are scored based on multiple factors:
function calculateFileImportance(file: FileContext): number {
  let score = 50; // Base score
  // Path-based scoring
  if (isEntryPoint(file.path)) score += 25; // index.ts, main.ts
  if (isConfig(file.path)) score += 20; // package.json, tsconfig.json
  if (inSourceDir(file.path)) score += 15; // src/, lib/
  if (isTest(file.path)) score -= 10; // test files lower priority
  // Content-based scoring (these metadata fields are optional, so default to 0)
  score += Math.min((file.exports?.length ?? 0) * 5, 25); // Has exports
  score += Math.min((file.functions?.length ?? 0) * 2, 20); // Has functions
  score += Math.min((file.classes?.length ?? 0) * 3, 15); // Has classes
  // Size-based scoring (cumulative: files over 500 lines gain 15 in total)
  const lines = file.content.split('\n').length;
  if (lines > 100) score += 5;
  if (lines > 500) score += 10;
  return Math.min(100, Math.max(0, score)); // Clamp to the 0-100 range
}
Importance Categories :
90-100 : Entry points, core configuration
70-89 : Main source files, important modules
50-69 : Regular source files
30-49 : Utilities, helpers
0-29 : Tests, documentation, generated files
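A worked example may help show how the factors add up. The sketch below restates the scoring rules with the path checks replaced by plain booleans, so it runs on its own. `ScoreInputs`, `scoreFile`, and `importanceCategory` are hypothetical names, and the two sample files are invented for illustration.

```typescript
interface ScoreInputs {
  isEntryPoint: boolean;
  isConfig: boolean;
  inSourceDir: boolean;
  isTest: boolean;
  exportCount: number;
  functionCount: number;
  classCount: number;
  lineCount: number;
}

function scoreFile(f: ScoreInputs): number {
  let score = 50; // base score
  if (f.isEntryPoint) score += 25;
  if (f.isConfig) score += 20;
  if (f.inSourceDir) score += 15;
  if (f.isTest) score -= 10;
  score += Math.min(f.exportCount * 5, 25);
  score += Math.min(f.functionCount * 2, 20);
  score += Math.min(f.classCount * 3, 15);
  if (f.lineCount > 100) score += 5;
  if (f.lineCount > 500) score += 10;
  return Math.min(100, Math.max(0, score));
}

// Maps a score to the category ranges listed above.
function importanceCategory(score: number): string {
  if (score >= 90) return 'entry points, core configuration';
  if (score >= 70) return 'main source files, important modules';
  if (score >= 50) return 'regular source files';
  if (score >= 30) return 'utilities, helpers';
  return 'tests, documentation, generated files';
}

// src/index.ts: entry point in src/, 3 exports, 5 functions, 1 class, 120 lines.
// 50 + 25 + 15 + 15 + 10 + 3 + 5 = 123, which is clamped to 100.
const entryScore = scoreFile({
  isEntryPoint: true, isConfig: false, inSourceDir: true, isTest: false,
  exportCount: 3, functionCount: 5, classCount: 1, lineCount: 120,
});

// scripts/release.js: outside src/, no exports, 1 function, 30 lines.
// 50 + 2 = 52.
const scriptScore = scoreFile({
  isEntryPoint: false, isConfig: false, inSourceDir: false, isTest: false,
  exportCount: 0, functionCount: 1, classCount: 0, lineCount: 30,
});
```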
5. Vector Embedding Generation
Files are chunked and embedded for semantic search:
// Intelligent chunking preserves context
const chunks = intelligentChunking ( file . content , file . language );
// Code chunking (TypeScript example)
function chunkCodeFile ( content : string ) : string [] {
// Keeps functions/classes together
// Respects bracket depth
// Smart overlap at function boundaries
// Typically 80-150 lines per chunk
}
// Markdown chunking
function chunkMarkdownFile ( content : string ) : string [] {
// Splits by headers
// Preserves hierarchy
// Maintains cross-references
// Minimum 200 chars per section
}
// Generate embeddings
for (const [chunkIndex, chunk] of chunks.entries()) {
  const embedding = await unifiedEmbeddingInterface.generateEmbedding(chunk);
  await vectorStore.addDocument({
    id: `${file.path}#${chunkIndex}`, // e.g. src/index.ts#0
    content: chunk,
    embedding: embedding.vector,
    metadata: {
      source: file.path,
      language: file.language,
      importance: file.importance,
      chunkIndex,
      totalChunks: chunks.length
    }
  });
}
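The "respects bracket depth" idea from the code chunker can be sketched as follows. This is a simplified assumption of how such a chunker might work, not NikCLI's implementation: `chunkByBraceDepth` is a hypothetical name, braces inside strings and comments are counted naively, and overlap between chunks is left out.

```typescript
function chunkByBraceDepth(content: string, minLines = 80, maxLines = 150): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let depth = 0;

  for (const line of content.split('\n')) {
    current.push(line);
    // Track nesting so a chunk only ends outside any function or class body.
    for (let i = 0; i < line.length; i++) {
      if (line[i] === '{') depth++;
      else if (line[i] === '}') depth = Math.max(0, depth - 1);
    }
    const atTopLevel = depth === 0;
    // Close the chunk at a top-level boundary once it is long enough,
    // or force a split when it reaches the maximum size.
    if ((atTopLevel && current.length >= minLines) || current.length >= maxLines) {
      chunks.push(current.join('\n'));
      current = [];
    }
  }
  if (current.length > 0) chunks.push(current.join('\n'));
  return chunks;
}

// Tiny limits so the behavior is visible on a six-line input.
const demoSource = [
  'function a() {',
  '  return 1;',
  '}',
  'function b() {',
  '  return 2;',
  '}',
].join('\n');
const demoChunks = chunkByBraceDepth(demoSource, 2, 10);
```

With these limits each function ends up in its own chunk, because a split is only allowed once the brace depth has returned to zero.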
Indexing Strategies
Full Workspace Index
Index the entire workspace:
import { unifiedRAGSystem } from '@nicomatt69/nikcli' ;
// Analyze and index full workspace
const analysis = await unifiedRAGSystem . analyzeProject ( process . cwd ());
console . log ({
indexedFiles: analysis . indexedFiles ,
cost: `$${analysis.embeddingsCost.toFixed(4)}`,
time: `${analysis.processingTime}ms`,
vectorDB: analysis . vectorDBStatus
});
// Example output:
// {
// indexedFiles: 342,
// cost: '$0.0234',
// time: '12450ms',
// vectorDB: 'available'
// }
Selective Indexing
Index specific paths:
import { WorkspaceContextManager } from '@nicomatt69/nikcli' ;
const workspace = new WorkspaceContextManager ();
// Select specific paths to index
await workspace . selectPaths ([
'src/core' ,
'src/agents' ,
'src/tools' ,
'README.md'
]);
// Only selected paths will be indexed
const context = workspace . getContext ();
console . log ( `Indexed ${ context . files . size } files from selected paths` );
Incremental Updates
Only re-index changed files:
// File change detection via hash
const fileHash = generateFileHash ( filePath , content );
if ( cachedHash !== fileHash ) {
// File changed, re-index
await analyzeFile ( filePath , content );
updateCache ( filePath , fileHash );
} else {
// File unchanged, use cached analysis
const cached = getCachedAnalysis ( filePath );
}
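The hash comparison above can be made self-contained as shown below. This is a sketch under stated assumptions: `contentHash` uses 32-bit FNV-1a purely as a dependency-free stand-in (the hash NikCLI actually uses is not specified here), and `ChangeTracker` is a hypothetical class name.

```typescript
// FNV-1a (32-bit) over UTF-16 code units. Non-cryptographic; used here only
// so the example has no dependencies. Collisions are possible with 32 bits.
function contentHash(content: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

class ChangeTracker {
  private hashes = new Map<string, string>();

  // Returns true when the file is new or its content differs from the last call.
  hasChanged(filePath: string, content: string): boolean {
    const next = contentHash(content);
    if (this.hashes.get(filePath) === next) return false;
    this.hashes.set(filePath, next);
    return true;
  }
}

const tracker = new ChangeTracker();
const firstSeen = tracker.hasChanged('src/index.ts', 'export const a = 1;');
const unchanged = tracker.hasChanged('src/index.ts', 'export const a = 1;');
const edited = tracker.hasChanged('src/index.ts', 'export const a = 2;');
```

Only calls that return `true` would need to trigger re-analysis and re-embedding, which is where the cost saving comes from.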
Configuration
File Filter Options
import { createFileFilter } from '@nicomatt69/nikcli' ;
const fileFilter = createFileFilter ( process . cwd (), {
// Respect .gitignore
respectGitignore: true ,
// Size limits
maxFileSize: 1024 * 1024 , // 1MB per file
maxTotalFiles: 1000 ,
// Include/exclude
includeExtensions: [ '.ts' , '.js' , '.tsx' , '.jsx' , '.py' ],
excludeExtensions: [ '.test.ts' , '.spec.ts' ],
excludeDirectories: [ 'node_modules' , 'dist' , 'build' ],
excludePatterns: [ '**/*.generated.ts' , '**/vendor/**' ],
// Custom rules
customRules: [
{
name: 'priority_configs' ,
pattern: /\.(json|yaml|yml|toml)$/,
type: 'include' ,
priority: 10 ,
reason: 'Important configuration files'
},
{
name: 'skip_tests' ,
pattern: /\.(test|spec)\.(ts|js|tsx|jsx)$/,
type: 'exclude' ,
priority: 8 ,
reason: 'Test files have lower priority'
}
]
});
// Check if file should be indexed
const result = fileFilter . shouldIncludeFile ( filePath , rootPath );
if ( result . allowed ) {
await indexFile ( filePath );
}
Chunking Configuration
import { TOKEN_LIMITS } from '@nicomatt69/nikcli' ;
// Configure chunk sizes
const config = {
// Token-based chunking
chunkTokens: TOKEN_LIMITS . RAG ?. CHUNK_TOKENS ?? 700 ,
overlapTokens: TOKEN_LIMITS . RAG ?. CHUNK_OVERLAP_TOKENS ?? 80 ,
// Code-specific
codeChunkMinLines: TOKEN_LIMITS . RAG ?. CODE_CHUNK_MIN_LINES ?? 80 ,
codeChunkMaxLines: TOKEN_LIMITS . RAG ?. CODE_CHUNK_MAX_LINES ?? 150 ,
// Markdown-specific
markdownMinSection: TOKEN_LIMITS . RAG ?. MARKDOWN_MIN_SECTION ?? 200 ,
};
unifiedRAGSystem . updateConfig ( config );
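To illustrate what `chunkTokens` and `overlapTokens` control, here is a minimal sliding-window sketch. It assumes tokens are already available as an array (real tokenizers split text differently from whitespace), and `windowTokens` is a hypothetical helper, not a NikCLI function.

```typescript
function windowTokens(tokens: string[], chunkTokens = 700, overlapTokens = 80): string[][] {
  if (chunkTokens <= 0 || overlapTokens < 0 || overlapTokens >= chunkTokens) {
    throw new RangeError('overlapTokens must be >= 0 and smaller than chunkTokens');
  }
  // Each new chunk starts this many tokens after the previous one.
  const step = chunkTokens - overlapTokens;
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkTokens));
    // Stop once the window has reached the end of the input.
    if (start + chunkTokens >= tokens.length) break;
  }
  return chunks;
}

// Ten tokens, windows of four, overlap of one: starts at 0, 3, and 6.
const demoTokens = ['t0', 't1', 't2', 't3', 't4', 't5', 't6', 't7', 't8', 't9'];
const demoWindows = windowTokens(demoTokens, 4, 1);
```

The last token of each window is repeated as the first token of the next, so context that straddles a boundary is not lost entirely.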
Cost Management
// Set indexing cost threshold
unifiedRAGSystem . updateConfig ({
costThreshold: 0.10 // Stop if exceeds $0.10
});
// Estimate costs before indexing
const files = await glob ( '**/*.{ts,js}' );
const estimatedCost = await estimateIndexingCost ( files , process . cwd ());
if ( estimatedCost > 0.10 ) {
console.warn(`Estimated cost: $${estimatedCost.toFixed(4)}`);
console . warn ( 'Consider reducing scope or using selective indexing' );
}
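For a sense of the arithmetic behind such an estimate, the sketch below converts a character count into an approximate token count and then into dollars. Everything in it is an assumption for illustration: the four-characters-per-token heuristic is a rough rule of thumb, the price is a placeholder rather than a quoted rate, and `estimateEmbeddingCost` is not the `estimateIndexingCost` function used above.

```typescript
const CHARS_PER_TOKEN = 4; // rough heuristic, varies by tokenizer and language

function estimateEmbeddingCost(totalChars: number, usdPerMillionTokens: number): number {
  const tokens = Math.ceil(totalChars / CHARS_PER_TOKEN);
  return (tokens / 1e6) * usdPerMillionTokens;
}

// 4,000,000 characters is about 1,000,000 tokens.
// At a placeholder price of $0.02 per million tokens, that is about $0.02.
const PLACEHOLDER_PRICE = 0.02;
const estimate = estimateEmbeddingCost(4_000_000, PLACEHOLDER_PRICE);
const exceedsThreshold = estimate > 0.10;
```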
Monitoring & Optimization
Index Statistics
const workspace = new WorkspaceContextManager ();
const stats = workspace . getPerformanceStats ();
console . log ({
totalFiles: stats . totalFiles ,
totalDirectories: stats . totalDirectories ,
cacheStats: {
hits: stats . cacheStats . hits ,
misses: stats . cacheStats . misses ,
hitRate: stats.cacheStats.hits + stats.cacheStats.misses > 0
  ? `${((stats.cacheStats.hits / (stats.cacheStats.hits + stats.cacheStats.misses)) * 100).toFixed(1)}%`
  : 'n/a' // No lookups recorded yet, so avoid dividing by zero
},
cacheSize: {
semanticSearch: stats . cacheSize . semanticSearch ,
fileContent: stats . cacheSize . fileContent ,
embeddings: stats . cacheSize . embeddings ,
analysis: stats . cacheSize . analysis
},
ragAvailable: stats . ragAvailable ,
lastUpdated: stats . lastUpdated
});
Cache Management
// Clear all caches
workspace . clearAllCaches ();
// Optimize cache (remove old entries)
await workspace . optimizeCache ();
// Periodic cache cleanup
setInterval ( async () => {
await workspace . optimizeCache ();
}, 3600000 ); // Every hour
Watch Mode
Monitor file changes and re-index automatically:
// Start watching for changes
workspace . startWatching ();
// Files are automatically re-analyzed when changed
// Debounced to 1 second to avoid excessive re-indexing
// Stop watching
workspace . stopWatching ();
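The one-second debounce can be pictured with the timestamp-based sketch below. It is an assumption about the general technique, not a description of NikCLI's watcher: `ChangeDebouncer` is a hypothetical class, and time is passed in explicitly so the behavior is deterministic.

```typescript
class ChangeDebouncer {
  private lastChange = new Map<string, number>();
  private quietMs: number;

  constructor(quietMs: number = 1000) {
    this.quietMs = quietMs;
  }

  // Call on every file-change event; repeated events reset the quiet period.
  record(filePath: string, nowMs: number): void {
    this.lastChange.set(filePath, nowMs);
  }

  // Returns (and forgets) the files that have been quiet for at least quietMs.
  takeReady(nowMs: number): string[] {
    const ready: string[] = [];
    this.lastChange.forEach((changedAt, filePath) => {
      if (nowMs - changedAt >= this.quietMs) ready.push(filePath);
    });
    ready.forEach((filePath) => this.lastChange.delete(filePath));
    return ready;
  }
}

const debouncer = new ChangeDebouncer(1000);
debouncer.record('src/index.ts', 0);   // first save
debouncer.record('src/index.ts', 500); // second save shortly after
const tooEarly = debouncer.takeReady(1000); // only 500 ms since the last save
const ready = debouncer.takeReady(1500);    // a full second of quiet has passed
```

Two quick saves therefore lead to a single re-index rather than two.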
Best Practices
1. Optimize Index Scope
// Instead of indexing everything:
// await unifiedRAGSystem.analyzeProject(process.cwd());
// Index only source code:
const workspace = new WorkspaceContextManager ();
await workspace . selectPaths ([
'src' ,
'lib' ,
'package.json' ,
'tsconfig.json' ,
'README.md'
]);
2. Use Appropriate Filters
const fileFilter = createFileFilter ( process . cwd (), {
// Include only code files
includeExtensions: [
'.ts' , '.tsx' , '.js' , '.jsx' , // JavaScript/TypeScript
'.py' , // Python
'.go' , // Go
'.rs' // Rust
],
// Exclude test files
excludePatterns: [
'**/*.test.*' ,
'**/*.spec.*' ,
'**/__tests__/**' ,
'**/__mocks__/**'
]
});
3. Leverage Caching
// Enable all caching
process . env . CACHE_RAG = 'true' ;
process . env . CACHE_AI = 'true' ;
// Embeddings cached for 24 hours
// Analysis cached for 5 minutes
// File hashes cached for 7 days
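The three lifetimes above amount to per-cache time-to-live (TTL) values. A minimal TTL cache could look like the sketch below. `TtlCache` is a hypothetical class written for this example; NikCLI's own cache classes and eviction policy may differ.

```typescript
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  set(key: string, value: V, nowMs: number): void {
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
  }

  // Returns the cached value, or undefined if it is missing or expired.
  get(key: string, nowMs: number): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (nowMs >= entry.expiresAt) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }
}

const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

// Lifetimes taken from the comments above.
const embeddingCache = new TtlCache<number[]>(24 * HOUR_MS);
const analysisCache = new TtlCache<string>(5 * MINUTE_MS);
const hashCache = new TtlCache<string>(7 * 24 * HOUR_MS);

analysisCache.set('src/index.ts', 'entry point summary', 0);
const stillFresh = analysisCache.get('src/index.ts', 4 * MINUTE_MS);
const expired = analysisCache.get('src/index.ts', 5 * MINUTE_MS);
```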
4. Monitor Costs
// Track embedding costs
const analysis = await unifiedRAGSystem . analyzeProject ( process . cwd ());
console.log(`Indexing cost: $${analysis.embeddingsCost.toFixed(4)}`);
// Use local-only mode if needed
unifiedRAGSystem . updateConfig ({
useVectorDB: false , // Disable vector DB
useLocalEmbeddings: true , // Use simple TF-IDF
hybridMode: false
});
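For readers unfamiliar with TF-IDF, the sketch below shows the general idea behind a local, API-free similarity score. It is not NikCLI's implementation: the tokenizer, the smoothed IDF formula, and the function names are all choices made for this example.

```typescript
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter((t) => t.length > 0);
}

// Builds one sparse TF-IDF vector (term -> weight) per document.
function tfidfVectors(docs: string[]): Map<string, number>[] {
  const tokenized = docs.map(tokenize);
  // Document frequency: in how many documents does each term appear?
  const docFreq = new Map<string, number>();
  tokenized.forEach((tokens) => {
    new Set(tokens).forEach((term) => docFreq.set(term, (docFreq.get(term) ?? 0) + 1));
  });
  return tokenized.map((tokens) => {
    const counts = new Map<string, number>();
    tokens.forEach((term) => counts.set(term, (counts.get(term) ?? 0) + 1));
    const vector = new Map<string, number>();
    counts.forEach((count, term) => {
      // Smoothed inverse document frequency; rarer terms get larger weights.
      const idf = Math.log((1 + docs.length) / (1 + (docFreq.get(term) ?? 0))) + 1;
      vector.set(term, (count / tokens.length) * idf);
    });
    return vector;
  });
}

function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((weight, term) => {
    dot += weight * (b.get(term) ?? 0);
    normA += weight * weight;
  });
  b.forEach((weight) => { normB += weight * weight; });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

const vectors = tfidfVectors([
  'parse config file',
  'render react component',
  'parse json config',
]);
const relatedScore = cosineSimilarity(vectors[0], vectors[2]);   // shares "parse" and "config"
const unrelatedScore = cosineSimilarity(vectors[0], vectors[1]); // no shared terms
```

Because the score depends only on shared terms, this kind of search costs nothing per query but cannot match synonyms the way learned embeddings can.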
Troubleshooting
High Indexing Costs
# Problem: Indexing costs too high
# Solution: Reduce scope and enable local embeddings
# 1. Selective indexing
nikcli --index-paths "src,lib"
# 2. Use local embeddings
export USE_LOCAL_EMBEDDINGS=true
# 3. Set cost limit
export INDEXING_COST_THRESHOLD=0.05
Large Workspaces
# Problem: Workspace too large
# Solution: Increase limits or use selective indexing
# 1. Increase file limit
export MAX_INDEX_FILES=2000
# 2. Increase file size limit
export MAX_FILE_SIZE_MB=2
# 3. Use selective paths
nikcli --index-paths "src/core,src/agents"
Slow Indexing
// Problem: Indexing takes too long
// Solution: Optimize batch sizes and use caching
unifiedRAGSystem . updateConfig ({
indexingBatchSize: 500 , // Larger batches
embedBatchSize: 100 , // Parallel embedding generation
});
// Enable caching
process . env . CACHE_RAG = 'true' ;
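To see why batch size matters, the sketch below splits a list of chunks into batches of a given size. With a batch size of 100, 250 chunks need three embedding requests instead of 250. `toBatches` is a hypothetical helper, and whether NikCLI sends batches sequentially or in parallel is not something this example assumes.

```typescript
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 250 placeholder chunks grouped with the embedBatchSize value shown above.
const placeholderChunks = Array.from({ length: 250 }, (_, i) => `chunk-${i}`);
const embedBatches = toBatches(placeholderChunks, 100);
const batchLengths = embedBatches.map((batch) => batch.length);
```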