Taxonomy Module CLI Usage Guide
The Compileo Taxonomy CLI provides comprehensive command-line tools for taxonomy management, including creation, generation, extension, and retrieval. This guide covers all available CLI commands with examples and best practices.
Command Overview
graph TD
A[Taxonomy CLI] --> B[List]
A --> C[Create]
A --> D[Generate]
A --> E[Extend]
A --> F[Load]
A --> G[Update]
A --> H[Delete]
A --> I[Bulk Delete]
B --> B1[All taxonomies]
C --> C1[Manual taxonomy]
D --> D1[AI generation]
E --> E1[Extend existing]
F --> F1[View details]
G --> G1[Update name]
H --> H1[Single delete]
I --> I1[Multiple delete]
Taxonomy Listing
List all available taxonomies with optional project filtering:
Parameters:
- --project-id: Filter by project ID (optional)
- --format: Output format (table, json) (default: table)
Example Output:
๐ Taxonomies in project 1:
โโโโโโฌโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโ
โ ID โ Name โ Description โ Categories โ Confidence โ
โโโโโโผโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโค
โ 101โ Medical Conditions โ Classification of conditions โ 45 โ 0.85 โ
โ 102โ AI Generated โ AI-generated taxonomy โ 78 โ 0.82 โ
โโโโโโดโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโ
Manual Taxonomy Creation
Create a new taxonomy from a JSON file:
compileo taxonomy create --project-id 1 --name "Medical Conditions" --description "Classification system" --file taxonomy.json
Parameters:
- --project-id: Project ID for the taxonomy (required)
- --name: Taxonomy name (required)
- --description: Taxonomy description (optional)
- --file: JSON file containing taxonomy structure (optional)
Taxonomy JSON Structure:
{
"name": "Medical Conditions",
"description": "Hierarchical classification",
"children": [
{
"name": "Cardiovascular",
"description": "Heart conditions",
"confidence_threshold": 0.8,
"children": [
{
"name": "Coronary Artery Disease",
"description": "Artery blockage",
"confidence_threshold": 0.85,
"children": []
}
]
}
]
}
Example Output:
โ
Taxonomy created successfully!
๐ Taxonomy ID: 101
๐ File: storage/taxonomy/1/manual_taxonomy_uuid.json
๐ท๏ธ Categories: 3
๐ฏ Confidence: 0.8
AI Taxonomy Generation
Generate a new taxonomy using AI from document chunks:
compileo taxonomy generate --project-id 1 --name "AI Medical Taxonomy" --documents 101,102,103 --depth 3 --generator gemini --domain medical --batch-size 10 --category-limits 5,10,15 --specificity-level 2
Parameters:
- --project-id: Project containing documents (required)
- --name: Taxonomy name (required)
- --documents: Comma-separated document IDs (required)
- --depth: Taxonomy hierarchy depth (default: 3)
- --generator: AI model (gemini, grok, ollama, openai) (default: gemini)
- --domain: Content domain (default: general)
- --batch-size: Number of complete chunks to process (default: 10)
- --category-limits: Max categories per level (comma-separated)
- --specificity-level: Specificity level 1-5 (default: 1)
Example Output:
๐ค Generating taxonomy with gemini...
๐ Analyzing 3 documents (100 chunks)
๐ฏ Domain: medical, Depth: 3
โณ Generation in progress...
โ
Taxonomy generated successfully!
๐ Taxonomy ID: 102
๐ท๏ธ Categories: 45
๐ฏ Confidence: 0.85
๐ File: storage/taxonomy/1/ai_taxonomy_uuid.json
Taxonomy Extension
Extend an existing taxonomy with additional hierarchy levels:
compileo taxonomy extend --taxonomy-data taxonomy.json --project-id 1 --additional-depth 2 --generator gemini --domain medical --batch-size 10
Parameters:
- --taxonomy-data: JSON file with taxonomy/category data (optional)
- --project-id: Project ID (required if using taxonomy-data)
- --additional-depth: Levels to add (default: 2)
- --generator: AI model (default: gemini)
- --domain: Content domain (default: general)
- --batch-size: Number of complete chunks to process (optional)
- --documents: Comma-separated list of document IDs to analyze (optional)
- --processing-mode: Processing mode (fast or complete) (default: fast)
- fast: Quick processing with sampling (default)
- complete: Comprehensive processing of all content
Alternative: Extend by taxonomy ID
Example Output:
๐ Extending taxonomy 102...
๐ Adding 1 additional level
โณ Extension in progress...
โ
Taxonomy extended successfully!
๐ New categories: 78 (was 45)
๐ฏ Updated confidence: 0.82
Taxonomy Viewing
Load detailed information about a specific taxonomy:
Parameters:
- taxonomy_id: Taxonomy ID to load (required)
- --format: Output format (json, text) (default: json)
- --output: Save to file instead of displaying (optional)
Example Output:
{
"taxonomy": {
"name": "Medical Conditions",
"description": "Classification system",
"children": [...]
},
"metadata": {
"type": "manual",
"confidence_score": 0.8,
"created_manually": true
},
"analytics": {
"depth_analysis": {
"total_categories": 45,
"max_depth": 3
}
}
}
Taxonomy Management
Update Taxonomy
Update taxonomy information:
Parameters:
- taxonomy_id: Taxonomy ID to update (required)
- --name: New taxonomy name (required)
Delete Taxonomy
Remove a taxonomy. This operation performs a complete cleanup, removing the taxonomy's file from the filesystem and deleting all associated extraction jobs and their results (both database entries and filesystem files).
Parameters:
- taxonomy_id: Taxonomy ID to delete (required)
- --confirm: Skip confirmation prompt (flag)
Bulk Delete Taxonomies
Delete multiple taxonomies at once. This operation performs a complete cleanup for all specified taxonomies, removing their files from the filesystem and deleting all associated extraction jobs and their results (both database entries and filesystem files).
Parameters:
- --taxonomy-ids: Comma-separated taxonomy IDs (required)
- --confirm: Skip confirmation prompt (flag)
Advanced Usage Examples
Complete Taxonomy Workflow
#!/bin/bash
# Complete taxonomy creation and management workflow
PROJECT_ID=1
DOC_IDS="101,102,103"
echo "๐ Starting taxonomy workflow..."
# 1. Generate AI taxonomy
echo "๐ค Generating AI taxonomy..."
compileo taxonomy generate \
--project-id $PROJECT_ID \
--name "Medical Knowledge Base" \
--documents $DOC_IDS \
--depth 3 \
--generator gemini \
--domain medical \
--batch-size 10 \
--category-limits 5,10,15
# 2. Extend with additional depth
echo "๐ Extending taxonomy..."
compileo taxonomy extend \
--taxonomy-id $(compileo taxonomy list --project-id $PROJECT_ID --format json | jq -r '.taxonomies[0].id') \
--additional-depth 1 \
--generator gemini
# 3. Backup taxonomy
echo "๐พ Creating backup..."
compileo taxonomy load $(compileo taxonomy list --project-id $PROJECT_ID --format json | jq -r '.taxonomies[0].id') \
--output medical_taxonomy_backup.json
echo "โ
Taxonomy workflow completed!"
Batch Taxonomy Generation
#!/bin/bash
# Generate taxonomies for multiple domains
PROJECT_ID=1
DOMAINS=("medical" "legal" "technical")
DOC_IDS="101,102,103,104,105"
for domain in "${DOMAINS[@]}"; do
echo "๐๏ธ Generating $domain taxonomy..."
compileo taxonomy generate \
--project-id $PROJECT_ID \
--name "${domain^} Classification" \
--documents $DOC_IDS \
--depth 3 \
--generator gemini \
--domain $domain \
--batch-size 10 \
--category-limits 5,8,12
done
echo "๐ Generated taxonomies for: ${DOMAINS[*]}"
Taxonomy Quality Assessment
#!/bin/bash
# Assess taxonomy quality and cleanup
PROJECT_ID=1
echo "๐ Assessing taxonomy quality..."
# List all taxonomies with confidence scores
compileo taxonomy list --project-id $PROJECT_ID --format json | jq -r '.taxonomies[] | "\(.id): \(.name) - Confidence: \(.confidence_score)"'
# Remove low-confidence taxonomies
LOW_CONFIDENCE=$(compileo taxonomy list --project-id $PROJECT_ID --format json | jq -r '.taxonomies[] | select(.confidence_score < 0.7) | .id' | tr '\n' ',' | sed 's/,$//')
if [ ! -z "$LOW_CONFIDENCE" ]; then
echo "๐๏ธ Removing low-confidence taxonomies: $LOW_CONFIDENCE"
compileo taxonomy bulk-delete --taxonomy-ids $LOW_CONFIDENCE --confirm
fi
echo "โ
Quality assessment completed!"
Integration with Scripts
Python Automation
import subprocess
import json
import time
def create_taxonomy_workflow(project_id, document_ids, name, domain="general"):
"""Complete taxonomy creation workflow."""
# Generate AI taxonomy
cmd = [
"compileo", "taxonomy", "generate",
"--project-id", str(project_id),
"--name", name,
"--documents", ",".join(map(str, document_ids)),
"--depth", "3",
"--generator", "gemini",
"--domain", domain,
"--sample-size", "100"
]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode != 0:
raise Exception(f"Taxonomy generation failed: {result.stderr}")
# Extract taxonomy ID from output
# (Implementation would parse the output to get taxonomy ID)
# Extend taxonomy
extend_cmd = [
"compileo", "taxonomy", "extend",
"--taxonomy-id", "102", # Would be extracted from generation output
"--additional-depth", "1",
"--generator", "gemini"
]
extend_result = subprocess.run(extend_cmd, capture_output=True, text=True)
if extend_result.returncode != 0:
raise Exception(f"Taxonomy extension failed: {extend_result.stderr}")
return True
# Usage
try:
success = create_taxonomy_workflow(
project_id=1,
document_ids=[101, 102, 103],
name="Medical Taxonomy",
domain="medical"
)
if success:
print("Taxonomy workflow completed successfully!")
except Exception as e:
print(f"Error: {e}")
Taxonomy Comparison Script
import subprocess
import json
def compare_taxonomies(project_id, taxonomy_ids):
"""Compare multiple taxonomies."""
taxonomies = {}
for tax_id in taxonomy_ids:
# Load taxonomy details
cmd = ["compileo", "taxonomy", "load", str(tax_id), "--format", "json"]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode == 0:
tax_data = json.loads(result.stdout)
taxonomies[tax_id] = {
'name': tax_data['taxonomy']['name'],
'categories': tax_data['analytics']['depth_analysis']['total_categories'],
'depth': tax_data['analytics']['depth_analysis']['max_depth'],
'confidence': tax_data['metadata']['confidence_score']
}
# Print comparison
print("๐ Taxonomy Comparison:")
print("-" * 60)
for tax_id, data in taxonomies.items():
print(f"ID {tax_id}: {data['name']}")
print(f" Categories: {data['categories']}, Depth: {data['depth']}, Confidence: {data['confidence']}")
print()
# Usage
compare_taxonomies(1, [101, 102, 103])
Best Practices
Taxonomy Generation
Document Selection: - Choose documents with diverse content for better taxonomy coverage - Ensure documents are parsed before taxonomy generation - Use representative samples (100-200 chunks) for optimal results
Parameter Optimization:
# Medical domain taxonomy
compileo taxonomy generate \
--project-id 1 \
--documents 101,102,103 \
--depth 3 \
--domain medical \
--category-limits 5,10,15 \
--specificity-level 2
# Technical documentation
compileo taxonomy generate \
--project-id 1 \
--documents 201,202 \
--depth 4 \
--domain technical \
--category-limits 3,8,12,20 \
--specificity-level 1
Taxonomy Extension
Incremental Growth:
# Add one level at a time for better control
compileo taxonomy extend --taxonomy-id 101 --additional-depth 1 --generator gemini
# Use domain-specific extension with selected documents
compileo taxonomy extend --taxonomy-id 101 --additional-depth 2 --domain medical --generator gemini --documents 101,102
Quality Management
Regular Assessment:
# Check taxonomy health
compileo taxonomy list --project-id 1 --format json | jq '.taxonomies[] | select(.confidence_score < 0.8)'
# Archive old taxonomies
compileo taxonomy bulk-delete --taxonomy-ids $(compileo taxonomy list --project-id 1 --format json | jq -r '.taxonomies[] | select(.created_at < "2024-01-01") | .id' | tr '\n' ',')
Backup and Recovery
Regular Backups:
# Backup all taxonomies
for tax_id in $(compileo taxonomy list --project-id 1 --format json | jq -r '.taxonomies[].id'); do
compileo taxonomy load $tax_id --output "backup_taxonomy_${tax_id}.json"
done
# Restore from backup
compileo taxonomy create --project-id 1 --file backup_taxonomy_101.json
Error Handling
Common Issues
Missing Documents:
# Error: No chunks found for selected documents
compileo taxonomy generate --project-id 1 --documents 999
# Solution: Check document IDs and ensure they are parsed
compileo documents list --project-id 1
API Key Issues:
# Error: API key not configured
compileo taxonomy generate --project-id 1 --documents 101 --generator gemini
# Solution: Ensure API keys are set in configuration
Invalid JSON:
# Error: Invalid taxonomy file format
compileo taxonomy create --project-id 1 --file invalid.json
# Solution: Validate JSON structure before use
Permission Issues:
# Error: Cannot write to taxonomy directory
compileo taxonomy generate --project-id 1 --documents 101
# Solution: Check file system permissions
Performance Optimization
Large Taxonomy Operations
# Use smaller sample sizes for faster generation
compileo taxonomy generate \
--project-id 1 \
--documents 101,102,103 \
--batch-size 10 \
--generator gemini
# Process taxonomies in batches
compileo taxonomy extend --taxonomy-id 101 --additional-depth 1 --sample-size 25
Memory Management
# Limit concurrent operations
# Use appropriate chunk sizes for your system
compileo taxonomy generate \
--project-id 1 \
--documents 101 \
--sample-size 50 \
--category-limits 3,5,8
This CLI provides comprehensive taxonomy management capabilities with support for manual creation, AI generation, extension, and full lifecycle management suitable for both interactive use and automated workflows.