
Taxonomy Module GUI Usage Guide

The Compileo Taxonomy GUI provides an intuitive web interface for taxonomy management, including creation, generation, extension, and content extraction. This guide covers all GUI features with step-by-step instructions.

Accessing Taxonomy Builder

Navigate to the "🏷️ Taxonomy Builder" page from the main menu. The interface is organized into three main tabs:

  1. 🏗️ Build Taxonomy - Create and edit taxonomies
  2. 📤 Extraction - Extract content using taxonomies
  3. 📋 Browse & Manage Taxonomies - View and manage existing taxonomies

Tab 1: Build Taxonomy

Unified Taxonomy Builder

The main taxonomy building interface combines manual editing with AI assistance:

Manual Taxonomy Creation

  1. Start New Taxonomy:
    • Click "Create New Taxonomy"
    • Enter taxonomy name and description
    • Select project association

  2. Add Root Categories:
    • Click "Add Category" to create top-level categories
    • Enter category name and description
    • Set confidence threshold (0.0-1.0)

  3. Build Hierarchy:
    • Click on any category to expand
    • Add subcategories with "Add Subcategory"
    • Drag and drop to reorganize structure
    • Delete categories with confirmation

  4. Import/Export:
    • Import taxonomy from JSON file
    • Export current taxonomy structure
    • Validate taxonomy structure before saving
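
The validation step before saving can be pictured as a recursive check over the category tree. The sketch below is illustrative only: the field names (`name`, `confidence_threshold`, `children`) and the function names are assumptions about what an exported taxonomy JSON might look like, not the documented Compileo schema.

```javascript
// Hypothetical category shape: { name, confidence_threshold?, children? }
function validateCategory(category, path, errors) {
  const hasName = typeof category.name === "string" && category.name.trim() !== "";
  const here = [...path, hasName ? category.name : "<unnamed>"];
  const label = here.join(" > ");

  if (!hasName) {
    errors.push(`${label}: category name is required`);
  }

  // Thresholds are optional, but when present must lie in the 0.0-1.0 range.
  const threshold = category.confidence_threshold;
  if (threshold !== undefined && !(threshold >= 0 && threshold <= 1)) {
    errors.push(`${label}: confidence threshold must be between 0.0 and 1.0`);
  }

  // Sibling names must be unique; then descend into each subcategory.
  const seen = new Set();
  for (const child of category.children || []) {
    if (seen.has(child.name)) {
      errors.push(`${label}: duplicate subcategory "${child.name}"`);
    }
    seen.add(child.name);
    validateCategory(child, here, errors);
  }
  return errors;
}

function validateTaxonomy(taxonomy) {
  return validateCategory(taxonomy, [], []);
}
```

An empty array from `validateTaxonomy` means none of these particular checks failed; the GUI's own validation may apply additional rules.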

AI-Assisted Generation

  1. AI Generation Setup:
    • Select "Generate with AI" mode
    • Choose AI model (Gemini, Grok, Ollama)
    • Select source documents (must be parsed)

Ollama Generator Configuration: When using Ollama for taxonomy generation, you can tune model behavior in Settings → AI Model Configuration. Available parameters include temperature, repeat penalty, top-p, top-k, and num_predict.
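
For orientation, these settings correspond to Ollama's generation options `temperature`, `repeat_penalty`, `top_p`, `top_k`, and `num_predict`. The values below are illustrative rather than Compileo defaults, and `checkOllamaOptions` is a hypothetical helper showing the kind of range check worth doing before saving.

```javascript
// Illustrative values only; these are not Compileo defaults.
const ollamaOptions = {
  temperature: 0.2,    // lower values make output more deterministic
  repeat_penalty: 1.1, // values above 1 discourage repeated tokens
  top_p: 0.9,          // nucleus sampling cutoff
  top_k: 40,           // sample only from the k most likely tokens
  num_predict: 2048,   // maximum number of tokens to generate
};

// Hypothetical sanity check; returns a list of human-readable problems.
function checkOllamaOptions(options) {
  const problems = [];
  if (!(options.temperature >= 0)) problems.push("temperature must be >= 0");
  if (!(options.repeat_penalty > 0)) problems.push("repeat_penalty must be > 0");
  if (!(options.top_p > 0 && options.top_p <= 1)) problems.push("top_p must be in (0, 1]");
  if (!Number.isInteger(options.top_k) || options.top_k < 1) {
    problems.push("top_k must be a positive integer");
  }
  if (!Number.isInteger(options.num_predict)) problems.push("num_predict must be an integer");
  return problems;
}
```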

  2. Generation Parameters:
    • Domain: Content domain (medical, legal, technical, general)
    • Processing Mode:
        • Fast (Sampled): Quickly generates taxonomy from a sample of up to 10 chunks.
        • Complete (All Content): Iteratively processes every chunk in the document for comprehensive coverage.
    • Depth: Hierarchy levels (1-5)
    • Chunk Batch Size: Number of complete chunks to process per batch (1-50)
    • Category Limits: Max categories per level
    • Specificity Level: Detail level (1-5)

  3. Generation Process:
    • Click "Generate Taxonomy"
    • Monitor progress in real-time
    • Review generated structure
    • Edit manually if needed
    • Save final taxonomy
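
The relationship between processing mode and chunk batch size can be sketched as follows. This is a simplified model, not Compileo's implementation: `planGenerationBatches` is a hypothetical name, and the sketch assumes Fast mode simply takes the first 10 chunks, whereas the real sampler may select chunks differently.

```javascript
const FAST_MODE_SAMPLE_LIMIT = 10; // Fast (Sampled) mode uses up to 10 chunks

function planGenerationBatches(chunks, { mode = "fast", batchSize = 10 } = {}) {
  if (!Number.isInteger(batchSize) || batchSize < 1 || batchSize > 50) {
    throw new RangeError("Chunk batch size must be an integer from 1 to 50");
  }
  // Assumption: fast mode takes the leading chunks; complete mode takes them all.
  const selected = mode === "fast" ? chunks.slice(0, FAST_MODE_SAMPLE_LIMIT) : chunks;

  // Split the selected chunks into consecutive batches of at most batchSize.
  const batches = [];
  for (let i = 0; i < selected.length; i += batchSize) {
    batches.push(selected.slice(i, i + batchSize));
  }
  return batches;
}
```

Under this model, a 23-chunk document in Complete mode with a batch size of 10 is processed in three batches, while Fast mode never looks past the first 10 chunks regardless of document length.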

Taxonomy Extension

  1. Extend Existing Taxonomy:
    • Select taxonomy to extend
    • Choose extension method:
        • Add Levels: Add depth to entire taxonomy
        • Expand Category: Extend specific category
        • Refine Existing: Improve existing categories

  2. Extension Parameters:
    • Additional depth levels
    • AI model selection
    • Domain specification
    • Sample size adjustment

  3. Review and Apply:
    • Preview extension results
    • Accept or modify changes
    • Save extended taxonomy

Tab 2: Extraction

Content Classification Setup

  1. Select Taxonomy:
    • Choose taxonomy for extraction
    • View taxonomy structure preview
    • Select specific categories or use entire taxonomy

  2. Document Selection:
    • Choose project containing documents
    • Select individual documents or all documents
    • Filter by document status (parsed, chunked)

  3. Extraction Parameters:
    • Confidence Threshold: Minimum classification confidence (0.0-1.0)
    • Max Chunks: Limit processing volume
    • Validation Stage: Enable two-stage classification
    • Primary Classifier: Main AI model
    • Validation Classifier: Secondary model for validation
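
To make the interaction between these parameters concrete, here is a minimal sketch of a two-stage classification loop. The `primary` and `validator` arguments are stand-in functions rather than Compileo's model API, and the agreement rule (keep a chunk only if both models pick the same category) is an assumption about how a validation stage might behave.

```javascript
// `primary` and `validator` are stand-in async functions:
//   chunk -> { category, confidence }
async function classifyChunks(chunks, primary, validator, options = {}) {
  const { confidenceThreshold = 0.5, maxChunks = Infinity, validate = false } = options;
  const accepted = [];

  // Max Chunks caps how much of the input is processed at all.
  for (const chunk of chunks.slice(0, maxChunks)) {
    const first = await primary(chunk);
    if (first.confidence < confidenceThreshold) continue; // below minimum confidence

    if (validate) {
      const second = await validator(chunk);
      // Assumed rule: discard the chunk when the two models disagree.
      if (second.category !== first.category) continue;
    }
    accepted.push({ chunk, category: first.category, confidence: first.confidence });
  }
  return accepted;
}
```

Enabling the validation stage roughly doubles the number of model calls for chunks that pass the threshold, which is worth weighing against API rate limits.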

Extraction Process

  1. Start Extraction:
    • Click "Start Extraction Job"
    • Monitor progress with real-time updates
    • View processing statistics

  2. Results Review:
    • Browse extracted content by category
    • Filter results by confidence score
    • Export results to various formats
    • Generate summary reports

  3. Quality Assessment:
    • View classification accuracy metrics
    • Identify low-confidence classifications
    • Re-run extraction with adjusted parameters
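
If you export results and want to repeat the review outside the GUI, the grouping and low-confidence checks might look like the sketch below. The result shape (`chunkId`, `category`, `confidence`) and the `summarizeResults` name are assumptions for illustration; adapt them to the actual export format.

```javascript
// Hypothetical result shape: { chunkId, category, confidence }
function summarizeResults(results, reviewBelow = 0.6) {
  // Group results by category for browsing.
  const byCategory = new Map();
  for (const result of results) {
    if (!byCategory.has(result.category)) byCategory.set(result.category, []);
    byCategory.get(result.category).push(result);
  }

  // Collect classifications that deserve a manual look, least confident first.
  const lowConfidence = results
    .filter((result) => result.confidence < reviewBelow)
    .sort((a, b) => a.confidence - b.confidence);

  return { byCategory, lowConfidence };
}
```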

Advanced Features

Batch Extraction:
  • Process multiple documents simultaneously
  • Queue extraction jobs for background processing
  • Monitor multiple jobs in job dashboard

Incremental Extraction:
  • Extract from new documents only
  • Update existing extractions
  • Merge results across multiple runs
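
Conceptually, incremental extraction comes down to two operations: working out which documents still need processing, and merging new results into earlier ones. The sketch below uses hypothetical names and a hypothetical result shape; it is not how Compileo stores results internally.

```javascript
// Hypothetical result shape: { documentId, chunkId, category, confidence }

// Documents that have no results yet are treated as new.
function documentsToProcess(allDocumentIds, previousResults) {
  const done = new Set(previousResults.map((r) => r.documentId));
  return allDocumentIds.filter((id) => !done.has(id));
}

// Merge two runs; a later result replaces an earlier one for the same
// document, chunk, and category.
function mergeResults(previousResults, newResults) {
  const merged = new Map();
  for (const r of [...previousResults, ...newResults]) {
    merged.set(`${r.documentId}:${r.chunkId}:${r.category}`, r);
  }
  return [...merged.values()];
}
```

One limitation of this simplified model: a document that was processed but produced no results would be treated as new on the next run.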


Tab 3: Browse & Manage Taxonomies

Taxonomy Browser

  1. Taxonomy List:
    • View all taxonomies in selected project
    • Sort by name, creation date, confidence score
    • Filter by taxonomy type (manual, AI-generated)

  2. Taxonomy Details:
    • Click taxonomy name to view full structure
    • Expand/collapse hierarchy levels
    • View category statistics and confidence scores
    • Export taxonomy structure

  3. Analytics Dashboard:
    • Depth Analysis: Hierarchy depth and distribution
    • Category Count: Total categories and distribution
    • Confidence Metrics: Average and distribution of confidence scores
    • Usage Statistics: Extraction jobs and results
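
The depth, count, and confidence figures on the dashboard can be reproduced from an exported taxonomy with a single tree walk. The sketch below assumes a hypothetical category shape (`name`, optional `confidence`, optional `children`) and counts the root's children as level 1; the dashboard's own definitions may differ.

```javascript
// Hypothetical category shape: { name, confidence?, children? }
function taxonomyStats(root) {
  let maxDepth = 0;
  let categoryCount = 0;
  let confidenceSum = 0;
  let confidenceCount = 0;
  const perLevel = []; // perLevel[0] is the number of top-level categories

  function visit(node, depth) {
    categoryCount += 1;
    maxDepth = Math.max(maxDepth, depth);
    perLevel[depth - 1] = (perLevel[depth - 1] || 0) + 1;
    if (typeof node.confidence === "number") {
      confidenceSum += node.confidence;
      confidenceCount += 1;
    }
    for (const child of node.children || []) visit(child, depth + 1);
  }

  for (const category of root.children || []) visit(category, 1);

  return {
    maxDepth,
    categoryCount,
    perLevel,
    // null when no category carries a confidence score
    averageConfidence: confidenceCount ? confidenceSum / confidenceCount : null,
  };
}
```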

Taxonomy Management

Edit Taxonomy

  1. Modify Structure:
    • Add, remove, or rename categories
    • Reorganize hierarchy with drag-and-drop
    • Update category descriptions and confidence thresholds

  2. Bulk Operations:
    • Import categories from CSV/JSON
    • Export selected branches
    • Clone taxonomy structure

Delete Taxonomy

  1. Safe Deletion:

    • Confirmation prompts prevent accidents
    • Complete Cleanup: Deleting a taxonomy automatically removes its file from the filesystem and cleans up all associated extraction jobs and their results (both database entries and filesystem files).
    • Check for dependent extraction jobs
    • Option to archive instead of delete
  2. Bulk Management:
    • Select multiple taxonomies for deletion
    • Filter by criteria (old, low-confidence, unused)
    • Batch operation confirmation

Taxonomy Comparison

  1. Side-by-Side View:
    • Compare two taxonomies visually
    • Highlight differences in structure
    • Merge compatible branches

  2. Metrics Comparison:
    • Compare depth, category count, confidence scores
    • View overlap analysis
    • Generate comparison reports
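
One way to think about structural differences and overlap is to flatten each taxonomy into category paths and compare the two sets. The sketch below takes that approach with a Jaccard-style overlap score; the category shape and function names are hypothetical, and the GUI may compute overlap differently.

```javascript
// Flatten a taxonomy into "Parent > Child" path strings.
// Hypothetical category shape: { name, children? }
function categoryPaths(node, prefix = []) {
  const paths = [];
  for (const child of node.children || []) {
    const path = [...prefix, child.name];
    paths.push(path.join(" > "));
    paths.push(...categoryPaths(child, path));
  }
  return paths;
}

function compareTaxonomies(a, b) {
  const pathsA = new Set(categoryPaths(a));
  const pathsB = new Set(categoryPaths(b));
  const shared = [...pathsA].filter((p) => pathsB.has(p));
  const onlyInA = [...pathsA].filter((p) => !pathsB.has(p));
  const onlyInB = [...pathsB].filter((p) => !pathsA.has(p));

  // Jaccard overlap: shared paths divided by all distinct paths.
  const unionSize = shared.length + onlyInA.length + onlyInB.length;
  const overlap = unionSize === 0 ? 1 : shared.length / unionSize;
  return { shared, onlyInA, onlyInB, overlap };
}
```

Because paths include their ancestors, a category that was moved to a different parent shows up as one removal and one addition rather than as a match.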

Advanced GUI Features

State Management

Session Persistence:
  • Current selections remembered across page refreshes
  • Unsaved changes protected with confirmation prompts
  • Progress tracking for long-running operations

Navigation States:
  • Seamless transitions between views
  • Breadcrumb navigation for deep taxonomy editing
  • Back/forward navigation support

Real-time Updates

Live Progress:
  • Real-time progress bars for generation and extraction
  • Live statistics updates during processing
  • Instant feedback on parameter changes

Collaborative Features:
  • Lock mechanism for concurrent editing
  • Change notifications for shared taxonomies
  • Version history tracking

Keyboard Shortcuts

  • Ctrl+S: Save current taxonomy
  • Ctrl+Z: Undo last change
  • Ctrl+Y: Redo last change
  • Delete: Remove selected category
  • Enter: Add subcategory to selected item
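
For anyone embedding or scripting around the editor, the shortcut table above can be expressed as a small lookup over keyboard events. This is a sketch only: the action names are hypothetical, and it is not the GUI's actual event handling.

```javascript
// Maps a KeyboardEvent-like object to an editor action name.
// Action names are hypothetical; returns null for unmapped keys.
function resolveShortcut({ key, ctrlKey = false }) {
  const lower = key.toLowerCase();
  if (ctrlKey && lower === "s") return "save";
  if (ctrlKey && lower === "z") return "undo";
  if (ctrlKey && lower === "y") return "redo";
  if (!ctrlKey && key === "Delete") return "deleteCategory";
  if (!ctrlKey && key === "Enter") return "addSubcategory";
  return null;
}
```

In a browser, a `keydown` listener could pass its event straight to `resolveShortcut` and call `event.preventDefault()` whenever an action is returned, so that Ctrl+S does not trigger the browser's own save dialog.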

Best Practices

Taxonomy Design

Structure Guidelines:
  • Start with 3-4 levels maximum for usability
  • Use clear, descriptive category names
  • Maintain consistent naming conventions
  • Set appropriate confidence thresholds

Quality Assurance:
  • Regularly review and update taxonomy structure
  • Test extraction accuracy on sample documents
  • Maintain version history for important taxonomies
  • Document taxonomy purpose and scope

Performance Optimization

Large Taxonomies:
  • Use category limits to control growth
  • Implement pagination for deep hierarchies
  • Consider splitting very large taxonomies

Processing Efficiency:
  • Select appropriate sample sizes for generation
  • Use incremental extraction for updates
  • Monitor and optimize confidence thresholds

Maintenance Workflows

Regular Maintenance:

# Monthly taxonomy review checklist
- [ ] Review extraction accuracy metrics
- [ ] Update category descriptions
- [ ] Remove unused categories
- [ ] Test on new document types
- [ ] Archive outdated taxonomies

Version Control:
  • Create backups before major changes
  • Use descriptive names for taxonomy versions
  • Document changes and rationale
  • Maintain changelog for important taxonomies


Troubleshooting

Common Issues

Generation Failures:
  • Check document parsing status
  • Verify API key configuration
  • Ensure sufficient document content
  • Try different AI models

Extraction Problems:
  • Validate taxonomy structure
  • Check confidence threshold settings
  • Review document chunking quality
  • Monitor API rate limits

Performance Issues:
  • Reduce sample sizes for large document sets
  • Use category limits to control taxonomy size
  • Implement pagination for large result sets

UI Responsiveness:
  • Clear browser cache and cookies
  • Check internet connection stability
  • Close unused browser tabs
  • Update browser to latest version


Integration Examples

Workflow Automation

Document Processing Pipeline:
  1. Upload and parse documents
  2. Generate or select taxonomy
  3. Configure extraction parameters
  4. Run batch extraction jobs
  5. Review and export results

Quality Assurance Process:
  1. Create test taxonomy with known categories
  2. Run extraction on labeled test documents
  3. Compare results with expected classifications
  4. Adjust parameters and re-run tests
  5. Validate on production documents
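
The comparison step of the quality assurance process can be sketched as a simple accuracy calculation over labeled chunks. The input shapes (objects mapping chunk IDs to category names) and the `evaluateExtraction` name are assumptions for illustration, not part of the Compileo API.

```javascript
// `expected` and `predicted` map chunk IDs to category names (hypothetical shapes).
function evaluateExtraction(expected, predicted) {
  let correct = 0;
  const mismatches = [];

  for (const [chunkId, category] of Object.entries(expected)) {
    if (predicted[chunkId] === category) {
      correct += 1;
    } else {
      // A missing prediction is recorded as null.
      mismatches.push({ chunkId, expected: category, predicted: predicted[chunkId] ?? null });
    }
  }

  const total = Object.keys(expected).length;
  return { accuracy: total ? correct / total : 0, mismatches };
}
```

The `mismatches` list is usually more useful than the accuracy figure itself, since it shows which categories are being confused and whether a threshold is filtering out correct classifications.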

API Integration

Programmatic Taxonomy Management:

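// Run inside an async function or an ES module (top-level await).
// taxonomyStructure is the taxonomy object to store, for example one exported from the GUI.

// Create taxonomy via API
const createResponse = await fetch('/api/v1/taxonomy/', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    name: 'Medical Conditions',
    project_id: 1,
    taxonomy: taxonomyStructure
  })
});
if (!createResponse.ok) {
  throw new Error(`Taxonomy creation failed: HTTP ${createResponse.status}`);
}
const taxonomy = await createResponse.json(); // assumes a JSON response body

// Generate taxonomy
const generateResponse = await fetch('/api/v1/taxonomy/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    project_id: 1,
    documents: [101, 102, 103],
    generator: 'gemini',
    domain: 'medical'
  })
});
if (!generateResponse.ok) {
  throw new Error(`Taxonomy generation failed: HTTP ${generateResponse.status}`);
}
const generation = await generateResponse.json(); // assumes a JSON response body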

Real-time Updates:

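// WebSocket connection for live updates
// (adjust the host and port to match your deployment)
const ws = new WebSocket('ws://localhost:8000/ws/taxonomy');

ws.onmessage = (event) => {
  const update = JSON.parse(event.data);
  if (update.type === 'extraction_progress') {
    // updateProgressBar is your own UI callback
    updateProgressBar(update.progress);
  }
};

ws.onerror = (event) => {
  console.error('Taxonomy WebSocket error', event);
};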

This GUI provides a comprehensive taxonomy management system with intuitive interfaces for creation, editing, generation, and extraction, suitable for both novice users and advanced taxonomy designers.