Extraction CLI Commands
This guide covers the command-line interface for creating and managing extraction jobs in Compileo.
Overview
The extraction CLI provides complete parity with the extraction API, allowing users to create extraction jobs through command-line interface with identical functionality and parameters.
Commands
extraction create
Creates a new extraction job using a taxonomy to extract content from processed documents.
Syntax
compileo extraction create --project-id <project_id> --taxonomy-id <taxonomy_id> --selected-categories "category1,category2" --effective-classifier <model> [options]
Required Parameters
--project-id(integer): The ID of the project containing the data--taxonomy-id(integer): The ID of the taxonomy to use for extraction--selected-categories(string): Comma-separated list of category names to extract--initial-classifier(string): The AI model for the initial classification stage (e.g.,grok,gemini,ollama,openai).--extraction-type(string, default: "ner"): Type of extraction:'ner'for named entities,'whole_text'for complete text portions.
Optional Parameters
--extraction-mode(string, default: "contextual"): Extraction mode:'contextual'or'document-wide'.--enable-validation-stage: Enable validation stage for improved accuracy--validation-classifier(string): Separate classifier for validation phase--confidence-threshold(float, default: 0.5): Minimum confidence score (0.0-1.0)--max-chunks(integer): Maximum number of chunks to process
Available AI Models
- Gemini:
gemini/gemini-1.5-flash,gemini/gemini-1.5-pro - Ollama:
ollama/llama3.1,ollama/llama3.2,ollama/mistral - Grok:
grok/grok-1,grok/grok-beta - OpenAI:
openai/gpt-4o,openai/gpt-4-turbo,openai/gpt-3.5-turbo
Examples
Basic extraction with Gemini:
compileo extraction create \
--project-id 1 \
--taxonomy-id 5 \
--selected-categories "diagnosis,treatment" \
--effective-classifier "gemini/gemini-1.5-flash"
Advanced extraction with validation:
compileo extraction create \
--project-id 1 \
--taxonomy-id 5 \
--selected-categories "symptoms,causes,effects" \
--effective-classifier "ollama/llama3.1" \
--enable-validation-stage \
--validation-classifier "gemini/gemini-1.5-pro" \
--confidence-threshold 0.7 \
--max-chunks 1000
Limited processing for testing:
compileo extraction create \
--project-id 1 \
--taxonomy-id 5 \
--selected-categories "all" \
--effective-classifier "gemini/gemini-1.5-flash" \
--max-chunks 100
Document-wide extraction for maximum coverage:
compileo extraction create \
--project-id 1 \
--taxonomy-id 5 \
--selected-categories "diagnosis,treatment" \
--effective-classifier "ollama/llama3.1" \
--extraction-mode document-wide \
--confidence-threshold 0.3
Output
Success Response:
๐ Creating extraction job for project 1
๐ Taxonomy ID: 5
๐ท๏ธ Categories: diagnosis, treatment
๐ค Classifier: gemini/gemini-1.5-flash
โ
Extraction job created successfully!
๐ Job ID: 12345
๐ Monitor progress: compileo jobs poll 12345
๐ Check status: compileo jobs status 12345
Job Monitoring
After creating an extraction job, you can monitor its progress using the job management commands:
# Check job status
compileo jobs status 12345
# Poll for completion (long-running)
compileo jobs poll 12345
# Stream real-time updates
compileo jobs stream 12345
Error Handling
The CLI provides clear error messages for common issues:
- Invalid project ID: "Project with ID X not found"
- Invalid taxonomy ID: "Taxonomy with ID Y not found"
- Invalid categories: "Category 'invalid' not found in taxonomy"
- Model unavailable: "AI model 'invalid/model' is not available"
- Connection errors: "Failed to connect to API server"
Integration with Other Commands
Complete Workflow Example
# 1. Create project
compileo projects create --name "Medical Study" --description "Clinical trial data"
# 2. Upload documents
compileo documents upload --project-id 1 --files "study1.pdf,study2.pdf"
# 3. Parse documents
compileo documents parse --project-id 1 --document-ids "1,2"
# 4. Chunk documents
compileo documents chunk --project-id 1 --document-ids "1,2" --chunker "gemini"
# 5. Generate taxonomy
compileo taxonomy generate --project-id 1 --name "Medical Conditions" --documents "1,2"
# 6. Create extraction job
compileo extraction create \
--project-id 1 \
--taxonomy-id 1 \
--selected-categories "diagnosis,treatment,outcome" \
--effective-classifier "gemini/gemini-1.5-flash"
# 7. Monitor and retrieve results
compileo jobs status <job_id>
compileo extraction results <job_id>
Configuration
The CLI uses the same configuration as the API and GUI:
- API Keys: Retrieved from GUI settings database
- Model Selection: Based on available AI providers
- Validation: Same parameter validation as API endpoints
Troubleshooting
Common Issues
"Model not available" - Check that the AI model is properly configured in settings - Verify API keys are set for the selected provider - Try a different model from the same provider
"Taxonomy not found"
- Verify the taxonomy ID exists for the project
- Check that taxonomy generation completed successfully
- Use compileo taxonomy list --project-id X to see available taxonomies
"No chunks found" - Ensure documents have been parsed and chunked - Check that chunking completed successfully - Verify project contains processed documents
Getting Help
# Show command help
compileo extraction create --help
# List all extraction commands
compileo extraction --help
# General CLI help
compileo --help
API Parity
The CLI provides complete feature parity with the /api/v1/extraction/ endpoint:
- Same parameters: All API parameters are supported
- Same validation: Identical input validation rules
- Same processing: Uses the same backend extraction pipeline
- Same results: Produces identical extraction results
This ensures consistent behavior whether using the API, GUI, or CLI interfaces.