Dataset Quality Analysis in Compileo GUI
Overview
The Compileo GUI provides dataset quality analysis through its web interface. The analysis evaluates datasets across multiple dimensions, including diversity, bias detection, difficulty assessment, and consistency validation.
Accessing Quality Analysis
- Navigate to the Application: Open Compileo in your web browser
- Select Quality Analysis: Click "Quality Metrics" in the sidebar or main navigation
- Choose Analysis Type: Select from Analysis, Metrics Dashboard, or History tabs
Interface Components
Analysis Tab
The main analysis interface for running quality assessments on datasets.
Dataset Selection
- Available Datasets: Dropdown showing all JSONL dataset files in the test_outputs directory
- Dataset Information: Displays dataset name and number of entries
- File Format: Supports JSONL format with question/answer structure
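Field names vary with how the dataset was produced; a minimal question/answer entry might look like this (the fields shown are illustrative, not a required schema):

```jsonl
{"question": "What is the capital of France?", "answer": "Paris"}
{"question": "Name one greenhouse gas.", "answer": "Carbon dioxide", "metadata": {"topic": "climate", "difficulty": "easy"}}
```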
Analysis Configuration
Basic Settings:
- Quality Metrics Selection:
  - Diversity Analysis: Evaluates lexical and semantic variety
  - Bias Detection: Identifies potential demographic and content biases
  - Difficulty Assessment: Measures question complexity and readability
- Overall Quality Threshold: Minimum passing score (0.0-1.0, default 0.7)
- Output Format: JSON, HTML, or PDF report formats
Advanced Settings:
- Diversity Threshold: Minimum diversity score required
- Bias Threshold: Maximum acceptable bias score
- Target Difficulty: Desired difficulty level for the dataset
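Conceptually, a run is defined by the handful of options above. The dictionary below is only an illustration of those settings; the key names are assumptions, not a Compileo API or configuration file format:

```python
# Illustrative representation of an analysis configuration.
# Key names mirror the GUI settings above; they are not a Compileo API.
analysis_config = {
    "metrics": ["diversity", "bias", "difficulty"],
    "overall_quality_threshold": 0.7,   # minimum passing score, 0.0-1.0
    "output_format": "json",            # "json", "html", or "pdf"
    "advanced": {
        "diversity_threshold": 0.6,     # minimum diversity score required
        "bias_threshold": 0.3,          # maximum acceptable bias score
        "target_difficulty": "medium",  # desired difficulty level
    },
}
```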
Analysis Execution
- Configure Settings: Select metrics and adjust thresholds
- Start Analysis: Click "Start Quality Analysis"
- Monitor Progress: View real-time status updates
- Review Results: Access completed analysis in dashboard
Metrics Dashboard Tab
Interactive visualization and detailed breakdown of quality analysis results.
Overall Summary Metrics
- Overall Quality Score: Weighted average across all enabled metrics (see the sketch after this list)
- Dataset Size: Total number of entries analyzed
- Passed Metrics: Count of metrics that meet their thresholds
- Failed Metrics: Count of metrics that fall below their thresholds
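As a rough illustration of how such a summary can be derived, the sketch below computes a weighted overall score and pass/fail counts from per-metric results. The weighting scheme and field names are assumptions, not Compileo's actual implementation:

```python
# Weighted overall score and pass/fail counts from per-metric results.
# Equal weights by default; weights and names are illustrative assumptions.
def summarize(metric_scores, thresholds, weights=None):
    """metric_scores and thresholds are dicts keyed by metric name, values in [0, 1]."""
    weights = weights or {name: 1.0 for name in metric_scores}
    total_weight = sum(weights[name] for name in metric_scores)
    overall = sum(metric_scores[name] * weights[name] for name in metric_scores) / total_weight
    passed = [name for name, score in metric_scores.items() if score >= thresholds[name]]
    failed = [name for name in metric_scores if name not in passed]
    return {"overall_score": round(overall, 3), "passed": passed, "failed": failed}

print(summarize(
    {"diversity": 0.82, "bias": 0.74, "difficulty": 0.61},
    {"diversity": 0.7, "bias": 0.7, "difficulty": 0.7},
))
```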
Individual Metrics Visualization
- Metrics Bar Chart: Color-coded bars showing the score for each metric
  - Green bars: Passed metrics
  - Red bars: Failed metrics
  - Orange threshold line: Default quality threshold
- Detailed Results Table: Tabular view with scores, thresholds, and pass/fail status
Metric-Specific Breakdowns
Diversity Analysis:
- Lexical Diversity: Vocabulary richness and variety
- Semantic Diversity: Concept and meaning coverage
- Topic Balance: Subject matter distribution
- Radar Chart: Visual representation of the diversity profile

Bias Detection:
- Overall Bias Score: Composite bias measurement
- Bias Indicators: Breakdown by demographic categories
- Content Balance: Topic and perspective distribution

Difficulty Assessment:
- Average Difficulty: Mean complexity score
- Complexity Score: Cognitive load assessment
- Readability Score: Text comprehension metrics
- Distribution Chart: Pie chart showing the distribution of difficulty levels
History Tab
Review past quality analysis runs and track performance over time.
Analysis History Table
- Job ID: Unique identifier for each analysis run
- Status: Completion status (completed, failed, running)
- Summary Scores: Overall quality metrics
- Completion Timestamp: When analysis finished
Summary Statistics
- Total Analyses: Number of quality assessments run
- Completed Analyses: Successfully finished evaluations
- Failed Analyses: Analyses that encountered errors
Quality Metrics Explained
Diversity Metrics
Evaluates content variety and representation:
- Lexical Diversity: Measures vocabulary richness and word variety
- Semantic Diversity: Assesses concept coverage and meaning variety
- Topic Coverage: Analyzes subject matter distribution and balance
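Lexical diversity is commonly approximated with measures such as the type-token ratio or the fraction of distinct n-grams. The sketch below shows the general idea; it is not Compileo's internal implementation:

```python
# Type-token ratio and distinct-bigram ratio as simple lexical-diversity proxies.
import re

def lexical_diversity(texts):
    tokens, bigrams = [], []
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        tokens.extend(words)
        bigrams.extend(zip(words, words[1:]))
    if not tokens:
        return {"type_token_ratio": 0.0, "distinct_bigrams": 0.0}
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "distinct_bigrams": len(set(bigrams)) / len(bigrams) if bigrams else 0.0,
    }

print(lexical_diversity([
    "What is photosynthesis?",
    "Explain how photosynthesis converts light into chemical energy.",
]))
```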
Bias Detection
Identifies potential biases in dataset content:
- Demographic Bias: Gender, ethnicity, age, and cultural representation
- Content Bias: Topic selection and perspective balance
- Language Bias: Formal/informal tone and register distribution
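One very naive way to surface demographic imbalance is to compare how often terms from each category appear. Real bias detection relies on much richer signals, so treat the following purely as an illustration; the term lists are examples, not the categories Compileo checks:

```python
# Naive illustration: relative mention counts of gendered terms in a dataset.
import re
from collections import Counter

GENDER_TERMS = {
    "female": {"she", "her", "hers", "woman", "women"},
    "male": {"he", "him", "his", "man", "men"},
}

def gender_mention_balance(texts):
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for group, terms in GENDER_TERMS.items():
            counts[group] += sum(1 for t in tokens if t in terms)
    total = sum(counts.values()) or 1
    return {group: counts[group] / total for group in GENDER_TERMS}

print(gender_mention_balance([
    "She explained the theorem to her students.",
    "He repeated the experiment with his colleagues.",
]))
```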
Difficulty Assessment
Measures question and content complexity:
- Reading Level: Text complexity using readability formulas
- Cognitive Load: Reasoning and comprehension requirements
- Domain Expertise: Required knowledge level for correct answers
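Readability formulas such as Flesch Reading Ease combine average sentence length with average syllables per word. The sketch below uses a crude syllable heuristic and is not necessarily the formula Compileo applies:

```python
# Flesch Reading Ease with a rough vowel-group syllable heuristic.
import re

def count_syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()] or [text]
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

print(round(flesch_reading_ease("The cat sat on the mat. It was warm."), 1))
```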
Consistency Validation
Ensures logical and factual coherence:
- Factual Consistency: Accuracy of information and claims
- Logical Consistency: Reasoning validity and coherence
- Format Consistency: Structural uniformity across entries
Analysis Workflow
Preparing for Analysis
- Generate Dataset: Create or upload dataset in JSONL format
- Review Content: Ensure proper question/answer structure
- Check Size: Verify there are enough entries for reliable analysis (at least 10-50 entries recommended); see the validation sketch below
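A quick pre-flight check along these lines can catch format problems before an analysis is started. The expected keys and minimum entry count below are assumptions based on the format described above, not Compileo requirements:

```python
# Pre-flight validation sketch for a question/answer JSONL dataset.
import json
import sys

def validate_jsonl(path, required_keys=("question", "answer"), min_entries=10):
    problems, entries = [], 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc})")
                continue
            missing = [k for k in required_keys if k not in record]
            if missing:
                problems.append(f"line {lineno}: missing keys {missing}")
            entries += 1
    if entries < min_entries:
        problems.append(f"only {entries} entries; at least {min_entries} recommended")
    return problems

if __name__ == "__main__":
    for issue in validate_jsonl(sys.argv[1]):
        print(issue)
```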
Running Quality Analysis
- Select Dataset: Choose from available JSONL files
- Configure Metrics: Enable desired quality checks
- Set Thresholds: Adjust passing criteria as needed
- Execute Analysis: Start background processing
- Monitor Progress: Track real-time status updates
Interpreting Results
- Review Overall Score: Check if dataset meets quality requirements
- Analyze Failed Metrics: Identify specific quality issues
- Examine Details: Review metric-specific breakdowns
- Address Issues: Modify dataset based on recommendations
Error Handling
Common Issues
- Dataset Not Found: Ensure JSONL files exist in test_outputs directory
- Invalid Format: Verify question/answer structure in dataset
- Analysis Timeout: Large datasets can take a long time to process and may time out
- Metric Failures: Individual metrics may fail while others succeed
Recovery Actions
- Restart Analysis: Failed analyses can be restarted
- Modify Configuration: Adjust thresholds or disable problematic metrics
- Check Dataset: Validate data format and content
- Contact Support: For persistent technical issues
Best Practices
Dataset Preparation
- Use consistent question/answer formats
- Include diverse, representative content
- Add metadata for enhanced analysis
- Validate data quality before analysis
Analysis Configuration
- Enable all relevant metrics for comprehensive evaluation
- Set appropriate thresholds based on use case
- Consider domain-specific quality requirements
- Use advanced settings for fine-tuned analysis
Result Utilization
- Address failed metrics before dataset deployment
- Use detailed breakdowns for targeted improvements
- Track quality trends across dataset versions
- Compare quality scores between datasets
Performance Optimization
- For large datasets, run analyses during off-peak hours
- Use appropriate metric subsets for quick assessments
- Cache results for repeated analysis of same datasets
- Monitor system resources during analysis
Integration with Other Features
Dataset Generation Workflow
- Generate Dataset: Create training data using dataset generation tools
- Run Quality Analysis: Evaluate generated content quality
- Review Results: Identify areas for improvement
- Iterate Generation: Refine prompts and parameters based on analysis
- Validate Improvements: Re-run analysis to confirm quality gains
Benchmarking Integration
- Complete Quality Analysis: Ensure dataset meets quality standards
- Run Benchmarking: Evaluate model performance on quality-assessed data
- Correlate Results: Compare quality metrics with benchmark performance
- Optimize Dataset: Use insights to improve both quality and performance
Troubleshooting
Analysis Not Starting
- Verify dataset file exists and is readable
- Check JSONL format and content structure
- Ensure sufficient system resources available
Unexpected Results
- Review dataset content for consistency issues
- Check metric thresholds are appropriate for content type
- Validate analysis configuration settings
Performance Issues
- Reduce dataset size for faster analysis
- Disable unnecessary metrics for quick checks
- Run analysis during low-usage periods
This quality analysis interface provides the evaluation capabilities needed to verify that datasets meet quality requirements for AI training and evaluation.