batch¶
Process multiple hyperspectral datasets sequentially.
Synopsis¶
Description¶
The batch command processes multiple datasets sequentially (one after another) with consistent settings. It applies the same analysis pipeline to each dataset and saves results to a structured output directory.
Arguments¶
INPUT_DIR¶
Directory containing multiple dataset subdirectories.
Required: Yes
Format: Each subdirectory should contain:
- capture/data.raw and capture/data.hdr (main data)
- capture/WHITEREF_data.raw and .hdr (white reference)
- capture/DARKREF_data.raw and .hdr (dark reference)
Example:
datasets/
├── sample_001/
│   └── capture/
│       ├── data.raw, data.hdr
│       ├── WHITEREF_data.raw, WHITEREF_data.hdr
│       └── DARKREF_data.raw, DARKREF_data.hdr
├── sample_002/
│   └── capture/
│       └── ...
└── sample_003/
    └── capture/
        └── ...
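Before starting a long batch, it can help to check that every dataset has the expected files. The following is a minimal Python sketch, not part of hyperseed: the file names come from the layout above, while `missing_files` and `report` are hypothetical helper names introduced here for illustration.

```python
from pathlib import Path

# File names taken from the layout above; the check itself is illustrative.
REQUIRED_FILES = (
    "data.raw", "data.hdr",
    "WHITEREF_data.raw", "WHITEREF_data.hdr",
    "DARKREF_data.raw", "DARKREF_data.hdr",
)


def missing_files(dataset_dir):
    """Return the required capture/ files that are absent from dataset_dir."""
    capture = Path(dataset_dir) / "capture"
    return [name for name in REQUIRED_FILES if not (capture / name).is_file()]


def report(input_dir):
    """Print one line per dataset subdirectory, listing any missing files."""
    for dataset in sorted(p for p in Path(input_dir).iterdir() if p.is_dir()):
        missing = missing_files(dataset)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{dataset.name}: {status}")
```

Calling `report("datasets/")` prints one status line per subdirectory, so incomplete datasets can be fixed before the batch run.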
Options¶
-o, --output-dir PATH¶
Output directory for results.
Type: Path
Default: INPUT_DIR/results
All output files are saved to this directory with dataset names as prefixes.
Example:
hyperseed batch datasets/ --output-dir results/
-c, --config PATH¶
Path to YAML configuration file.
Type: Path
Default: None (uses default settings)
Applies consistent preprocessing, segmentation, and output settings across all datasets.
Example:
hyperseed batch datasets/ --config batch_config.yaml
--pattern TEXT¶
Pattern to match dataset directories (glob-style).
Type: Text
Default: * (matches all subdirectories)
Use glob patterns to filter which datasets to process.
Example:
# Process only datasets starting with "sample_"
hyperseed batch datasets/ --pattern "sample_*"
# Process only SWIR datasets
hyperseed batch datasets/ --pattern "SWIR_*"
# Process specific range
hyperseed batch datasets/ --pattern "sample_00[1-5]"
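To preview which directory names a pattern would select, the glob rules can be tried out in Python. This sketch assumes that `--pattern` follows the same rules as Python's `fnmatch` module, which this page does not confirm; `select` and the sample names are hypothetical.

```python
from fnmatch import fnmatchcase


def select(names, pattern="*"):
    """Return the names that match a glob-style pattern, in input order."""
    return [name for name in names if fnmatchcase(name, pattern)]


names = ["sample_001", "sample_006", "SWIR_plot_a", "VIS_plot_a"]
print(select(names, "sample_*"))        # ['sample_001', 'sample_006']
print(select(names, "sample_00[1-5]"))  # ['sample_001']
print(select(names, "SWIR_*"))          # ['SWIR_plot_a']
```

Note that `[1-5]` matches a single character, so `sample_00[1-5]` covers `sample_001` through `sample_005` only.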
--min-pixels INTEGER¶
Minimum seed size in pixels.
Type: Integer
Default: 200
Range: 10-10000
Overrides the min_pixels setting from configuration.
Example:
hyperseed batch datasets/ --min-pixels 100
--no-outlier-removal¶
Disable automatic outlier removal.
Type: Flag (boolean)
Default: False (outlier removal enabled)
Disables outlier detection and removal for all datasets.
Example:
hyperseed batch datasets/ --no-outlier-removal
Complete Examples¶
Basic Batch Processing¶
hyperseed batch datasets/
What it does:
1. Finds all subdirectories in datasets/
2. Processes each sequentially
3. Saves results to datasets/results/
Custom Output Directory¶
hyperseed batch datasets/ --output-dir analysis_results/
Output location: analysis_results/
Filter by Pattern¶
# Process only datasets starting with "sample_"
hyperseed batch datasets/ --pattern "sample_*"
# Process only specific samples
hyperseed batch datasets/ --pattern "sample_00[1-5]"
With Configuration File¶
hyperseed batch datasets/ --config batch_config.yaml
batch_config.yaml:
preprocessing:
  method: minimal  # Fast processing for batch
segmentation:
  algorithm: watershed
  min_pixels: 200
  remove_outliers: true
output:
  format: csv
  include_plots: true
Override Settings¶
# Use config but override min_pixels
hyperseed batch datasets/ \
--config batch_config.yaml \
--min-pixels 150 \
--output-dir results/
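The override order described here (command-line value first, then the config file, then the built-in default) can be sketched in Python. This is an illustration of the precedence only, not hyperseed's actual code: `resolve_min_pixels` is a hypothetical helper, and the `segmentation.min_pixels` key is assumed from the sample configuration.

```python
DEFAULT_MIN_PIXELS = 200  # documented default for --min-pixels


def resolve_min_pixels(config, cli_value=None):
    """Pick min_pixels: CLI flag first, then the config file, then the default."""
    if cli_value is not None:
        value = cli_value
    else:
        value = config.get("segmentation", {}).get("min_pixels", DEFAULT_MIN_PIXELS)
    if not 10 <= value <= 10000:  # documented range for --min-pixels
        raise ValueError(f"min_pixels must be between 10 and 10000, got {value}")
    return value


config = {"segmentation": {"min_pixels": 200}}
print(resolve_min_pixels(config, cli_value=150))  # 150: the flag wins
print(resolve_min_pixels(config))                 # 200: taken from the config
```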
Output Structure¶
For an input directory datasets/ containing sample_001/, sample_002/, etc., the batch command writes the following files to the output directory (results/ in this example):
results/
├── sample_001_spectra.csv
├── sample_001_distribution.png
├── sample_001_segmentation.png
├── sample_001_spectra.png
├── sample_002_spectra.csv
├── sample_002_distribution.png
├── sample_002_segmentation.png
├── sample_002_spectra.png
├── sample_003_spectra.csv
└── ...
Generated Files Per Dataset¶
For each dataset that contains seeds:
- {name}_spectra.csv - Extracted spectral data with metadata
  - Seed IDs, coordinates, areas, morphology
  - Complete spectral signatures (all wavelengths)
- {name}_distribution.png - Spatial and size distribution
  - Left panel: Spatial distribution of seeds
  - Right panel: Area distribution histogram
- {name}_segmentation.png - Seed visualization
  - Left panel: Original image
  - Middle panel: Numbered seeds with colors
  - Right panel: Seed boundaries overlay
- {name}_spectra.png - Spectral curves
  - Individual seed spectra (light lines)
  - Mean spectrum (bold line)
  - Standard deviation band (shaded)
For datasets with no seeds:
- No files are generated
- A warning is displayed in the console
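Because all datasets share one flat output directory, it can be useful to group the files back by dataset name. The sketch below relies only on the four file suffixes listed above; `group_outputs` is a hypothetical helper, not a hyperseed function.

```python
from collections import defaultdict
from pathlib import Path

# Suffixes of the four per-dataset files listed above.
SUFFIXES = ("_spectra.csv", "_distribution.png", "_segmentation.png", "_spectra.png")


def group_outputs(output_dir):
    """Map each dataset name to the list of output suffixes found for it."""
    found = defaultdict(list)
    for path in sorted(Path(output_dir).iterdir()):
        for suffix in SUFFIXES:
            if path.name.endswith(suffix):
                # Strip the suffix to recover the dataset name prefix.
                found[path.name[: -len(suffix)]].append(suffix)
                break
    return dict(found)
```

A dataset whose entry has fewer than four suffixes was probably interrupted partway through, and a dataset with no entry either failed or contained no seeds.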
Processing Pipeline¶
The batch command runs the following stages:
1. Dataset Discovery
   - Searches INPUT_DIR for subdirectories matching the pattern
   - Filters to directories with a capture/ folder
2. Sequential Processing (for each dataset)
   - Load and calibrate hyperspectral data
   - Apply preprocessing (from config or defaults)
   - Segment seeds
   - Extract spectra
   - Apply outlier removal (if enabled)
   - Save CSV and generate plots
3. Error Handling
   - If a dataset fails, the error is logged
   - Processing continues with the next dataset
   - Summary shows success/failure counts
4. Summary Display
   - Total datasets processed
   - Successful count
   - Failed datasets (if any)
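The continue-on-failure behavior can be pictured as a loop that records errors instead of stopping. This is a minimal sketch of that pattern under stated assumptions, not hyperseed's implementation: `run_batch` is a hypothetical name, and `process` stands in for whatever per-dataset analysis is run.

```python
def run_batch(names, process):
    """Call process(name) for each dataset; record failures instead of stopping."""
    succeeded, failed = [], {}
    for index, name in enumerate(names, start=1):
        print(f"[{index}/{len(names)}] Processing {name}...")
        try:
            process(name)
        except Exception as exc:  # one bad dataset must not abort the batch
            failed[name] = str(exc)
            print(f"  Failed: {exc}")
        else:
            succeeded.append(name)
    # Summary: success count first, then the names of any failed datasets.
    print(f"Successful: {len(succeeded)}/{len(names)}")
    if failed:
        print("Failed: " + ", ".join(failed))
    return succeeded, failed
```

Keeping the error message per dataset makes it possible to print both the inline failure line and the final summary from the same data.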
Dataset Discovery¶
The batch command automatically finds datasets with this structure:
# Searches for directories matching pattern
INPUT_DIR/{pattern}/capture/data.hdr
# Examples found:
datasets/sample_001/capture/data.hdr ✓
datasets/sample_002/capture/data.hdr ✓
datasets/other_file.txt ✗ (not a directory)
datasets/no_capture/data.hdr ✗ (missing capture/ folder)
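The same search can be reproduced with Python's `pathlib` to preview what a batch run would pick up. This sketch assumes discovery is equivalent to globbing for `{pattern}/capture/data.hdr`, as described above; `discover` is a hypothetical helper name.

```python
from pathlib import Path


def discover(input_dir, pattern="*"):
    """Return dataset directories under input_dir that contain capture/data.hdr."""
    headers = Path(input_dir).glob(f"{pattern}/capture/data.hdr")
    # data.hdr -> capture/ -> dataset directory
    return sorted(header.parent.parent for header in headers)
```

For example, `discover("datasets/", "sample_*")` returns only the `sample_*` directories that have a `capture/data.hdr` file, skipping plain files and directories without a `capture/` folder.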
Error Handling¶
Batch processing continues even if individual datasets fail:
$ hyperseed batch datasets/ --output-dir results/
[1/5] Processing sample_001...
✓ Processed: 47 seeds → sample_001_spectra.csv
[2/5] Processing sample_002...
✗ Failed: ENVI header not found
[3/5] Processing sample_003...
✓ Processed: 52 seeds → sample_003_spectra.csv
Batch Processing Summary:
Successful: 3/5
Failed: sample_002, sample_004
Failed datasets:
- Error message is displayed
- Processing continues with next dataset
- Failed datasets listed in summary
Common failure reasons:
- Missing data files
- Corrupted ENVI headers
- No seeds detected (if validation too strict)
- Insufficient disk space
Performance Notes¶
Processing Time¶
Batch processing is sequential (one dataset at a time):
- Time per dataset: ~30-60 seconds (typical)
- Total time: num_datasets × time_per_dataset
Example:
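As a rough illustration of the formula above, this Python sketch turns the typical 30-60 seconds per dataset into a total run time. The figure of 20 datasets is an arbitrary example, and `estimate_minutes` is a hypothetical helper.

```python
def estimate_minutes(num_datasets, seconds_low=30, seconds_high=60):
    """Return (low, high) total run time in minutes for a sequential batch."""
    return (num_datasets * seconds_low / 60, num_datasets * seconds_high / 60)


low, high = estimate_minutes(20)
print(f"20 datasets: about {low:.0f}-{high:.0f} minutes")
# prints "20 datasets: about 10-20 minutes"
```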
Reducing Processing Time¶
# fast_config.yaml - Optimized for speed
preprocessing:
  method: minimal  # Minimal preprocessing (fastest)
segmentation:
  algorithm: threshold  # Faster than watershed
  min_pixels: 200
  morphology_operations: false  # Skip cleanup
  remove_outliers: false  # Skip outlier detection
Memory Usage¶
- Per dataset: ~1-2GB RAM
- Total: Same as single dataset (sequential processing)
Troubleshooting¶
Issue: No datasets found¶
Error: No datasets found matching '*'
Solutions:
# Check directory structure
ls -la datasets/
# Verify capture folders exist
find datasets/ -name "capture" -type d
# Check pattern
hyperseed batch datasets/ --pattern "*" -v
Issue: All datasets failing¶
Possible causes:
- Incorrect directory structure
- Missing reference files
- Corrupted data
Solutions:
# Test single dataset first
hyperseed analyze datasets/sample_001 --output test.csv
# Enable debug mode
hyperseed batch datasets/ --debug
Issue: Some seeds missing¶
Possible causes:
- min_pixels threshold too high
- Outlier removal too aggressive
Solutions:
# Lower min_pixels
hyperseed batch datasets/ --min-pixels 100
# Disable outlier removal
hyperseed batch datasets/ --no-outlier-removal
# Use custom config with looser thresholds
Issue: Processing too slow¶
Solutions:
# Use minimal preprocessing
hyperseed batch datasets/ --config fast_config.yaml
# Reduce plot generation (custom config)
Comparison with analyze Command¶
| Feature | batch | analyze |
|---|---|---|
| Datasets | Multiple | Single |
| Processing | Sequential | Single run |
| Output | Multiple files | Single file set |
| Error handling | Continues on failure | Stops on error |
| Progress | Shows count (1/N) | Shows progress bar |
| Interactive | No | Optional (--export-plots) |
| Use case | Process many datasets | Detailed single analysis |
When to use batch:
- Processing multiple datasets with same settings
- Automated workflows
- Consistent analysis across samples
When to use analyze:
- Single dataset analysis
- Testing different parameters
- Need detailed progress information
Advanced Usage¶
Batch with Different Settings Per Type¶
Process different dataset types with different configs:
# Process SWIR datasets with one config
hyperseed batch datasets/ \
--pattern "SWIR_*" \
--config swir_config.yaml \
--output-dir results_swir/
# Process VIS datasets with another config
hyperseed batch datasets/ \
--pattern "VIS_*" \
--config vis_config.yaml \
--output-dir results_vis/
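When there are many dataset types, the per-type commands can be generated from a table instead of being written by hand. This Python sketch only builds the argument lists, using the flags documented on this page; `batch_command` and `RUNS` are hypothetical names.

```python
import shlex


def batch_command(input_dir, pattern=None, config=None, output_dir=None):
    """Build the argument list for one `hyperseed batch` run."""
    args = ["hyperseed", "batch", input_dir]
    if pattern is not None:
        args += ["--pattern", pattern]
    if config is not None:
        args += ["--config", config]
    if output_dir is not None:
        args += ["--output-dir", output_dir]
    return args


# One (pattern, config, output directory) row per dataset type.
RUNS = [
    ("SWIR_*", "swir_config.yaml", "results_swir/"),
    ("VIS_*", "vis_config.yaml", "results_vis/"),
]

for pattern, config, out in RUNS:
    # shlex.join quotes the pattern so the shell does not expand it.
    print(shlex.join(batch_command("datasets/", pattern, config, out)))
```

Each list can also be passed directly to `subprocess.run(args, check=False)`, which avoids shell quoting altogether.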
Progress Monitoring¶
# Run with verbose output
hyperseed batch datasets/ -v --output-dir results/
# Monitor output directory
watch -n 5 'ls -lh results/*.csv | wc -l'
Resume Failed Processing¶
# First run - some fail
hyperseed batch datasets/ --output-dir results/
# Find what succeeded
ls results/*.csv
# Process only missing datasets
hyperseed batch datasets/ \
--pattern "sample_00[6-9]" \
--output-dir results/
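Rather than comparing directory listings by eye, the datasets still lacking results can be computed. This sketch assumes that a finished dataset always has a `{name}_spectra.csv` file in the output directory, as described under Output Structure; `datasets_without_results` is a hypothetical helper.

```python
from pathlib import Path

CSV_SUFFIX = "_spectra.csv"


def datasets_without_results(input_dir, output_dir):
    """Return names of dataset directories lacking {name}_spectra.csv in output_dir."""
    done = {
        path.name[: -len(CSV_SUFFIX)]
        for path in Path(output_dir).glob(f"*{CSV_SUFFIX}")
    }
    # Only directories with a capture/ folder count as datasets.
    candidates = sorted(
        path.name for path in Path(input_dir).iterdir() if (path / "capture").is_dir()
    )
    return [name for name in candidates if name not in done]
```

The returned names include datasets in which no seeds were detected, since those produce no files either, so a rerun will not necessarily shrink the list to zero.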
See Also¶
- analyze command: Process single dataset
- Configuration Guide: Create batch configurations
- Batch Processing Guide: Detailed workflows and examples