
Batch Processing

Process multiple hyperspectral datasets efficiently with consistent settings.

Overview

Batch processing allows you to analyze multiple hyperspectral datasets with the same analysis pipeline. The batch command:

  • Processes datasets sequentially (one after another)
  • Applies consistent settings across all datasets
  • Handles errors gracefully - continues if one dataset fails
  • Generates organized output with clear naming
  • Provides progress feedback and summary statistics

When to Use Batch Processing

Good Use Cases

Processing experimental datasets

# Process all treatment groups
hyperseed batch experiments/treatment_A/ --config settings.yaml
hyperseed batch experiments/treatment_B/ --config settings.yaml

Consistent analysis across time points

# Process all time points with same settings
hyperseed batch timeseries/ --pattern "day_*" --output-dir results/

Quality control across samples

# Quick batch analysis of all samples
hyperseed batch samples/ --config minimal_config.yaml

Re-analyzing with different parameters

# First pass
hyperseed batch datasets/ --config pass1.yaml --output-dir results_v1/

# Second pass with adjusted settings
hyperseed batch datasets/ --config pass2.yaml --output-dir results_v2/

When Not to Use Batch Processing

Single dataset - Use analyze command instead

# Don't use batch for one dataset
hyperseed analyze dataset/sample_001 --output results.csv

Different settings per dataset - Process individually

# Process each with different config
hyperseed analyze dataset/sample_001 --config config_A.yaml
hyperseed analyze dataset/sample_002 --config config_B.yaml

Interactive parameter tuning - Use segment or analyze

# Use analyze for testing parameters
hyperseed analyze dataset/sample --min-pixels 100 --export-plots
hyperseed analyze dataset/sample --min-pixels 200 --export-plots

Quick Start

Basic Batch Processing

# Process all datasets in directory
hyperseed batch datasets/

What happens:

  1. Finds all subdirectories in datasets/
  2. Processes each sequentially
  3. Saves results to datasets/results/

With Custom Output

hyperseed batch datasets/ --output-dir analysis_results/

With Configuration

hyperseed batch datasets/ \
    --config batch_config.yaml \
    --output-dir results/

Directory Structure

Input Structure Required

datasets/
├── sample_001/
│   └── capture/
│       ├── data.raw
│       ├── data.hdr
│       ├── WHITEREF_data.raw
│       ├── WHITEREF_data.hdr
│       ├── DARKREF_data.raw
│       └── DARKREF_data.hdr
├── sample_002/
│   └── capture/
│       └── ...
└── sample_003/
    └── capture/
        └── ...

Requirements:

  • Each dataset must be in its own subdirectory
  • Each must have a capture/ folder
  • Must contain data.hdr, white reference, and dark reference files
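
A quick pre-flight check can catch incomplete datasets before a long run. This is only a sketch: it assumes the file names shown above, so adjust it if your capture files are named differently.

# List datasets missing any of the required ENVI files
for d in datasets/*/; do
    for f in data.hdr WHITEREF_data.hdr DARKREF_data.hdr; do
        [ -f "${d}capture/$f" ] || echo "Missing: ${d}capture/$f"
    done
done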

Output Structure Generated

results/
├── sample_001_spectra.csv
├── sample_001_distribution.png
├── sample_001_segmentation.png
├── sample_001_spectra.png
├── sample_002_spectra.csv
├── sample_002_distribution.png
├── sample_002_segmentation.png
├── sample_002_spectra.png
└── ...

Configuration for Batch Processing

Creating a Batch Configuration

# Generate template
hyperseed config --output batch_config.yaml --preset minimal

# batch_config.yaml - Optimized for batch processing

calibration:
  apply_calibration: true
  clip_negative: true
  clip_max: 1.0
  interpolate_bad_pixels: true

preprocessing:
  method: minimal  # Fast, good for segmentation

segmentation:
  algorithm: watershed  # Best balance
  min_pixels: 200
  reject_overlapping: true
  remove_outliers: true  # Automatic quality control
  outlier_min_area: 50
  outlier_max_area: 2000

Fast Batch Configuration

For maximum speed when processing many datasets:

# fast_batch.yaml - Speed-optimized

preprocessing:
  method: none  # Skip preprocessing

segmentation:
  algorithm: threshold  # Fastest algorithm
  min_pixels: 200
  morphology_operations: false
  remove_outliers: false

hyperseed batch large_dataset/ --config fast_batch.yaml

Pattern Matching

Use glob patterns to selectively process datasets.

Match All (Default)

hyperseed batch datasets/
# Processes: sample_001, sample_002, sample_003, ...

Match Prefix

# Process only datasets starting with "SWIR_"
hyperseed batch datasets/ --pattern "SWIR_*"
# Processes: SWIR_001, SWIR_002, SWIR_003
# Skips: VIS_001, other_data

Match Specific Range

# Process samples 1-5
hyperseed batch datasets/ --pattern "sample_00[1-5]"
# Processes: sample_001, sample_002, ..., sample_005
# Skips: sample_006, sample_007, ...

Match Multiple Patterns

# Process treatment A samples
hyperseed batch experiments/ --pattern "A_*"

# Then process treatment B samples
hyperseed batch experiments/ --pattern "B_*" --output-dir results_B/

Complex Patterns

# Process all SWIR samples from experiment 1
hyperseed batch data/ --pattern "exp1_SWIR_*"

# Process time point zero across all experiments
hyperseed batch data/ --pattern "*_t00_*"

Error Handling

Batch processing continues even when individual datasets fail.

Example Output with Failures

$ hyperseed batch datasets/ --output-dir results/

[1/5] Processing sample_001...
   Processed: 47 seeds → sample_001_spectra.csv
     Generated visualizations:
       - sample_001_distribution.png (spatial & size)
       - sample_001_segmentation.png (numbered seeds)
       - sample_001_spectra.png (spectral data)

[2/5] Processing sample_002...
   Failed: ENVI header file not found

[3/5] Processing sample_003...
   No seeds found in sample_003 (check min-pixels threshold)

[4/5] Processing sample_004...
   Processed: 52 seeds → sample_004_spectra.csv

[5/5] Processing sample_005...
   Processed: 39 seeds → sample_005_spectra.csv

Batch Processing Summary:
  Successful: 3/5
  Failed: sample_002, sample_003

Common Failure Reasons

Missing files:

Error: ENVI header file not found
Solution: Check dataset structure, ensure capture/data.hdr exists

Corrupted data:

Error: Unable to read ENVI data
Solution: Verify data files are not corrupted, check file permissions

No seeds detected:

⚠ No seeds found (check min-pixels threshold)
Solution: Lower --min-pixels threshold or check image quality
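
For example, re-run just the affected dataset with a lower threshold to confirm the fix before repeating the whole batch (the threshold value and output name here are illustrative):

hyperseed analyze datasets/sample_003 --min-pixels 100 --output retest_sample_003.csv -v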

Debugging Failed Datasets

# Test failed dataset individually
hyperseed analyze datasets/sample_002 --output test.csv -v

# Run batch with debug mode
hyperseed batch datasets/ --debug --output-dir results/

Performance and Timing

Typical Processing Times

Per dataset (typical seed image):

  • Calibration: ~5-10 seconds
  • Preprocessing: ~2-5 seconds
  • Segmentation: ~5-10 seconds
  • Extraction: ~3-5 seconds
  • Plotting: ~5-10 seconds
  • Total: ~30-60 seconds per dataset

Batch processing time:

Total time ≈ Number of datasets × Time per dataset

Examples:
- 10 datasets × 45 sec = ~7.5 minutes
- 50 datasets × 45 sec = ~38 minutes
- 100 datasets × 45 sec = ~75 minutes
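
You can script a rough estimate before starting a long run; this sketch assumes roughly 45 seconds per dataset and the directory layout shown earlier:

# Estimate total batch time from the number of dataset directories
n=$(ls -d datasets/*/ | wc -l)
echo "$n datasets x ~45 s each = ~$(( n * 45 / 60 )) minutes"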

Speed Optimization

1. Use minimal preprocessing

preprocessing:
  method: minimal  # 2-3x faster than advanced

2. Use faster segmentation

segmentation:
  algorithm: threshold  # Faster than watershed

3. Disable outlier removal

hyperseed batch datasets/ --no-outlier-removal

Combined fast configuration:

preprocessing:
  method: none

segmentation:
  algorithm: threshold
  morphology_operations: false
  remove_outliers: false

Speed improvement: ~2-3x faster (20-30 seconds per dataset)
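
To apply the combined settings, save them to a file and pass it with --config (the file and directory names below are just examples):

hyperseed batch datasets/ --config fast_batch.yaml --output-dir results_fast/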

Workflow Examples

Example 1: Research Experiment

Process multiple treatment groups with consistent settings.

# Create configuration
cat > experiment_config.yaml << EOF
preprocessing:
  method: minimal

segmentation:
  algorithm: watershed
  min_pixels: 200
  remove_outliers: true
EOF

# Process each treatment group
for group in control treatment_A treatment_B; do
    hyperseed batch experiments/$group/ \
        --config experiment_config.yaml \
        --output-dir results/$group/
done

# Compare results
ls results/*/sample_*.csv

Example 2: Time Series Analysis

Process all time points with same settings.

# Process all time points
hyperseed batch timeseries/ \
    --pattern "day_*" \
    --config timeseries_config.yaml \
    --output-dir timeseries_results/

# Results organized by day
ls timeseries_results/day_*_spectra.csv

Example 3: Quality Control

Quick batch processing to identify problem datasets.

# Fast processing with minimal settings
hyperseed batch samples/ \
    --config minimal_config.yaml \
    --output-dir qc_results/ \
    -v

# Review which samples failed
grep "Failed" qc_results/*.log

# Check seed counts
for f in qc_results/*_spectra.csv; do
    echo "$f: $(wc -l < $f) seeds"
done

Example 4: Re-analysis with Different Parameters

Compare results with different min_pixels thresholds.

# First pass - default
hyperseed batch datasets/ \
    --min-pixels 200 \
    --output-dir results_p200/

# Second pass - lower threshold
hyperseed batch datasets/ \
    --min-pixels 100 \
    --output-dir results_p100/

# Compare per-dataset seed counts side by side
# (row counts via wc -l, as in Example 3)
paste <(wc -l results_p200/*_spectra.csv) \
      <(wc -l results_p100/*_spectra.csv)

Example 5: Selective Re-processing

Re-process only failed datasets.

# Initial batch run
hyperseed batch datasets/ --output-dir results/

# Identify successful datasets
successful=$(ls results/*_spectra.csv | xargs -n1 basename | sed 's/_spectra.csv//')

# Find failed datasets
for dataset in datasets/*/; do
    name=$(basename "$dataset")
    if ! echo "$successful" | grep -qx "$name"; then
        echo "Failed: $name"
    fi
done

# Re-process failed datasets manually
hyperseed analyze datasets/sample_002 --output results/sample_002_spectra.csv

Monitoring Progress

Real-time Monitoring

# Terminal 1: Run batch processing
hyperseed batch datasets/ --output-dir results/ -v

# Terminal 2: Monitor output files
watch -n 5 'ls -lh results/*.csv | wc -l'

# Terminal 3: Monitor disk usage
watch -n 10 'du -sh results/'

Progress Estimation

# Count total datasets
total=$(ls -d datasets/*/ | wc -l)

# Monitor completion (interrupt with Ctrl+C if some datasets fail)
while true; do
    completed=$(ls results/*_spectra.csv 2>/dev/null | wc -l)
    echo "Progress: $completed / $total"
    [ "$completed" -ge "$total" ] && break
    sleep 10
done

Batch vs. Individual Analysis

Aspect              | batch command         | analyze command
--------------------|-----------------------|------------------------
Number of datasets  | Multiple              | Single
Processing          | Sequential            | One-time
Settings            | Consistent across all | Per-run
Error handling      | Continues on failure  | Stops on error
Progress display    | Dataset count (1/N)   | Detailed progress bar
Output organization | All in one directory  | Specified per run
Interactive tuning  | Not suitable          | Good for testing
Automation          | Excellent             | Requires scripting

Use batch when:

  • Processing 3+ datasets with the same settings
  • You need a consistent analysis pipeline
  • Running automated workflows
  • You don't need to test parameters

Use analyze when:

  • Processing a single dataset
  • Testing different parameters
  • You need detailed progress feedback
  • You want to inspect results interactively

Troubleshooting

Issue: No datasets found

Error: No datasets found matching '*'

Causes:

  • Wrong directory structure
  • No capture/ folders
  • Pattern doesn't match

Solutions:

# Check structure
ls -la datasets/

# Verify capture folders
find datasets/ -type d -name "capture"

# Try explicit pattern
hyperseed batch datasets/ --pattern "*" -v

Issue: All datasets failing

Solutions:

# Test one dataset individually
hyperseed analyze datasets/sample_001 --output test.csv -v

# Check data files
ls datasets/sample_001/capture/

# Run with debug
hyperseed batch datasets/ --debug

Issue: Inconsistent seed counts

Possible causes:

  • Variable image quality
  • min_pixels threshold not appropriate
  • Outlier removal too aggressive

Solutions:

# Disable outlier removal to see raw counts
hyperseed batch datasets/ --no-outlier-removal

# Lower min_pixels
hyperseed batch datasets/ --min-pixels 100

# Use custom config with looser thresholds
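
For instance, a looser configuration might look like the sketch below; the values are illustrative starting points, not recommendations, and reuse the keys from the batch configuration shown earlier:

# loose_config.yaml - lower pixel threshold, wider outlier size limits (example values)
segmentation:
  algorithm: watershed
  min_pixels: 100
  remove_outliers: true
  outlier_min_area: 25
  outlier_max_area: 4000

hyperseed batch datasets/ --config loose_config.yaml --output-dir results_loose/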

Issue: Memory errors

Solutions:

# Process fewer datasets at once
hyperseed batch datasets/ --pattern "sample_00[1-3]"
hyperseed batch datasets/ --pattern "sample_00[4-6]"

# Close other applications
# Check available memory: free -h (Linux) or top (macOS)

Issue: Slow processing

Solutions:

# Use fast configuration
hyperseed batch datasets/ --config fast_config.yaml

# Skip plots
hyperseed batch datasets/ --config no_plots.yaml

# Process in chunks
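
One way to process in chunks is to split the run by glob pattern, as in the memory workaround above (the patterns assume the sample_NNN naming used in earlier examples):

for p in "sample_00[1-5]" "sample_00[6-9]"; do
    hyperseed batch datasets/ --pattern "$p" --output-dir results/
done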

See Also