batch

Process multiple hyperspectral datasets sequentially.

Synopsis

hyperseed batch INPUT_DIR [OPTIONS]

Description

The batch command processes multiple datasets sequentially (one after another) with consistent settings. It applies the same analysis pipeline to each dataset and saves results to a structured output directory.

Arguments

INPUT_DIR

Directory containing multiple dataset subdirectories.

Required: Yes

Format: Each subdirectory should contain:

  • capture/data.raw and capture/data.hdr (main data)
  • capture/WHITEREF_data.raw and WHITEREF_data.hdr (white reference)
  • capture/DARKREF_data.raw and DARKREF_data.hdr (dark reference)

Example:

datasets/
├── sample_001/
│   └── capture/
│       ├── data.raw, data.hdr
│       ├── WHITEREF_data.raw, WHITEREF_data.hdr
│       └── DARKREF_data.raw, DARKREF_data.hdr
├── sample_002/
│   └── capture/
│       └── ...
└── sample_003/
    └── capture/
        └── ...

Options

-o, --output-dir PATH

Output directory for results.

Type: Path
Default: INPUT_DIR/results

All output files are saved to this directory with dataset names as prefixes.

Example:

hyperseed batch datasets/ --output-dir analysis_results/

-c, --config PATH

Path to YAML configuration file.

Type: Path
Default: None (uses default settings)

Applies consistent preprocessing, segmentation, and output settings across all datasets.

Example:

hyperseed batch datasets/ --config batch_config.yaml

--pattern TEXT

Pattern to match dataset directories (glob-style).

Type: Text
Default: * (matches all subdirectories)

Use glob patterns to filter which datasets to process.

Example:

# Process only datasets starting with "sample_"
hyperseed batch datasets/ --pattern "sample_*"

# Process only SWIR datasets
hyperseed batch datasets/ --pattern "SWIR_*"

# Process specific range
hyperseed batch datasets/ --pattern "sample_00[1-5]"
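
Because --pattern uses shell-style globs, you can preview which directories a pattern would select before starting a long run. The ls command below is ordinary shell usage, not part of hyperseed:

# Preview the directories that "sample_00[1-5]" would match
ls -d datasets/sample_00[1-5]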

--min-pixels INTEGER

Minimum seed size in pixels.

Type: Integer
Default: 200
Range: 10-10000

Overrides the min_pixels setting from configuration.

Example:

hyperseed batch datasets/ --min-pixels 150

--no-outlier-removal

Disable automatic outlier removal.

Type: Flag (boolean)
Default: False (outlier removal enabled)

Disables outlier detection and removal for all datasets.

Example:

hyperseed batch datasets/ --no-outlier-removal

Complete Examples

Basic Batch Processing

hyperseed batch datasets/

What it does:

  1. Finds all subdirectories in datasets/
  2. Processes each sequentially
  3. Saves results to datasets/results/

Custom Output Directory

hyperseed batch datasets/ --output-dir analysis_results/

Output location: analysis_results/

Filter by Pattern

# Process only datasets starting with "sample_"
hyperseed batch datasets/ --pattern "sample_*"

# Process only specific samples
hyperseed batch datasets/ --pattern "sample_00[1-5]"

With Configuration File

hyperseed batch datasets/ \
    --config batch_config.yaml \
    --output-dir results/

batch_config.yaml:

preprocessing:
  method: minimal  # Fast processing for batch

segmentation:
  algorithm: watershed
  min_pixels: 200
  remove_outliers: true

output:
  format: csv
  include_plots: true

Override Settings

# Use config but override min_pixels
hyperseed batch datasets/ \
    --config batch_config.yaml \
    --min-pixels 150 \
    --output-dir results/

Output Structure

For input directory datasets/ containing sample_001/, sample_002/, etc., the batch command generates:

results/
├── sample_001_spectra.csv
├── sample_001_distribution.png
├── sample_001_segmentation.png
├── sample_001_spectra.png
├── sample_002_spectra.csv
├── sample_002_distribution.png
├── sample_002_segmentation.png
├── sample_002_spectra.png
├── sample_003_spectra.csv
└── ...

Generated Files Per Dataset

For each dataset that contains seeds:

  1. {name}_spectra.csv - Extracted spectral data with metadata
     • Seed IDs, coordinates, areas, morphology
     • Complete spectral signatures (all wavelengths)

  2. {name}_distribution.png - Spatial and size distribution
     • Left panel: Spatial distribution of seeds
     • Right panel: Area distribution histogram

  3. {name}_segmentation.png - Seed visualization
     • Left panel: Original image
     • Middle panel: Numbered seeds with colors
     • Right panel: Seed boundaries overlay

  4. {name}_spectra.png - Spectral curves
     • Individual seed spectra (light lines)
     • Mean spectrum (bold line)
     • Standard deviation band (shaded)

For datasets with no seeds:

  • No files are generated
  • A warning is displayed in the console
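
Because nothing is written for seedless datasets, one way to spot them after a run is to compare dataset directories against the generated CSVs. A minimal shell sketch, assuming the default results/ layout and {name}_spectra.csv naming shown above:

# Report datasets that produced no spectra CSV (illustrative sketch)
for d in datasets/*/; do
    name=$(basename "$d")
    [ -f "results/${name}_spectra.csv" ] || echo "No output for $name"
done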

Processing Pipeline

For each dataset, the batch command performs:

  1. Dataset Discovery
     • Searches INPUT_DIR for subdirectories matching pattern
     • Filters to directories with a capture/ folder

  2. Sequential Processing (for each dataset)
     • Load and calibrate hyperspectral data
     • Apply preprocessing (from config or defaults)
     • Segment seeds
     • Extract spectra
     • Apply outlier removal (if enabled)
     • Save CSV and generate plots

  3. Error Handling
     • If a dataset fails, the error is logged
     • Processing continues with the next dataset
     • Summary shows success/failure counts

  4. Summary Display
     • Total datasets processed
     • Successful count
     • Failed datasets (if any)
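
For intuition, this sequential, continue-on-failure behaviour is roughly what the shell loop below does with the analyze command (used as in the troubleshooting section). It is only a sketch of the control flow, assuming analyze signals failure through its exit code; it is not how batch is implemented internally:

# Rough manual equivalent of the batch loop (illustrative sketch)
mkdir -p results/
for d in datasets/*/; do
    name=$(basename "$d")
    if hyperseed analyze "$d" --output "results/${name}_spectra.csv"; then
        echo "OK: $name"
    else
        echo "FAILED: $name (continuing with next dataset)"
    fi
done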

Dataset Discovery

The batch command automatically finds datasets with this structure:

# Searches for directories matching pattern
INPUT_DIR/{pattern}/capture/data.hdr

# Examples:
datasets/sample_001/capture/data.hdr   (matched)
datasets/sample_002/capture/data.hdr   (matched)
datasets/other_file.txt                (skipped: not a directory)
datasets/no_capture/data.hdr           (skipped: missing capture/ folder)
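
To see which datasets the discovery step would consider, you can list the matching header files yourself with ordinary shell tools:

# List candidate datasets by their capture/data.hdr files
find datasets/ -path "*/capture/data.hdr"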

Error Handling

Batch processing continues even if individual datasets fail:

$ hyperseed batch datasets/ --output-dir results/

[1/5] Processing sample_001...
   Processed: 47 seeds -> sample_001_spectra.csv

[2/5] Processing sample_002...
   Failed: ENVI header not found

[3/5] Processing sample_003...
   Processed: 52 seeds -> sample_003_spectra.csv

...

Batch Processing Summary:
  Successful: 3/5
  Failed: sample_002, sample_004

Failed datasets:

  • Error message is displayed
  • Processing continues with next dataset
  • Failed datasets listed in summary

Common failure reasons:

  • Missing data files
  • Corrupted ENVI headers
  • No seeds detected (if validation too strict)
  • Insufficient disk space
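
Since missing reference files are a frequent cause, a quick pre-flight check against the layout described under Arguments can save a failed run. A shell sketch (file names assumed to follow that layout):

# Flag datasets missing calibration references (illustrative sketch)
for d in datasets/*/; do
    [ -f "${d}capture/WHITEREF_data.hdr" ] || echo "Missing white reference: $d"
    [ -f "${d}capture/DARKREF_data.hdr" ]  || echo "Missing dark reference: $d"
done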

Performance Notes

Processing Time

Batch processing is sequential (one dataset at a time):

  • Time per dataset: ~30-60 seconds (typical)
  • Total time: num_datasets × time_per_dataset

Example:

10 datasets × 45 seconds = ~7.5 minutes total

Reducing Processing Time

# fast_config.yaml - Optimized for speed
preprocessing:
  method: minimal  # Minimal preprocessing (fastest)

segmentation:
  algorithm: threshold  # Faster than watershed
  min_pixels: 200
  morphology_operations: false  # Skip cleanup
  remove_outliers: false  # Skip outlier detection

hyperseed batch datasets/ --config fast_config.yaml

Memory Usage

  • Per dataset: ~1-2GB RAM
  • Total: Same as single dataset (sequential processing)

Troubleshooting

Issue: No datasets found

Error: No datasets found matching '*'

Solutions:

# Check directory structure
ls -la datasets/

# Verify capture folders exist
find datasets/ -name "capture" -type d

# Check pattern
hyperseed batch datasets/ --pattern "*" -v

Issue: All datasets failing

Possible causes:

  • Incorrect directory structure
  • Missing reference files
  • Corrupted data

Solutions:

# Test single dataset first
hyperseed analyze datasets/sample_001 --output test.csv

# Enable debug mode
hyperseed batch datasets/ --debug

Issue: Some seeds missing

Possible causes:

  • min_pixels threshold too high
  • Outlier removal too aggressive

Solutions:

# Lower min_pixels
hyperseed batch datasets/ --min-pixels 100

# Disable outlier removal
hyperseed batch datasets/ --no-outlier-removal

# Use custom config with looser thresholds

Issue: Processing too slow

Solutions:

# Use minimal preprocessing
hyperseed batch datasets/ --config fast_config.yaml

# Reduce plot generation (custom config)

# fast_config.yaml
output:
  include_plots: false  # Skip plots for speed

Comparison with analyze Command

Feature           batch                    analyze
----------------  -----------------------  -------------------------
Datasets          Multiple                 Single
Processing        Sequential               Single run
Output            Multiple files           Single file set
Error handling    Continues on failure     Stops on error
Progress          Shows count (1/N)        Shows progress bar
Interactive       No                       Optional (--export-plots)
Use case          Process many datasets    Detailed single analysis

When to use batch:

  • Processing multiple datasets with same settings
  • Automated workflows
  • Consistent analysis across samples

When to use analyze:

  • Single dataset analysis
  • Testing different parameters
  • Need detailed progress information

Advanced Usage

Batch with Different Settings Per Type

Process different dataset types with different configs:

# Process SWIR datasets with one config
hyperseed batch datasets/ \
    --pattern "SWIR_*" \
    --config swir_config.yaml \
    --output-dir results_swir/

# Process VIS datasets with another config
hyperseed batch datasets/ \
    --pattern "VIS_*" \
    --config vis_config.yaml \
    --output-dir results_vis/

Progress Monitoring

# Run with verbose output
hyperseed batch datasets/ -v --output-dir results/

# Monitor output directory
watch -n 5 'ls -lh results/*.csv | wc -l'

Resume Failed Processing

# First run - some fail
hyperseed batch datasets/ --output-dir results/

# Find what succeeded
ls results/*.csv

# Process only missing datasets
hyperseed batch datasets/ \
    --pattern "sample_00[6-9]" \
    --output-dir results/
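
If the remaining datasets do not share a convenient pattern, you can instead skip anything that already has a CSV and re-run only the rest, using the analyze command as in the troubleshooting example. A sketch, assuming the default {name}_spectra.csv naming:

# Re-process only datasets without an existing spectra CSV (illustrative sketch)
for d in datasets/*/; do
    name=$(basename "$d")
    if [ ! -f "results/${name}_spectra.csv" ]; then
        hyperseed analyze "$d" --output "results/${name}_spectra.csv"
    fi
done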

See Also