Preprocessing API¶

Programmatic interface for spectral preprocessing.

PreprocessingPipeline¶

The PreprocessingPipeline class chains the configured preprocessing steps into a single pipeline.

Basic Usage¶

from hyperseed.core.preprocessing.pipeline import PreprocessingPipeline
from hyperseed.config.settings import Settings

# Create pipeline with default settings
settings = Settings()
pipeline = PreprocessingPipeline(settings.preprocessing)

# Apply preprocessing
preprocessed_data = pipeline.fit_transform(calibrated_data)

With Custom Configuration¶

from hyperseed.config.settings import PreprocessingConfig

# Create custom configuration
config = PreprocessingConfig(
    method="custom",
    snv=True,
    smoothing=True,
    smoothing_window=15,
    smoothing_polyorder=3,
    baseline_correction=True,
    baseline_order=2,
    derivative=1,
    msc=False,
    detrend=False
)

# Create pipeline
pipeline = PreprocessingPipeline(config)

# Fit and transform
preprocessed = pipeline.fit_transform(data)

Methods¶

`__init__(config: PreprocessingConfig)`¶

Initialize the preprocessing pipeline with configuration.

Parameters:
- config: PreprocessingConfig object with preprocessing settings

Example:

from hyperseed.config.settings import PreprocessingConfig

config = PreprocessingConfig(method="standard")
pipeline = PreprocessingPipeline(config)

`fit(data: np.ndarray) -> PreprocessingPipeline`¶

Fit the preprocessing pipeline on data (e.g., compute MSC reference spectrum).

Parameters:
- data: Hyperspectral data (Y, X, Bands) or (Samples, Bands)

Returns:
- Self (for method chaining)

Example:

pipeline.fit(training_data)

`transform(data: np.ndarray) -> np.ndarray`¶

Apply preprocessing transformations to data.

Parameters:
- data: Hyperspectral data to transform

Returns:
- Preprocessed data with same shape as input

Example:

preprocessed = pipeline.transform(test_data)

Note: Call fit() before transform() when MSC is enabled, because the reference spectrum is computed during fitting.
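To illustrate why fitting matters for MSC, here is a minimal sketch of a fit/transform split. ToyMSC is a hypothetical class written for this page, not hyperseed's implementation; it assumes that fit() stores the mean spectrum as the reference and that transform() regresses each spectrum on that stored reference.

```python
import numpy as np


class ToyMSC:
    """Hypothetical stand-in that mimics a fit/transform split for MSC."""

    def __init__(self):
        self.reference_ = None  # populated by fit()

    def fit(self, spectra: np.ndarray) -> "ToyMSC":
        # Learn the reference spectrum (mean over all spectra) from the fitting data only.
        self.reference_ = spectra.reshape(-1, spectra.shape[-1]).mean(axis=0)
        return self  # returning self allows method chaining

    def transform(self, spectra: np.ndarray) -> np.ndarray:
        if self.reference_ is None:
            raise RuntimeError("Call fit() before transform().")
        flat = spectra.reshape(-1, spectra.shape[-1])
        # Least-squares fit of each spectrum against the reference: x ~ a + b * ref.
        b, a = np.polyfit(self.reference_, flat.T, deg=1)
        corrected = (flat - a[:, None]) / b[:, None]
        return corrected.reshape(spectra.shape)


rng = np.random.default_rng(0)
training_data = rng.random((20, 30)) + np.linspace(0.0, 1.0, 30)
toy = ToyMSC().fit(training_data)

# A test spectrum that is a scaled and shifted copy of the reference.
test_data = 2.0 * toy.reference_[None, :] + 0.5
corrected = toy.transform(test_data)
```

Because the reference is learned once in fit(), test spectra are corrected against the training reference rather than against their own mean.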

`fit_transform(data: np.ndarray) -> np.ndarray`¶

Fit the pipeline and transform data in one step.

Parameters:
- data: Hyperspectral data to fit and transform

Returns:
- Preprocessed data

Example:

preprocessed = pipeline.fit_transform(data)

This is equivalent to:

pipeline.fit(data)
preprocessed = pipeline.transform(data)

`get_step_names() -> list[str]`¶

Get list of enabled preprocessing steps.

Returns:
- List of step names that will be applied

Example:

steps = pipeline.get_step_names()
print(f"Preprocessing steps: {', '.join(steps)}")
# Output: Preprocessing steps: SNV, Smoothing, Baseline Correction

`describe() -> str`¶

Get human-readable description of the pipeline.

Returns:
- String describing the preprocessing configuration

Example:

print(pipeline.describe())
# Output: Preprocessing Pipeline:
#   - SNV: enabled
#   - Smoothing: Savitzky-Golay (window=11, polyorder=3)
#   - Baseline Correction: Polynomial (order=2)
#   ...

Individual Preprocessing Functions¶

For more control, use individual preprocessing functions from hyperseed.core.preprocessing.methods.

apply_snv¶

Standard Normal Variate transformation.

from hyperseed.core.preprocessing.methods import apply_snv

# Apply SNV to spectra
snv_data = apply_snv(data, axis=-1)

Parameters:
- data: Input array
- axis: Axis along which to apply SNV (default: -1)

Returns:
- SNV-transformed data
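For reference, SNV centers each spectrum on its own mean and scales it by its own standard deviation. The NumPy sketch below shows that computation; snv_sketch is a hypothetical name, and details such as the population standard deviation (ddof=0) and the zero-variance guard are assumptions that may differ from what apply_snv does.

```python
import numpy as np


def snv_sketch(data: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale each spectrum to zero mean and unit standard deviation."""
    mean = data.mean(axis=axis, keepdims=True)
    std = data.std(axis=axis, keepdims=True)  # population std (ddof=0), an assumption
    # Guard against flat spectra, whose standard deviation is zero.
    std = np.where(std == 0, 1.0, std)
    return (data - mean) / std


cube = np.random.default_rng(0).random((4, 5, 50))  # (Y, X, Bands)
snv_cube = snv_sketch(cube)
```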

apply_smoothing¶

Smooth spectra using various methods.

from hyperseed.core.preprocessing.methods import apply_smoothing

# Savitzky-Golay smoothing (default)
smoothed = apply_smoothing(
    data,
    window_length=11,
    polyorder=3,
    method="savgol"
)

# Moving average
smoothed = apply_smoothing(
    data,
    window_length=11,
    method="moving_average"
)

# Gaussian filter
smoothed = apply_smoothing(
    data,
    window_length=11,
    method="gaussian"
)

Parameters:
- data: Input array
- window_length: Window size (must be odd)
- polyorder: Polynomial order for Savitzky-Golay (default: 3)
- method: "savgol", "moving_average", or "gaussian" (default: "savgol")
- axis: Axis along which to smooth (default: -1)

Returns:
- Smoothed data
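As a rough illustration of the simplest of these methods, a moving average can be written as a convolution with a uniform kernel. The function name moving_average_sketch and the edge handling (zero padding from mode="same") are assumptions made for this sketch; the library may treat the edges differently.

```python
import numpy as np


def moving_average_sketch(data: np.ndarray, window_length: int = 11, axis: int = -1) -> np.ndarray:
    """Smooth along one axis with a uniform window of odd length."""
    if window_length % 2 == 0:
        raise ValueError("window_length must be odd")
    kernel = np.ones(window_length) / window_length
    # mode="same" keeps the number of bands; near the edges the window
    # overlaps implicit zeros, so edge values are damped.
    return np.apply_along_axis(
        lambda spectrum: np.convolve(spectrum, kernel, mode="same"), axis, data
    )


spectra = np.ones((3, 40))  # (Samples, Bands)
smoothed_sketch = moving_average_sketch(spectra, window_length=5)
```

An odd window keeps the kernel centered on the band being smoothed, which is why the parameter description requires it.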

apply_derivative¶

Compute spectral derivatives.

from hyperseed.core.preprocessing.methods import apply_derivative

# First derivative
first_deriv = apply_derivative(
    data,
    order=1,
    window_length=11,
    polyorder=3
)

# Second derivative
second_deriv = apply_derivative(
    data,
    order=2,
    window_length=11,
    polyorder=3
)

Parameters:
- data: Input array
- order: Derivative order (1 or 2)
- window_length: Window size for Savitzky-Golay (default: 11)
- polyorder: Polynomial order (default: 3)
- axis: Axis along which to compute (default: -1)

Returns:
- Derivative spectra
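The idea behind a Savitzky-Golay derivative is to fit a low-order polynomial inside each window and differentiate that polynomial at the window center. The NumPy-only sketch below derives the filter weights from a least-squares fit. The function names are hypothetical, unit band spacing is assumed, and only interior points are returned, whereas apply_derivative is documented as preserving shape.

```python
from math import factorial

import numpy as np


def savgol_coeffs_sketch(window_length: int, polyorder: int, deriv: int = 0) -> np.ndarray:
    """Weights that estimate the deriv-th derivative at the window center."""
    half = window_length // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix: column j holds offsets**j.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient a_deriv;
    # the deriv-th derivative of the polynomial at offset 0 is deriv! * a_deriv.
    return factorial(deriv) * np.linalg.pinv(design)[deriv]


def derivative_sketch(spectrum: np.ndarray, order: int = 1,
                      window_length: int = 11, polyorder: int = 3) -> np.ndarray:
    """Derivative of a 1-D spectrum at interior points (unit band spacing)."""
    coeffs = savgol_coeffs_sketch(window_length, polyorder, deriv=order)
    # correlate (not convolve) so coeffs[i] multiplies the sample at offset i.
    return np.correlate(spectrum, coeffs, mode="valid")


bands = np.arange(50, dtype=float)
parabola = bands ** 2
first = derivative_sketch(parabola, order=1)   # approximately 2 * bands in the interior
second = derivative_sketch(parabola, order=2)  # approximately 2 everywhere
```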

apply_baseline_correction¶

Remove baseline from spectra.

from hyperseed.core.preprocessing.methods import apply_baseline_correction

# Polynomial baseline
corrected = apply_baseline_correction(
    data,
    order=2,
    method="polynomial"
)

# Rubberband baseline
corrected = apply_baseline_correction(
    data,
    method="rubberband"
)

# ASLS baseline
corrected = apply_baseline_correction(
    data,
    method="asls"
)

Parameters:
- data: Input array
- order: Polynomial order (for polynomial method, default: 2)
- method: "polynomial", "rubberband", or "asls" (default: "polynomial")
- axis: Axis along which to correct (default: -1)

Returns:
- Baseline-corrected data

Methods:
- polynomial: Fits polynomial and subtracts
- rubberband: Convex hull method
- asls: Asymmetric Least Squares (lam=1e6, p=0.01, niter=10)
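The polynomial method can be pictured as fitting one polynomial per spectrum and subtracting it. The sketch below does this with NumPy for data shaped (Samples, Bands). The name polynomial_baseline_sketch and the rescaling of the band axis to [-1, 1] (used here for numerical stability) are assumptions, not details taken from the library.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def polynomial_baseline_sketch(spectra: np.ndarray, order: int = 2) -> np.ndarray:
    """Subtract a least-squares polynomial from each row of (Samples, Bands)."""
    # Band positions rescaled to [-1, 1] to keep the fit well conditioned.
    positions = np.linspace(-1.0, 1.0, spectra.shape[-1])
    # polyfit accepts a (Bands, Samples) target and fits all spectra at once.
    coeffs = P.polyfit(positions, spectra.T, deg=order)  # (order + 1, Samples)
    # With 2-D coefficients, polyval returns an array shaped (Samples, Bands).
    baseline = P.polyval(positions, coeffs)
    return spectra - baseline


x = np.linspace(-1.0, 1.0, 60)
curved = np.stack([1.0 + 2.0 * x + 3.0 * x**2, -0.5 + 0.1 * x**2])
flattened = polynomial_baseline_sketch(curved, order=2)
```

Fitting through every band, peaks included, is the simplest variant; it pulls the baseline toward strong absorption features, which is one motivation for the rubberband and asls alternatives.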

apply_msc¶

Multiplicative Scatter Correction.

from hyperseed.core.preprocessing.methods import apply_msc

# Use mean spectrum as reference
corrected = apply_msc(data)

# Use custom reference
reference = data[0, :]  # Use first spectrum
corrected = apply_msc(data, reference=reference)

Parameters:
- data: Input array
- reference: Reference spectrum (if None, uses mean)
- axis: Axis along which to apply (default: -1)

Returns:
- MSC-corrected data

apply_detrend¶

Remove linear or constant trends from spectra.

from hyperseed.core.preprocessing.methods import apply_detrend

# Linear detrending
detrended = apply_detrend(data, type="linear")

# Constant detrending (remove mean)
detrended = apply_detrend(data, type="constant")

Parameters:
- data: Input array
- type: "linear" or "constant" (default: "linear")
- axis: Axis along which to detrend (default: -1)

Returns:
- Detrended data

apply_normalization¶

Normalize spectra (not exposed in config, but available via API).

from hyperseed.core.preprocessing.methods import apply_normalization

# Min-max normalization [0, 1]
normalized = apply_normalization(data, method="minmax")

# Max normalization
normalized = apply_normalization(data, method="max")

# Area normalization
normalized = apply_normalization(data, method="area")

# Vector (L2) normalization
normalized = apply_normalization(data, method="vector")

Parameters:
- data: Input array
- method: "minmax", "max", "area", or "vector" (default: "minmax")
- axis: Axis along which to normalize (default: -1)

Returns:
- Normalized data
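To make the four options concrete, the sketch below gives one plausible definition of each in NumPy. These definitions are assumptions: in particular, "area" is taken here as the sum of absolute values with unit band spacing, and no guard against division by zero is included. The actual apply_normalization may define them differently.

```python
import numpy as np


def normalize_sketch(data: np.ndarray, method: str = "minmax", axis: int = -1) -> np.ndarray:
    """Hypothetical per-spectrum normalization; definitions are assumptions."""
    if method == "minmax":
        lo = data.min(axis=axis, keepdims=True)
        hi = data.max(axis=axis, keepdims=True)
        return (data - lo) / (hi - lo)  # rescales each spectrum to [0, 1]
    if method == "max":
        return data / data.max(axis=axis, keepdims=True)  # maximum becomes 1
    if method == "area":
        # Area approximated by the sum of absolute values (unit band spacing).
        return data / np.abs(data).sum(axis=axis, keepdims=True)
    if method == "vector":
        return data / np.linalg.norm(data, axis=axis, keepdims=True)  # unit L2 norm
    raise ValueError(f"Unknown method: {method!r}")


spectrum = np.array([[1.0, 2.0, 3.0, 4.0]])
by_minmax = normalize_sketch(spectrum, "minmax")
by_max = normalize_sketch(spectrum, "max")
by_area = normalize_sketch(spectrum, "area")
by_vector = normalize_sketch(spectrum, "vector")
```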

Complete Example¶

import numpy as np
from hyperseed.core.io.envi_reader import ENVIReader
from hyperseed.core.calibration.reflectance import ReflectanceCalibrator
from hyperseed.core.preprocessing.pipeline import PreprocessingPipeline
from hyperseed.config.settings import PreprocessingConfig

# Load data
reader = ENVIReader("path/to/data.hdr")
data = reader.read_data()
wavelengths = reader.get_wavelengths()

# Calibrate
calibrator = ReflectanceCalibrator(clip_negative=True, clip_max=1.0)
calibrated, _ = calibrator.calibrate_from_directory("path/to/dataset")

# Configure preprocessing
config = PreprocessingConfig(
    method="custom",
    snv=True,
    smoothing=True,
    smoothing_window=15,
    baseline_correction=True,
    derivative=1
)

# Preprocess
pipeline = PreprocessingPipeline(config)
preprocessed = pipeline.fit_transform(calibrated)

print(f"Original shape: {calibrated.shape}")
print(f"Preprocessed shape: {preprocessed.shape}")
print(f"Preprocessing steps: {', '.join(pipeline.get_step_names())}")

# Use preprocessed data for segmentation or analysis

Using Individual Functions¶

from hyperseed.core.preprocessing.methods import (
    apply_snv,
    apply_smoothing,
    apply_baseline_correction,
    apply_derivative
)

# Manual preprocessing pipeline
data_snv = apply_snv(calibrated, axis=-1)
data_smooth = apply_smoothing(data_snv, window_length=15, polyorder=3)
data_baseline = apply_baseline_correction(data_smooth, order=2)
data_deriv = apply_derivative(data_baseline, order=1)

# Now use data_deriv for analysis

Notes¶

- All preprocessing functions preserve the shape of the input data.
- Operations are applied along the last axis (the spectral axis) by default.
- For hyperspectral cubes (Y, X, Bands), reshaping may be needed; the PreprocessingPipeline handles this automatically.
- MSC requires fitting before transformation, because fitting computes the reference spectrum.
- Normalization is available via the API but is not exposed in the configuration.
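When reshaping is needed, the usual pattern is to flatten the two spatial axes into one sample axis, process the resulting (Samples, Bands) array, and restore the cube shape afterwards. The sketch below uses plain NumPy with a placeholder per-spectrum step (mean-centering); it does not call any hyperseed function.

```python
import numpy as np

cube = np.random.default_rng(0).random((4, 5, 50))  # (Y, X, Bands)
n_rows, n_cols, n_bands = cube.shape

# Flatten the spatial axes: every pixel becomes one row (one spectrum).
flat = cube.reshape(-1, n_bands)  # (Samples, Bands)

# Placeholder per-spectrum operation standing in for a preprocessing step.
processed = flat - flat.mean(axis=1, keepdims=True)

# Restore the original cube layout; pixel order is unchanged by the round trip.
restored = processed.reshape(n_rows, n_cols, n_bands)
```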
