User Guide
This guide provides a comprehensive overview of how to use MultiCal for multivariate calibration, variable selection, and inference.
Installation
Ensure you have Python installed (3.8+ recommended).
Clone the repository:
git clone https://github.com/LadabioMPAR/Multical.git
cd Multical
Install dependencies:
pip install -r requirements.txt
Data Preparation
MultiCal works with text-based data files. You typically need two types of files for calibration:
Spectra File (X-Block): Contains the absorbance or intensity data.
Concentration File (Y-Block): Contains the reference values for the analytes.
Spectra File Format
File extension: .txt (Tab or space separated).
Header: The first row must define the wavelengths/wavenumbers.
Columns:
- Column 0: Sample ID or Time (ignored by the loader).
- Columns 1+: Spectral data corresponding to the wavelengths in the header.
Example (data/spectra.txt):
Time 400.0 402.0 404.0 ...
10.0 0.123 0.125 0.128 ...
20.0 0.140 0.142 0.145 ...
Concentration File Format
File extension: .txt.
Header: Usually none.
Columns:
- Column 0: Sample ID or Time (ignored if the file has more than one column).
- Columns 1+: Concentration values for each analyte.
Example (data/reference.txt):
10.0 1.5 5.2 0.8
20.0 1.6 5.1 0.9
Note
The number of rows (samples) in the Spectra file must match the number of rows in the Concentration file.
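The file layouts above can be checked before running a workflow. The following is a minimal sketch, not MultiCal's own loader: `load_spectra` and `load_reference` are hypothetical helper names, the sample text reuses the example rows above with the elided columns left out, and only the standard library is used so that it runs without MultiCal installed.

```python
import io


def load_spectra(handle):
    """Parse a spectra file: a header row of wavelengths, then one sample per row."""
    rows = [line.split() for line in handle if line.strip()]
    header, body = rows[0], rows[1:]
    # Column 0 holds the sample ID / time label, so it is dropped from header and data.
    wavelengths = [float(w) for w in header[1:]]
    spectra = [[float(v) for v in row[1:]] for row in body]
    return wavelengths, spectra


def load_reference(handle):
    """Parse a header-less reference file; column 0 is dropped when more columns exist."""
    rows = [line.split() for line in handle if line.strip()]
    return [[float(v) for v in (row[1:] if len(row) > 1 else row)] for row in rows]


# In-memory stand-ins for data/spectra.txt and data/reference.txt.
spectra_text = "Time 400.0 402.0 404.0\n10.0 0.123 0.125 0.128\n20.0 0.140 0.142 0.145\n"
reference_text = "10.0 1.5 5.2 0.8\n20.0 1.6 5.1 0.9\n"

wavelengths, X = load_spectra(io.StringIO(spectra_text))
Y = load_reference(io.StringIO(reference_text))

# The two blocks must describe the same samples, row for row.
if len(X) != len(Y):
    raise ValueError(f"Row mismatch: {len(X)} spectra vs {len(Y)} reference rows")

print(len(X), "samples,", len(wavelengths), "wavelengths,", len(Y[0]), "analytes")
```

To check real files, replace the `io.StringIO` objects with `open('data/spectra.txt')` and `open('data/reference.txt')`.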
Workflow 1: Calibration
The calibration workflow is controlled by run_calibration.py.
Edit Configuration: Open run_calibration.py and locate the CONFIGURATION section.
DATA_FILES = [
    ('data/ref1.txt', 'data/spec1.txt'),
    ('data/ref2.txt', 'data/spec2.txt'),
]
MODEL_TYPE = 1  # 1=PLS, 2=SPA, 3=PCR
ANALYTES = ['Glucose', 'Ethanol']  # Match columns in reference file
Configure Pretreatment: define the list of operations to apply to the spectra.
PRETREATMENT = [
    ['Cut', 900, 1800, 1],  # Keep wavelengths between 900 and 1800
    ['SG', 7, 2, 1, 1],     # Savitzky-Golay (Window=7, Poly=2, Deriv=1)
    ['SNV', 1],             # Standard Normal Variate
]
Run the script:
python run_calibration.py
Outputs:
- Console: CV statistics (RMSECV, R², etc.).
- Plots: Saved in the results/ folder.
- Model: results/model_calibration.pkl.
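For readers unfamiliar with the console statistics, the sketch below shows the conventional definitions of RMSECV and R², computed from reference values and cross-validated predictions. It is an illustration under stated assumptions: the function names and the numbers are made up, and MultiCal's own implementation may differ in detail (for example in how folds are pooled).

```python
import math


def rmsecv(y_ref, y_cv):
    """Root mean squared error between reference values and cross-validated predictions."""
    squared_errors = [(r - p) ** 2 for r, p in zip(y_ref, y_cv)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_ref, y_cv):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_ref = sum(y_ref) / len(y_ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(y_ref, y_cv))
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)
    return 1.0 - ss_res / ss_tot


# Made-up values for one analyte: reference vs. prediction from held-out folds.
y_ref = [1.0, 2.0, 3.0, 4.0]
y_cv = [1.1, 1.9, 3.2, 3.8]

print(f"RMSECV = {rmsecv(y_ref, y_cv):.4f}")
print(f"R2     = {r_squared(y_ref, y_cv):.4f}")
```

A lower RMSECV (in the units of the analyte) and an R² closer to 1 both indicate better cross-validated agreement with the reference values.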
Workflow 2: Variable Selection
To identify the most important wavelengths, use run_variable_selection.py.
Edit Configuration:
- Set SELECTION_METHOD to 'VIP', 'SA', or 'PSO'.
- Configure the method-specific parameters (e.g., VIP_THRESHOLDS, SA_PARAMS).
- Ensure DATA_FILES and PRETREATMENT match your calibration goals.
Run the script:
python run_variable_selection.py
Outputs:
- Plots showing selected variables vs. RMSECV.
- Best subset of variables.
- Saved model with selected variables: results_var_selection/model_variable_selection.pkl.
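As a rough illustration of threshold-based selection, the sketch below keeps the wavelengths whose importance score reaches a cutoff and repeats this for several cutoffs. Everything here is hypothetical: the scores are invented, `select_by_threshold` is not a MultiCal function, and the `>=` comparison is an assumption about how a threshold would be applied.

```python
def select_by_threshold(wavelengths, scores, threshold):
    """Return the wavelengths whose score is at least `threshold` (assumed rule)."""
    return [w for w, s in zip(wavelengths, scores) if s >= threshold]


# Invented importance scores, one per wavelength (e.g. VIP scores from a PLS model).
wavelengths = [900.0, 902.0, 904.0, 906.0, 908.0]
vip_scores = [0.4, 1.3, 0.9, 1.8, 1.0]

# Hypothetical list of cutoffs to try, in the spirit of VIP_THRESHOLDS.
subsets = {}
for threshold in (0.8, 1.0, 1.5):
    subsets[threshold] = select_by_threshold(wavelengths, vip_scores, threshold)
    print(threshold, subsets[threshold])
```

Each candidate subset would then be scored by cross-validation, which is what a plot of selected variables vs. RMSECV summarises.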
Workflow 3: Inference
Use run_inference.py to predict concentrations for new spectral data using a trained model.
Edit Configuration:
- Set MODEL_PATH to your trained .pkl file.
- Set INFERENCE_FILES. The reference file can be None or a dummy path if you only have spectra and want predictions.
Run the script:
python run_inference.py
Outputs:
- Predicted concentrations saved to text files (optional, depends on script logic).
- Time-series plots comparing Prediction vs Reference (if available).
Preprocessing Reference
The PRETREATMENT list accepts specific codes for various spectral transformations.
| Method | Syntax | Description |
|---|---|---|
| Cut | | Selects wavelength ranges. Can specify multiple ranges. |
| SG | | Savitzky-Golay smoothing and derivatives. |
| SNV | | Standard Normal Variate normalization. |
| MSC | | Multiplicative Scatter Correction. |
| EMSC | | Extended MSC with polynomial baseline correction. |
| Deriv | | Simple finite difference derivative (1st or 2nd). |
| Loess | | Local regression smoothing. |
| MeanCenter | | Subtracts the column mean (centering). |
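To make three of these transformations concrete, here is a minimal standard-library sketch of what Cut, SNV, and MeanCenter conventionally do to a small spectra matrix (rows are samples, columns are wavelengths). The function names are hypothetical and this is not MultiCal's code. Two assumptions are made: the Cut bounds are inclusive, and SNV uses the sample standard deviation (n-1).

```python
import statistics


def cut(wavelengths, spectra, low, high):
    """Keep only the columns whose wavelength lies in [low, high] (inclusive, assumed)."""
    keep = [i for i, w in enumerate(wavelengths) if low <= w <= high]
    return [wavelengths[i] for i in keep], [[row[i] for i in keep] for row in spectra]


def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum by its own mean and std."""
    result = []
    for row in spectra:
        mu = statistics.fmean(row)
        sd = statistics.stdev(row)  # sample standard deviation (n-1), an assumption
        result.append([(v - mu) / sd for v in row])
    return result


def mean_center(spectra):
    """Subtract each column's mean so that every wavelength has zero mean across samples."""
    col_means = [statistics.fmean(col) for col in zip(*spectra)]
    return [[v - m for v, m in zip(row, col_means)] for row in spectra]


# Toy data: the second spectrum is the first one scaled by 2 inside the kept range.
wavelengths = [898.0, 900.0, 902.0, 904.0]
X = [[9.0, 1.0, 2.0, 3.0],
     [9.0, 2.0, 4.0, 6.0]]

wl_cut, X_cut = cut(wavelengths, X, 900, 904)
X_snv = snv(X_cut)
X_centered = mean_center(X_cut)

print(wl_cut)
print(X_snv)
print(X_centered)
```

After SNV both toy spectra become identical, which is the point of the method: it removes per-sample offset and multiplicative scaling. Mean centering instead acts column-wise across samples.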