Research News
Machine Learning Framework Improves High-precision Serum Tumor Biomarker Detection
Editor: LIU Jia | Jun 08, 2026

Accurate detection of serum tumor biomarkers is important for early cancer screening. Serum, however, is a highly complex biological system in which signals from different molecules often overlap and interfere with one another, which complicates accurate quantification. In addition, many existing machine learning methods for biomarker detection function as "black boxes," so their predictions are hard to interpret in clinical applications.

In a study published in Analytical Chemistry, a team from the Hefei Institutes of Physical Science of the Chinese Academy of Sciences, together with collaborators from Hefei Cancer Hospital, developed an interpretable stacked ensemble learning framework for detecting serum tumor biomarkers. By combining the framework with surface-enhanced Raman spectroscopy (SERS), the researchers achieved high-precision quantitative analysis of up to 12 biomarkers in serum.

The framework analyzes SERS data to predict biomarker concentrations and helps explain which spectral features contribute to the results. It integrates three machine learning models (support vector regression, extreme gradient boosting, and partial least squares regression) and improves prediction stability and accuracy by combining their outputs through an elastic net-based meta-model. A LASSO-based feature selection method is also introduced, which reduces data dimensionality and improves computational efficiency.

The framework showed strong performance in quantifying 12 tumor biomarkers, including AFP, CEA, CA19-9, and CA125. All of these biomarkers achieved R² values above 0.9, with ferritin and SCCA reaching 0.981 and 0.988, respectively.

To improve explainability, the researchers applied Shapley additive explanations (SHAP) to connect key Raman spectral peaks with molecular vibration features. This helped reveal how factors such as glycosylation, matrix interference, and spectral overlap affect prediction accuracy.
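To give a rough sense of how Shapley-style attributions point back to spectral positions, the sketch below computes Shapley values in closed form for a linear model. This is a simplification, not the study's procedure. It assumes independent features, in which case the Shapley value of feature j for a sample x is w_j * (x_j - mean(x_j)). The data, the two "peak" bins, and the Ridge model are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic stand-in for spectra: 200 samples x 50 spectral bins, with
# the target driven by two hypothetical peak positions (bins 10 and 30).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 2.0 * X[:, 10] - 1.0 * X[:, 30] + rng.normal(scale=0.1, size=200)

model = Ridge(alpha=1.0).fit(X, y)

# Closed-form Shapley values for a linear model under the assumption of
# independent features: phi[i, j] = w_j * (X[i, j] - mean_j).
phi = model.coef_ * (X - X.mean(axis=0))

# Base value: the average prediction over the background data.
base_value = model.predict(X).mean()

# Additivity: base value plus a sample's attributions equals its prediction.
reconstructed = base_value + phi.sum(axis=1)

# Rank bins by mean absolute attribution to find the most influential ones.
importance = np.abs(phi).mean(axis=0)
top_bins = np.argsort(-importance)[:2]

print("most influential bins:", sorted(top_bins.tolist()))
```

For non-linear models such as boosted trees or kernel regressors, no such closed form applies, and attributions are typically estimated with a dedicated SHAP implementation instead.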

This study provides a general framework for high-precision multi-biomarker detection in complex biological samples, with potential applications in early cancer screening and precision medicine.