
Open clusters are tracers of the structure and evolution of the Milky Way and ideal laboratories for studying star formation and stellar evolution. The massive data release from the Gaia mission has dramatically expanded the number of known open clusters; however, this has also introduced a tricky challenge: the thousands of cluster candidates identified by clustering algorithms are significantly contaminated by "false positive" signals caused by random fluctuations of field stars.
To address this problem, a research team led by Dr. LI Lu from the Shanghai Astronomical Observatory (SHAO) of the Chinese Academy of Sciences proposed a new framework for physically validating open clusters based on photometric Bayesian evidence.
The study, published in The Astrophysical Journal, introduces a powerful quantitative tool for "separating the wheat from the chaff" among the massive number of candidates in the big data era.
Traditionally, validating a cluster candidate has relied on astronomers visually inspecting its color-magnitude diagram (CMD) to determine if a clear "isochrone" feature is present. However, such experience-based judgment is prone to subjective human bias and lacks quantitative evaluation standards. Ambiguous features make it difficult to determine with the naked eye whether a dispersed distribution on the CMD originates from a real cluster broadened by observational errors and binaries, or simply from a random combination of field stars along the line of sight.
Therefore, to ensure sample purity and the reliability of scientific conclusions, there is an urgent need for a statistically rigorous, quantitative validation method that replaces subjective intuition with objective calculations and can mathematically distinguish between real physical systems and random statistical fluctuations.
To meet this need, the researchers developed their own Mixture Model for Open Clusters (MiMO) within a Bayesian framework, recasting cluster validation as a statistical model comparison problem. They tested whether the observational data are better described by a "Single Stellar Population (SSP, i.e., cluster) + Field Star" mixture model or by a "Pure Field Star" model. The ratio of the evidence for these two models, known as the Bayes factor (BF), directly quantifies the strength of statistical support for the cluster's existence.
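In standard Bayesian notation, the comparison described above can be written roughly as follows. The symbols are illustrative assumptions rather than the paper's own notation: Z_M is the evidence of model M, D the photometric data, θ the model parameters, L the likelihood, and π the prior.

```latex
\mathrm{BF} = \frac{Z_{\mathrm{SSP+field}}}{Z_{\mathrm{field}}},
\qquad
Z_M = \int \mathcal{L}(D \mid \theta, M)\, \pi(\theta \mid M)\, \mathrm{d}\theta
```

Because the evidence integrates the likelihood over the whole prior volume, the more flexible "SSP + Field Star" model is favored only when the data genuinely support the extra cluster component.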
The researchers conducted extensive tests using 600 random field star samples and 1,232 confirmed open clusters. The results show that the Bayes factor separates real clusters from false signals extremely well. They found that log10(BF) > 2 (i.e., a BF greater than 100) serves as a robust physical criterion, implying that, for equal prior odds, the "Cluster + Field" model is at least 100 times more probable than the "Pure Field" model. This threshold effectively eliminates the vast majority of statistical fluctuations from random field stars while preserving real cluster signals.
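As a minimal sketch of how the threshold above might be applied in practice, the snippet below converts two natural-log evidences into log10(BF) and tests it against the log10(BF) > 2 criterion. The function names, candidate labels, and evidence values are hypothetical and chosen only for illustration; this is not the MiMO code, and computing the evidences themselves is outside the scope of the example.

```python
import math

# Criterion quoted in the article: log10(BF) > 2, i.e. BF > 100.
LOG10_BF_THRESHOLD = 2.0


def log10_bayes_factor(ln_z_cluster_field: float, ln_z_field: float) -> float:
    """Return log10 of the Bayes factor from two natural-log evidences.

    ln_z_cluster_field: ln evidence of the "cluster + field" mixture model.
    ln_z_field: ln evidence of the "pure field" model.
    """
    # The ratio of evidences is a difference in log space; divide by ln(10)
    # to convert from natural log to base 10.
    return (ln_z_cluster_field - ln_z_field) / math.log(10)


def passes_validation(log10_bf: float, threshold: float = LOG10_BF_THRESHOLD) -> bool:
    """True if the candidate exceeds the Bayes-factor threshold."""
    return log10_bf > threshold


# Hypothetical ln-evidence pairs (cluster + field, pure field), made up for illustration.
candidates = {
    "candidate_A": (-1520.3, -1534.9),
    "candidate_B": (-980.1, -981.0),
}

results = {name: log10_bayes_factor(*pair) for name, pair in candidates.items()}

for name, value in results.items():
    verdict = "accepted" if passes_validation(value) else "rejected"
    print(f"{name}: log10(BF) = {value:.2f} -> {verdict}")
```

Working in log space avoids numerical overflow, since the evidences of realistic models are usually far too small or too large to represent directly as floating-point ratios.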
Unlike traditional signal-to-noise ratios or goodness-of-fit metrics, the Bayes factor remains sensitive enough to capture hidden cluster signals even under extremely high field contamination (contamination rates >70%), demonstrating strong robustness.
This new method is applicable not only to cleaning and purifying open clusters, but also to validating other resolved stellar systems, such as stellar streams, moving groups, and satellite galaxies of the Milky Way, thanks to its general framework based on mixture model comparison.

Open Cluster NGC2632 (Copyright: Stuart Heggie)

Comparison of CMDs for different targets. Left: Random field stars. Middle: Ambiguous candidates. Right: Confirmed open clusters (Orange points represent sample stars; grey background shows the field model). (Image by SHAO)

The distribution of Bayes factors effectively separates real signals from noise. The blue histogram represents confirmed open clusters, while the orange histogram represents mock random field samples. The vertical dashed line marks the threshold at log10(BF) = 2. This clear separation demonstrates that the metric robustly distinguishes genuine physical systems from random background fluctuations. (Image by SHAO)