Tuberculosis (TB), caused by Mycobacterium tuberculosis (MTB), remains one of the world's deadliest infectious diseases. The Bacillus Calmette–Guérin (BCG) vaccine protects young children against severe TB but offers limited protection against pulmonary TB in adults.
Developing new vaccines requires selecting protective antigens from nearly 4,000 MTB proteins. However, the immunological evidence needed to make that selection is scattered across a vast body of literature. Traditional computational tools rely mainly on sequence or structural features. Large language models (LLMs) used on their own may give answers that are unreliable and hard to trace back to their sources.
In a study published in Biosafety and Health, a research team led by Prof. ZHANG Guoqing from the Shanghai Institute of Nutrition and Health of the Chinese Academy of Sciences developed a knowledge-guided system that uses LLMs to help identify promising antigens for tuberculosis vaccines.
Prof. ZHANG's team, together with Prof. WANG Ying’s team from Shanghai Jiao Tong University School of Medicine, built an LLM-assisted knowledge graph system called MTB-ImmunogenKG. The researchers mined more than 77,000 publications indexed in PubMed and extracted 1.48 million sentence-level evidence records. The system covers 3,154 MTB proteins, about 77% of the annotated proteins in the MTB genome.
The system combines automated information extraction with knowledge-enhanced reasoning. It predicts how protective an antigen is likely to be and traces each prediction back to its supporting evidence. Importantly, it can detect contradictory findings reported in different studies and help explain why their experimental outcomes differ. It achieves significantly better prediction performance than conventional sequence-based methods and also outperforms a standalone LLM baseline.
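The article does not describe MTB-ImmunogenKG's internal data model. The sketch below is therefore only a rough illustration, under stated assumptions, of the idea of sentence-level evidence that stays traceable and can reveal contradictions. It links each antigen to its evidence records and computes a naive "fraction of protective reports" score. It also flags antigens whose records disagree, and for each conflicting outcome it lists the source and experimental context behind it. Every name here is hypothetical and was invented for illustration: `Evidence`, `build_graph`, `evidence_score`, `find_contradictions`, the placeholder antigens, and the sources. The vote-fraction score is a simple stand-in, not the authors' LLM-based reasoning.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical schema: the real system's record format is not described in the article.
@dataclass(frozen=True)
class Evidence:
    antigen: str   # MTB protein identifier (placeholder names below)
    outcome: str   # "protective" or "not_protective"
    source: str    # publication identifier (placeholder)
    context: str   # experimental condition, e.g. the adjuvant used


def build_graph(records):
    """Link each antigen to its evidence records (a simple adjacency map)."""
    graph = defaultdict(list)
    for rec in records:
        graph[rec.antigen].append(rec)
    return graph


def evidence_score(graph, antigen):
    """Naive baseline: fraction of records reporting protection, or None if no evidence."""
    recs = graph.get(antigen, [])
    if not recs:
        return None
    return sum(r.outcome == "protective" for r in recs) / len(recs)


def find_contradictions(graph):
    """Return antigens whose records disagree, mapping each outcome to the
    (source, context) pairs behind it so differing conditions can be traced."""
    conflicts = {}
    for antigen, recs in graph.items():
        by_outcome = defaultdict(set)
        for r in recs:
            by_outcome[r.outcome].add((r.source, r.context))
        if len(by_outcome) > 1:
            conflicts[antigen] = dict(by_outcome)
    return conflicts


# Invented example records for illustration only; these are not real findings.
SAMPLE = [
    Evidence("antigen_X", "protective", "paper_A", "with adjuvant A"),
    Evidence("antigen_X", "not_protective", "paper_B", "no adjuvant"),
    Evidence("antigen_Y", "protective", "paper_C", "with adjuvant A"),
]

if __name__ == "__main__":
    g = build_graph(SAMPLE)
    print(evidence_score(g, "antigen_X"))  # 0.5
    print(find_contradictions(g))
```

In this toy example, the conflict for `antigen_X` is returned together with the differing contexts ("with adjuvant A" vs. "no adjuvant"). That is the kind of traceable explanation the article describes, although the real system presumably reasons over far richer evidence.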
Moreover, the system's analysis suggested that antigen combinations and the choice of adjuvant may influence immune protection. By organizing fragmented literature into structured knowledge, the system supports transparent and explainable decision-making in vaccine design.