Despite advances in microbiome research, a large proportion of proteins encoded by the human gut microbiome remain functionally uncharacterized. Sequence-based methods often fail to annotate these evolutionarily divergent proteins. This "functional dark matter" highlights the need for frameworks that can explore protein function beyond sequence signals.
In a study published in Cell Host & Microbe, a team led by Dr. DAI Lei from the Shenzhen Institute of Advanced Technology of the Chinese Academy of Sciences, together with collaborators, developed a structure-based retrieval framework for the human gut microbiome, significantly improving the ability to predict the functions of "functional dark matter" such as phage proteins and host-derived bacterial isozymes.
Using structure prediction tools, researchers built the Human Gut Microbial Protein Structure Database, which contains about 2.7 million predicted protein structures across 968 gut bacterial species and 1,255 phage genomes. They used it to improve the annotation of phage proteins, up to 75% of which typically receive no functional label from sequence-based methods. Structure-based annotation more than doubled the annotation rate.
Researchers revealed extensive structural diversification of phage endolysins, antibacterial enzymes with high target-species specificity. They experimentally validated several newly predicted endolysins and found that they eliminated gut pathobionts, demonstrating how structural proteomics can accelerate the discovery of precision antimicrobials.
Host isozymes are too sequence-divergent to detect using conventional tools. Through structural comparisons, researchers identified previously unrecognized bacterial enzymes involved in melatonin biosynthesis. Biochemical assays validated their activities, and animal experiments showed that these microbial enzymes can modulate host melatonin levels and have direct impacts on host physiology.
To address cases where even structural similarity is insufficient, researchers developed an alignment-free method powered by structure-aware language models, Dense Enzyme Retrieval (DEER). DEER enables ultrafast and sensitive detection of remote homologs, achieving state-of-the-art performance and extending functional annotation into regions previously inaccessible to both sequence and structure alignment-based tools.
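As a rough illustration of what alignment-free dense retrieval means in general, the sketch below ranks database entries by cosine similarity between fixed-length embedding vectors, with no residue-by-residue alignment step. It is not the DEER implementation: the three-dimensional vectors, the entry names, and the functions `normalize`, `build_index`, and `retrieve` are hypothetical stand-ins for embeddings that a structure-aware language model would produce.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(embeddings):
    """Pre-normalize every database embedding once, ahead of any query."""
    return {name: normalize(vec) for name, vec in embeddings.items()}


def retrieve(query_vec, index, top_k=2):
    """Return the top_k (name, cosine similarity) pairs for a query embedding."""
    q = normalize(query_vec)
    scored = [
        (name, sum(a * b for a, b in zip(q, vec)))
        for name, vec in index.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumption: real ones would come from a protein language
# model and have hundreds of dimensions, not three).
toy_embeddings = {
    "enzyme_A": [0.9, 0.1, 0.0],
    "enzyme_B": [0.0, 1.0, 0.2],
    "enzyme_C": [0.7, 0.3, 0.1],
}

index = build_index(toy_embeddings)
hits = retrieve([1.0, 0.2, 0.0], index, top_k=2)
for name, score in hits:
    print(f"{name}\t{score:.3f}")
```

Because each comparison is a single dot product, a search of this kind scales to millions of entries far more cheaply than pairwise alignment, which is the general property that makes dense retrieval fast.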
This study develops a new framework that integrates large-scale structural genomics with artificial intelligence-driven inference to resolve the deep functional architecture of gut microbial communities. It opens up new avenues for therapeutic discovery, precision microbiome engineering, and mechanistic understanding of microbe-host interactions.