Research News
A Comprehensive Map of Mobile Element Insertions from 5,675 Genomes Published
Editor: LIU Jia | Mar 01, 2022

Recently, the research groups of Prof. XU Tao and Prof. HE Shunmin from the Institute of Biophysics of the Chinese Academy of Sciences reported NyuWa, a genome resource of mobile element insertions (MEIs), to promote genetic and medical research on MEIs in worldwide populations. The study was published in Nucleic Acids Research.

In the human genome, Alu, LINE-1 (L1), SINE-VNTR-Alu (SVA), and HERV-K are the families of mobile elements that are generally considered to be still active and capable of forming new insertions in the genome through transposition; such insertions are known as MEIs. Transposition events have the potential to interrupt functional regions of the genome, and more than 120 human genetic diseases have been reported to be associated with transposon-mediated insertions. In addition, for some MEIs the intrinsic sequence properties of the transposable element confer functional effects on the host, which makes MEIs qualitatively different from other typical structural variants.

Resources that catalog polymorphic transposable elements in the human genome, which are the basis for phenotype-variant association analysis, remain scarce. In 2017, the 1000 Genomes Project conducted a comprehensive analysis of MEIs in 2,504 genomes. Watkins et al. extended the findings from the 1000 Genomes dataset by analyzing the MEI variant profile of a global population using 296 genomes from the Simons Genome Diversity Project. However, these MEI resources are drawn mainly from European populations.

This study reported a comprehensive map of 36,699 non-reference MEIs constructed from 5,675 genomes, including 2,998 Chinese samples and 2,677 samples from the 1000 Genomes Project. It systematically analyzed the genomic distribution, mutation characteristics, and functional impact of MEIs at the population level, and constructed a comprehensive MEI repository that includes, in particular, an MEI map for the Chinese population. The study is part of the NyuWa Chinese Population Genome Project led by Prof. XU and Prof. HE. The NyuWa Genome Project has previously published a genetic variation atlas and reference panel for the Chinese population, as well as the Chinese Population Genome Repository, which lay the foundation for genetic and medical research in the Chinese population.

The researchers first identified MEIs by combining 2,998 high-depth whole-genome sequencing samples from the NyuWa genome resource with 2,677 low-depth whole-genome sequencing samples from the 1000 Genomes Project. On average, more than 1,000 MEI variants were detected per individual, the majority of which were Alu insertions.

They then analyzed the chromosomal distribution of MEIs and found that L1 insertions were significantly enriched in regions near the centromeres. This enrichment may be due to the large number of α-satellite sequences near centromeric DNA, whose relatively low GC content is more favorable for L1 insertion. On the other hand, considering that active transposons identified in neocentromere regions in previous studies may contribute to the formation of new centromeres, the researchers suggested that the enrichment of L1 near centromeres may also be biologically important. This finding needs to be investigated in subsequent studies.

Next, the researchers estimated the MEI mutation rate separately in the two data sets, NyuWa and the 1000 Genomes Project. The results were very similar, with approximately one new MEI event per 16-17 births. By comparing MEI diversity with SNP heterozygosity across populations, they found that the two are highly correlated, with African populations showing the highest MEI diversity and SNP heterozygosity.
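
As a rough illustration of what "one new MEI event per 16-17 births" means as a per-birth rate, the arithmetic can be sketched as follows. The function name is hypothetical, and the calculation is a simple reciprocal, not the estimation method used in the study.

```python
def events_per_birth(births_per_event: float) -> float:
    """Convert 'one new insertion per N births' into expected insertions per birth."""
    if births_per_event <= 0:
        raise ValueError("births_per_event must be positive")
    return 1.0 / births_per_event


# The two ends of the reported range (16 and 17 births per event).
for n in (16, 17):
    print(f"1 event per {n} births = {events_per_birth(n):.4f} events per birth")
```

In other words, the reported range corresponds to roughly 0.06 new MEI events per birth.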

In theory, MEIs in protein-coding regions can cause loss of gene function by interrupting the open reading frame. After functional annotation of MEIs, the researchers found that each individual carries an average of 24 protein-truncating MEIs. When MEIs were considered together with short variants (SNPs and indels) and other structural variants, they accounted for approximately 9.4% of the protein-truncating variants per individual, which demonstrates the importance of including MEIs in routine analysis of whole-genome data.
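
The 9.4% figure is a share: protein-truncating MEIs divided by all protein-truncating variants in an individual. The sketch below shows that calculation and, taking the two reported numbers (24 and 9.4%) at face value, the total they would imply. The function names are hypothetical, and the implied total is a back-of-envelope inference rather than a figure reported by the study; the published averages may not have been computed on exactly this basis.

```python
def mei_share(mei_truncating: float, other_truncating: float) -> float:
    """Fraction of protein-truncating variants that are MEIs."""
    total = mei_truncating + other_truncating
    if total <= 0:
        raise ValueError("total number of truncating variants must be positive")
    return mei_truncating / total


def implied_total(mei_truncating: float, share: float) -> float:
    """Total truncating variants implied by an MEI count and its share."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return mei_truncating / share


# Reported values: 24 truncating MEIs per individual, about 9.4% of the total.
total = implied_total(24, 0.094)
print(f"Implied truncating variants per individual: about {total:.0f}")
print(f"Check: share = {mei_share(24, total - 24):.3f}")
```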

L1 insertion is often accompanied by 3' transduction, in which flanking sequence downstream of the source element is copied along with it. Based on this feature, the researchers analyzed source-offspring relationships among L1 elements, identified new source-offspring pairs, found potentially active L1 loci, and discovered differences in their distribution among populations.
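
The idea of using 3' transduction to link insertions can be sketched as follows: the extra sequence carried by an offspring L1 maps to the region just downstream of its source element, so offspring can be grouped by the source locus their transduced sequence points to. All identifiers, coordinates, the 1,000 bp window, and the plus-strand simplification below are assumptions for illustration; this is not the pipeline used in the study.

```python
from collections import defaultdict
from typing import NamedTuple


class SourceL1(NamedTuple):
    source_id: str  # hypothetical identifier of a candidate source element
    chrom: str
    end: int        # 3' end of the source element (plus strand assumed)


class Transduction(NamedTuple):
    offspring_id: str  # hypothetical identifier of the new L1 insertion
    chrom: str         # chromosome where the transduced sequence aligns
    pos: int           # alignment position of the transduced sequence


def assign_offspring(sources, transductions, window=1000):
    """Group offspring by the source whose downstream flank holds their transduced sequence."""
    pairs = defaultdict(list)
    for t in transductions:
        for s in sources:
            # The transduced sequence must lie within `window` bp downstream of the source.
            if t.chrom == s.chrom and 0 <= t.pos - s.end <= window:
                pairs[s.source_id].append(t.offspring_id)
    return dict(pairs)


# Made-up example data.
sources = [SourceL1("srcA", "chr1", 10_000), SourceL1("srcB", "chr2", 50_000)]
transductions = [
    Transduction("off1", "chr1", 10_200),
    Transduction("off2", "chr1", 10_900),
    Transduction("off3", "chr2", 50_100),
    Transduction("off4", "chr3", 500),  # no matching source
]
print(assign_offspring(sources, transductions))
```

A source locus that collects several offspring in this way is a candidate for a currently active L1, which is the kind of signal the paragraph above describes.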

Finally, to make it easier for researchers to search and use mobile element data, they constructed an open database, HMEID, that contains the MEIs identified in this study. This database is also part of the NyuWa genomic data resource.

Contact

HE Shunmin

Institute of Biophysics

E-mail: