TY - JOUR
T1 - STICI
T2 - Split-Transformer with integrated convolutions for genotype imputation
AU - Mowlaei, Mohammad Erfan
AU - Li, Chong
AU - Jamialahmadi, Oveis
AU - Dias, Raquel
AU - Chen, Junjie
AU - Jamialahmadi, Benyamin
AU - Rebbeck, Timothy Richard
AU - Carnevale, Vincenzo
AU - Kumar, Sudhir
AU - Shi, Xinghua
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/1/31
Y1 - 2025/1/31
AB - Despite advances in sequencing technologies, genome-scale datasets often contain missing bases and genomic segments, hindering downstream analyses. Genotype imputation addresses this issue and has been a cornerstone pre-processing step in genetic and genomic studies. Although various methods have been widely adopted for genotype imputation, it remains challenging to impute certain genomic regions and large structural variants. Here, we present a transformer-based framework, named STICI, for accurate genotype imputation. STICI models automatically learn genome-wide patterns of linkage disequilibrium, evidenced by much higher imputation accuracy in regions with highly linked variants. Our imputation results on the human 1000 Genomes Project and non-human genomes show that STICI can achieve high imputation accuracy comparable to the state-of-the-art genotype imputation methods, with the additional capability to impute multi-allelic variants and various types of genetic variants. STICI can be trained for any collection of genomes automatically using self-supervision. Moreover, STICI shows excellent performance without needing any special presuppositions about the underlying patterns in collections of non-human genomes, pointing to adaptability and applications of STICI to impute missing genotypes in any species.
KW - Humans
KW - Genotype
KW - Linkage Disequilibrium
KW - Genome, Human/genetics
KW - Polymorphism, Single Nucleotide
KW - Genomics/methods
KW - Genome-Wide Association Study/methods
KW - Algorithms
KW - Software
KW - Alleles
UR - https://www.scopus.com/pages/publications/85217731883
U2 - 10.1038/s41467-025-56273-3
DO - 10.1038/s41467-025-56273-3
M3 - Article
C2 - 39890780
SN - 2041-1723
VL - 16
SP - 1218
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 1218
ER -