TY - JOUR
T1 - On estimating evolutionary probabilities of population variants
AU - Patel, Ravi
AU - Kumar, Sudhir
N1 - Publisher Copyright: © 2019 The Author(s).
PY - 2019/6/25
Y1 - 2019/6/25
N2 - Background: The evolutionary probability (EP) of an allele in a DNA or protein sequence predicts evolutionarily permissible (ePerm; EP ≥ 0.05) and forbidden (eForb; EP < 0.05) variants. EP of an allele represents an independent evolutionary expectation of observing an allele in a population based solely on the long-term substitution patterns captured in a multiple sequence alignment. Under the neutral theory, EP and population frequencies can be compared to identify neutral and non-neutral alleles. This approach has been used to discover candidate adaptive polymorphisms in humans, which are eForbs segregating with high frequencies. The original method to compute EP requires the evolutionary relationships and divergence times of species in the sequence alignment (a timetree), which are not known with certainty for most datasets. This requirement impedes general use of the original EP formulation. Here, we present an approach in which the phylogeny and times are inferred from the sequence alignment itself prior to the EP calculation. We evaluate whether the modified EP approach produces results similar to those from the original method. Results: We compared EP estimates from the original and the modified approaches by using more than 18,000 protein sequence alignments containing orthologous sequences from 46 vertebrate species. For the original EP calculations, we used species relationships from UCSC and divergence times from the TimeTree web resource, and the resulting EP estimates were considered to be the ground truth. We found that the modified approaches produced reasonable EP estimates for HGMD disease missense variant and 1000 Genomes Project missense variant datasets. Our results showed that reliable estimates of EP can be obtained without a priori knowledge of the sequence phylogeny and divergence times.
We also found that, in order to obtain robust EP estimates, it is important to assemble a dataset with many sequences, sampling from a diversity of species groups. Conclusion: We conclude that the modified EP approach will be generally applicable to alignments and enable the detection of potentially neutral, deleterious, and adaptive alleles in populations.
AB - Background: The evolutionary probability (EP) of an allele in a DNA or protein sequence predicts evolutionarily permissible (ePerm; EP ≥ 0.05) and forbidden (eForb; EP < 0.05) variants. EP of an allele represents an independent evolutionary expectation of observing an allele in a population based solely on the long-term substitution patterns captured in a multiple sequence alignment. Under the neutral theory, EP and population frequencies can be compared to identify neutral and non-neutral alleles. This approach has been used to discover candidate adaptive polymorphisms in humans, which are eForbs segregating with high frequencies. The original method to compute EP requires the evolutionary relationships and divergence times of species in the sequence alignment (a timetree), which are not known with certainty for most datasets. This requirement impedes general use of the original EP formulation. Here, we present an approach in which the phylogeny and times are inferred from the sequence alignment itself prior to the EP calculation. We evaluate whether the modified EP approach produces results similar to those from the original method. Results: We compared EP estimates from the original and the modified approaches by using more than 18,000 protein sequence alignments containing orthologous sequences from 46 vertebrate species. For the original EP calculations, we used species relationships from UCSC and divergence times from the TimeTree web resource, and the resulting EP estimates were considered to be the ground truth. We found that the modified approaches produced reasonable EP estimates for HGMD disease missense variant and 1000 Genomes Project missense variant datasets. Our results showed that reliable estimates of EP can be obtained without a priori knowledge of the sequence phylogeny and divergence times.
We also found that, in order to obtain robust EP estimates, it is important to assemble a dataset with many sequences, sampling from a diversity of species groups. Conclusion: We conclude that the modified EP approach will be generally applicable to alignments and enable the detection of potentially neutral, deleterious, and adaptive alleles in populations.
KW - Evolutionary probability
KW - Forbidden alleles
KW - Generalized method
KW - Potential adaptation
UR - http://www.scopus.com/inward/record.url?scp=85068116799&partnerID=8YFLogxK
U2 - 10.1186/s12862-019-1455-7
DO - 10.1186/s12862-019-1455-7
M3 - Article
C2 - 31238981
AN - SCOPUS:85068116799
SN - 1471-2148
VL - 19
JO - BMC Evolutionary Biology
JF - BMC Evolutionary Biology
IS - 1
M1 - 133
ER -