Substitution-rate variation among sites and the estimation of transition bias.

Abstract:

Substitution-rate variation among sites and differences in the probabilities of change among the four nucleotides are conflated in DNA sequence comparisons. When variation in rate exists among sites but is ignored, biases in the rates of change among nucleotides are underestimated. This paper provides a quantification of this effect when the observed proportions of transitions, P, and transversions, Q, between two sequences are used to estimate transition bias. The utility of P/Q as an estimator is examined both with and without rate variation among sites. A gamma-distributed-rates model is used to illustrate the effect that variation among sites has on estimates of transition bias, but it is argued that the basic results should hold for any pattern of rate variation. Naive estimates of the extent of transition bias, those that ignore rate variation when it is present, can seriously underestimate its true value. The extent of this underestimation increases with the amount of rate variation among sites. An example using human mitochondrial DNA shows that a simple comparison of the proportions of transitions and transversions in recently diverged sequences underestimates the level of transition bias by approximately 15%. This does not depend on the use of P/Q to estimate transition bias; maximum-likelihood methods give similar results.
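
The direction of the effect can be illustrated with a minimal numerical sketch. It assumes Kimura's two-parameter substitution model, which the abstract does not name, and gamma-distributed relative rates with mean 1. The function names, the parameter values, and the logarithmic estimator of the transition/transversion rate ratio are illustrative choices, not necessarily the paper's own formulation.

```python
import math

def expected_pq(alpha, beta, t, shape=None):
    """Expected proportions of transitions (P) and transversions (Q)
    between two sequences under the two-parameter model (an assumption).

    alpha: transition rate; beta: rate of each of the two transversion types.
    shape: gamma shape parameter for among-site rate variation (mean rate 1);
           None means every site evolves at the same rate.
    """
    def decay(c):
        # E[exp(-c * r)] over the relative rate r of a site.
        if shape is None:
            return math.exp(-c)              # r = 1 at every site
        return (1.0 + c / shape) ** (-shape)  # r ~ Gamma(shape, mean 1)

    e1 = decay(4.0 * beta * t)
    e2 = decay(2.0 * (alpha + beta) * t)
    P = 0.25 + 0.25 * e1 - 0.5 * e2
    Q = 0.5 - 0.5 * e1
    return P, Q

def naive_bias(P, Q):
    """Estimate alpha/beta from P and Q while assuming equal rates at all sites."""
    A = -math.log(1.0 - 2.0 * P - Q)  # estimates 2 * (alpha + beta) * t
    B = -math.log(1.0 - 2.0 * Q)      # estimates 4 * beta * t
    return 2.0 * A / B - 1.0

if __name__ == "__main__":
    true_alpha, true_beta, t = 10.0, 1.0, 0.05  # hypothetical values
    for shape in (None, 2.0, 0.5, 0.1):
        P, Q = expected_pq(true_alpha, true_beta, t, shape)
        print(shape, round(P / Q, 3), round(naive_bias(P, Q), 3))
```

With no rate variation the naive estimator recovers the true ratio of 10. As the gamma shape parameter decreases (more rate variation among sites), the same estimator returns progressively smaller values, consistent with the underestimation described above.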

Last updated on 07/19/2016