A non-zero variance of Tajima’s estimator for two sequences even for infinitely many unlinked loci


Abstract:

The population-scaled mutation rate, θ, is informative about the effective population size and is thus widely used in population genetics. We show that for two sequences and n unlinked loci, the variance of Tajima's estimator (θ̂), which is the average number of pairwise differences, does not vanish even as n → ∞. The non-zero variance of θ̂ results from a (weak) correlation between coalescence times even at unlinked loci, which, in turn, is due to the underlying fixed pedigree shared by the gene genealogies at all loci. We derive the correlation coefficient under a diploid, discrete-time Wright–Fisher model, along with a simple, closed-form lower bound. We also obtain empirical estimates of the correlation of coalescence times under demographic models inspired by large-scale human genealogies. While the effect we describe is small (Var[θ̂]/θ² ≈ O(1/N_e), where N_e is the effective population size), it is important to recognize this feature of statistical population genetics, which runs counter to commonly held notions about unlinked loci.
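To make the limiting behavior concrete, here is a minimal sketch of how a small correlation between coalescence times could leave a variance floor. It is an illustration under simplifying assumptions, not the derivation from the paper: the number of differences at locus i is taken to be Poisson with mean θ·T_i, each scaled coalescence time T_i has mean 1 and variance 1 (the standard continuous-time approximation for two sequences), and every pair of loci shares the same correlation rho. The function names and the parameter rho are hypothetical.

```python
def tajima_two_seq(differences_per_locus):
    """Tajima's estimator for two sequences: mean pairwise differences over loci."""
    if not differences_per_locus:
        raise ValueError("need at least one locus")
    return sum(differences_per_locus) / len(differences_per_locus)


def var_theta_hat(theta, n_loci, rho=0.0):
    """Variance of the estimator under the assumptions stated above.

    Per-locus variance is theta + theta**2 (Poisson mutations on a
    coalescence time with unit mean and unit variance). Each pair of loci
    contributes covariance rho * theta**2, where rho is an ASSUMED common
    correlation of coalescence times between loci.
    """
    per_locus = theta + theta ** 2
    return per_locus / n_loci + (1.0 - 1.0 / n_loci) * rho * theta ** 2


if __name__ == "__main__":
    theta = 2.0
    for n in (10, 1_000, 100_000, 10_000_000):
        independent = var_theta_hat(theta, n)            # shrinks like 1/n
        correlated = var_theta_hat(theta, n, rho=1e-4)   # levels off near rho * theta**2
        print(n, independent, correlated)
```

With rho = 0 the variance decays as 1/n, matching the usual intuition about unlinked loci. With any rho > 0 the ratio Var[θ̂]/θ² approaches rho as n grows, which is how a correlation of order 1/N_e would translate into the non-vanishing variance described above.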

Last updated on 07/18/2018