WELCOME

We are devoted to the study of theoretical population genetics. The goal of population genetics is to identify and understand the forces that produce and maintain genetic variation in natural populations. These forces include mutation (also recombination and gene conversion), natural selection, various kinds of population structure (e.g. subdivision with migration), and the random fluctuations of gene frequencies through time known as genetic drift. We study these forces mathematically, using both analysis and computation. We also develop statistical methods to make inferences about these forces from DNA sequences or other kinds of genetic data. For more information about specific areas of research, follow the leads to lab members.

RECENT PUBLICATIONS

King L, Wakeley J, Carmi S. A non-zero variance of Tajima’s estimator for two sequences even for infinitely many unlinked loci. Theoretical Population Biology. 2018;122:22-29.
The population-scaled mutation rate, θ, is informative on the effective population size and is thus widely used in population genetics. We show that for two sequences and n unlinked loci, the variance of Tajima’s estimator ($\hat{\theta}$), which is the average number of pairwise differences, does not vanish even as n → ∞. The non-zero variance of $\hat{\theta}$ results from a (weak) correlation between coalescence times even at unlinked loci, which, in turn, is due to the underlying fixed pedigree shared by gene genealogies at all loci. We derive the correlation coefficient under a diploid, discrete-time, Wright–Fisher model, and we also derive a simple, closed-form lower bound. We also obtain empirical estimates of the correlation of coalescence times under demographic models inspired by large-scale human genealogies. While the effect we describe is small ($\mathrm{Var}[\hat{\theta}]/\theta^2 \approx O(N_e^{-1})$), it is important to recognize this feature of statistical population genetics, which runs counter to commonly held notions about unlinked loci.
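As an illustration of the estimator discussed in this abstract, the following is a minimal sketch of Tajima’s estimator for two sequences: the number of pairwise differences at each locus, averaged over loci. The function names and the input layout (a list of aligned sequence pairs, one pair per locus) are assumptions made for this example and are not taken from the paper.

```python
def pairwise_differences(seq_a, seq_b):
    """Count the sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def tajima_estimator_two_sequences(loci):
    """Average number of pairwise differences over loci.

    `loci` is a hypothetical input format: a list of (seq_a, seq_b)
    tuples, one tuple of two aligned sequences per unlinked locus.
    """
    if not loci:
        raise ValueError("at least one locus is required")
    total = sum(pairwise_differences(a, b) for a, b in loci)
    return total / len(loci)


# Toy data: three loci with 1, 0 and 4 differences respectively.
example_loci = [("ACGT", "ACGA"), ("AAAA", "AAAA"), ("ACGT", "TGCA")]
theta_hat = tajima_estimator_two_sequences(example_loci)
```

With the toy data above, the per-locus counts are 1, 0 and 4, so the estimate is 5/3 differences per locus.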
McAvoy A, Fraiman N, Hauert C, Wakeley J, Nowak MA. Public goods games in populations with fluctuating size. Theoretical Population Biology. 2018;121:72-84.
Many mathematical frameworks of evolutionary game dynamics assume that the total population size is constant and that selection affects only the relative frequency of strategies. Here, we consider evolutionary game dynamics in an extended Wright–Fisher process with variable population size. In such a scenario, it is possible that the entire population becomes extinct. Survival of the population may depend on which strategy prevails in the game dynamics. Studying cooperative dilemmas, it is a natural feature of such a model that cooperators enable survival, while defectors drive extinction. Although defectors are favored for any mixed population, random drift could lead to their elimination and the resulting pure-cooperator population could survive. On the other hand, if the defectors remain, then the population will quickly go extinct because the frequency of cooperators steadily declines and defectors alone cannot survive. In a mutation–selection model, we find that (i) a steady supply of cooperators can enable long-term population survival, provided selection is sufficiently strong, and (ii) selection can increase the abundance of cooperators but reduce their relative frequency. Thus, evolutionary game dynamics in populations with variable size generate a multifaceted notion of what constitutes a trait’s long-term success.
Wilton PR, Baduel P, Landon MM, Wakeley J. Population structure and coalescence in pedigrees: Comparisons to the structured coalescent and a framework for inference. Theoretical Population Biology. 2017;115:1-12.

Contrary to what is often assumed in population genetics, independently segregating loci do not have completely independent ancestries, since all loci are inherited through a single, shared population pedigree. Previous work has shown that the non-independence between gene genealogies of independently segregating loci created by the population pedigree is weak in panmictic populations, and predictions made from standard coalescent theory are accurate for populations that are at least moderately sized. Here, we investigate patterns of coalescence in pedigrees of structured populations. We find that the pedigree creates deviations away from the predictions of the structured coalescent that persist on a longer timescale than in the case of panmictic populations. Nevertheless, we find that the structured coalescent provides a reasonable approximation for the coalescent process in structured population pedigrees so long as migration events are moderately frequent and there are no migration events in the recent pedigree of the sample. When there are migration events in the recent sample pedigree, we find that distributions of coalescence in the sample can be modeled as a mixture of distributions from different initial sample configurations. We use this observation to motivate a maximum-likelihood approach for inferring migration rates and mutation rates jointly with features of the pedigree such as recent migrant ancestry and recent relatedness. Using simulation, we show that our inference framework accurately recovers long-term migration rates in the presence of recent migration events in the sample pedigree.

King L, Wakeley J. Empirical Bayes estimation of coalescence times from nucleotide sequence data. Genetics. 2016;204:249-257.

We demonstrate the advantages of using information at many unlinked loci to better calibrate estimates of the time to the most recent common ancestor (TMRCA) at a given locus. To this end, we apply a simple empirical Bayes method to estimate the TMRCA. This method is both asymptotically optimal, in the sense that the estimator converges to the true value when the number of unlinked loci for which we have information is large, and has the advantage of not making any assumptions about demographic history. The algorithm works as follows: we first split the sample at each locus into inferred left and right clades to obtain many estimates of the TMRCA, which we can average to obtain an initial estimate of the TMRCA. We then use nucleotide sequence data from other unlinked loci to form an empirical distribution that we can use to improve this initial estimate.
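The first step described in this abstract, averaging many estimates of the TMRCA obtained from the left and right clades, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper’s estimator: the clade split is taken as given, the per-site mutation rate `mu_per_site` is assumed known, and each cross-clade pair of sequences is assumed to accumulate differences at rate 2 × `mu_per_site` per site per generation since the root. The empirical Bayes correction using other unlinked loci is not shown.

```python
from itertools import product


def hamming(seq_a, seq_b):
    """Count the sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def initial_tmrca_estimate(left_clade, right_clade, mu_per_site):
    """Average the TMRCA estimates from all cross-clade sequence pairs.

    Assumptions (hypothetical, for illustration only): the two clades
    are already inferred, and mu_per_site is a known mutation rate per
    site per generation. A pair with one sequence from each clade has
    its common ancestor at the root, so d differences over n sites
    gives the estimate d / (2 * mu_per_site * n) generations.
    """
    if not left_clade or not right_clade:
        raise ValueError("both clades must contain at least one sequence")
    if mu_per_site <= 0:
        raise ValueError("mu_per_site must be positive")
    n_sites = len(left_clade[0])
    per_pair = [
        hamming(a, b) / (2.0 * mu_per_site * n_sites)
        for a, b in product(left_clade, right_clade)
    ]
    return sum(per_pair) / len(per_pair)


# Toy data with an unrealistically high mutation rate for easy arithmetic.
left = ["AAAA", "AAAT"]
right = ["TTAA", "TTAT"]
estimate = initial_tmrca_estimate(left, right, mu_per_site=0.25)
```

In the toy data, the four cross-clade pairs differ at 2, 3, 3 and 2 sites, giving per-pair estimates of 1.0, 1.5, 1.5 and 1.0 generations and an averaged initial estimate of 1.25.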