Publications

Journal Article
Wakeley J, Sargsyan O. Extensions of the coalescent effective population size. Genetics. 2009;181 (1) :341-345.Abstract
We suggest two extensions of the coalescent effective population size of Sjödin et al. (2005) and make a third, practical point. First, to bolster its relevance to data and allow comparisons between models, the coalescent effective size should be recast as a kind of mutation effective size. Second, the requirement that the coalescent effective population size must depend linearly on the actual population size should be lifted. Third, even if the coalescent effective population size does not exist in the mathematical sense, it may be difficult to reject Kingman’s coalescent using genetic data.
(pdf)
Wakeley J. Complex speciation of humans and chimpanzees. Nature. 2008;452 :E4-E5.Abstract

Genetic data from two or more species provide information about the process of speciation. In their analysis of DNA from humans, chimpanzees, gorillas, orangutans and macaques (HCGOM), Patterson et al.1 suggest that the apparently short divergence time between humans and chimpanzees on the X chromosome is explained by a massive interspecific hybridization event in the ancestry of these two species. However, Patterson et al.1 do not statistically test their own null model of simple speciation before concluding that speciation was complex, and—even if the null model could be rejected—they do not consider other explanations of a short divergence time on the X chromosome. These include natural selection on the X chromosome in the common ancestor of humans and chimpanzees, changes in the ratio of male-to-female mutation rates over time, and less extreme versions of divergence with gene flow (see ref. 2, for example). I therefore believe that their claim of hybridization is unwarranted.

(pdf)
Wakeley J. Conditional gene genealogies under strong purifying selection. Mol. Biol. Evol. 2008;25 (12) :2615-2626.Abstract

The ancestral selection graph, conditioned on the allelic types in the sample, is used to obtain a limiting gene genealogical process under strong selection. In an equilibrium, two-allele system with strong selection, neutral gene genealogies are predicted for random samples and for samples containing at most one unfavorable allele. Samples containing more than one unfavorable allele have gene genealogies that differ greatly from neutral predictions. However, they are related to neutral gene genealogies via the well-known Ewens sampling formula. Simulations show rapid convergence to limiting analytical predictions as the strength of selection increases. These results extend the idea of a soft selective sweep to deleterious alleles and have implications for the interpretation of polymorphism among disease-causing alleles in humans.

(pdf)
Jones D, Wakeley J. The influence of gene conversion on linkage disequilibrium around a selective sweep. Genetics. 2008;180 (2) :1251-1259.Abstract

In a 2007 article, McVean studied the effect of recombination on linkage disequilibrium (LD) between two neutral loci located near a third locus that has undergone a selective sweep. The results demonstrated that two loci on the same side of a selected locus might show substantial LD, whereas the expected LD for two loci on opposite sides of a selected locus is zero. In this article, we extend McVean’s model to include gene conversion. We show that one of the conclusions is strongly affected by gene conversion: when gene conversion is present, there may be substantial LD between two loci on opposite sides of a selective sweep.

(pdf)
Eldon B, Wakeley J. Linkage disequilibrium under skewed offspring distribution among individuals in a population. Genetics. 2008;178 (3) :1517-1532.Abstract

Correlations in coalescence times between two loci are derived under selectively neutral population models in which the offspring of an individual can number on the order of the population size. The correlations depend on the rates of recombination and random drift and are shown to be functions of the parameters controlling the size and frequency of these large reproduction events. Since a prediction of linkage disequilibrium can be written in terms of correlations in coalescence times, it follows that the prediction of linkage disequilibrium is a function not only of the rate of recombination but also of the reproduction parameters. Low linkage disequilibrium is predicted if the offspring of a single individual frequently replace almost the entire population. However, high linkage disequilibrium can be predicted if the offspring of a single individual replace an intermediate fraction of the population. In some cases the model reproduces the standard Wright–Fisher predictions. Contrary to common intuition, high linkage disequilibrium can be predicted despite frequent recombination, and low linkage disequilibrium under infrequent recombination. Simulations support the analytical results but show that the variance of linkage disequilibrium is very large.

(pdf)
Ramachandran S, Rosenberg NA, Feldman MW, Wakeley J. Population differentiation and migration: Coalescence times in a two-sex island model for autosomal and X-linked loci. Theoret. Pop. Biol. 2008;74 :291-301.Abstract

Evolutionists have debated whether population-genetic parameters, such as effective population size and migration rate, differ between males and females. In humans, most analyses of this problem have focused on the Y chromosome and the mitochondrial genome, while the X chromosome has largely been omitted from the discussion. Past studies have compared F(ST) values for the Y chromosome and mitochondrion under a model with migration rates that differ between the sexes but with equal male and female population sizes. In this study we investigate rates of coalescence for X-linked and autosomal lineages in an island model with different population sizes and migration rates for males and females, obtaining the mean time to coalescence for pairs of lineages from the same deme and for pairs of lineages from different demes. We apply our results to microsatellite data from the Human Genome Diversity Panel, and we examine the male and female migration rates implied by observed F(ST) values.

(pdf)
Jones D, Wakeley J. Recombination, gene conversion, and identity by descent at three loci. Theoret. Pop. Biol. 2008;73 (2) :264-276.Abstract

We investigate the probabilities of identity-by-descent at three loci in order to find a signature which differentiates between the two types of crossing over events: recombination and gene conversion. We use a Markov chain to model coalescence, recombination, gene conversion and mutation in a sample of size two. Using numerical analysis, we calculate the total probability of identity-by-descent at the three loci, and partition these probabilities based on a partial ordering of coalescent events at the three loci. We use these results to compute the probabilities of four different patterns of conditional identity and non-identity at the three loci under recombination and gene conversion. Although recombination and gene conversion do make different predictions, the differences are not likely to be useful in distinguishing between them using three locus patterns between pairs of DNA sequences. This implies that measures of genetic identity in larger samples will be needed to distinguish between gene conversion and recombination.

(pdf)
Sargsyan O, Wakeley J. A coalescent process with simultaneous multiple mergers for approximating the gene genealogies of many marine organisms. Theoret. Pop. Biol. 2008;74 :104-114.Abstract

We describe a forward-time haploid reproduction model with a constant population size that includes life history characteristics common to many marine organisms. We develop coalescent approximations for sample gene genealogies under this model and use these to predict patterns of genetic variation. Depending on the behavior of the underlying parameters of the model, the approximations are coalescent processes with simultaneous multiple mergers or Kingman's coalescent. Using simulations, we apply our model to data from the Pacific oyster and show that our model predicts the observed data very well. We also show that a fact which holds for Kingman's coalescent and also for general coalescent trees--that the most-frequent allele at a biallelic locus is likely to be the ancestral allele--is not true for our model. Our work suggests that the power to detect a "sweepstakes effect" in a sample of DNA sequences from marine organisms depends on the sample size.

(pdf)
Eldon B, Wakeley J. Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics. 2006;172 (4) :2621-2633.Abstract

We report a complex set of scaling relationships between mutation and reproduction in a simple model of a population. These follow from a consideration of patterns of genetic diversity in a sample of DNA sequences. Five different possible limit processes, each with a different scaled mutation parameter, can be used to describe genetic diversity in a large population. Only one of these corresponds to the usual population genetic model, and the others make drastically different predictions about genetic diversity. The complexity arises because individuals can potentially have very many offspring. To the extent that this occurs in a given species, our results imply that inferences from genetic data made under the usual assumptions are likely to be wrong. Our results also uncover a fundamental difference between populations in which generations are overlapping and those in which generations are discrete. We choose one of the five limit processes that appears to be appropriate for some marine organisms and use a sample of genetic data from a population of Pacific oysters to infer the parameters of the model. The data suggest the presence of rare reproduction events in which ~8% of the population is replaced by the offspring of a single individual.

(pdf)
Jesus FF, Wilkins JF, Solferini VN, Wakeley J. Expected coalescence times and segregating sites in a model of glacial cycles. Genetics and Molecular Research. 2006;5 (3) :466-474.Abstract
The climatic fluctuations of the Quaternary have influenced the distribution of numerous plant and animal species. Several species suffer population reduction and fragmentation, becoming restricted to refugia during glacial periods and expanding again during interglacials. The reduction in population size may reduce the effective population size, mean coalescence time and genetic variation, whereas an increased subdivision may have the opposite effect. To investigate these two opposing forces, we proposed a model in which a panmictic and a structured phase alternate, corresponding to interglacial and glacial periods. From this model, we derived an expression for the expected coalescence time and number of segregating sites for a pair of genes. We observed that increasing the number of demes or the duration of the structured phases causes an increase in coalescence time and expected levels of genetic variation. We compared numerical results with the ones expected for a panmictic population of constant size, and showed that the mean number of segregating sites can be greater in our model even when population size is much smaller in the structured phases. This points to the importance of population structure in the history of species subject to climatic fluctuations, and helps explain the long gene genealogies observed in several organisms.
(pdf)
Matsen FA, Wakeley J. Convergence to the island-model coalescent process in populations with restricted migration. Genetics. 2006;172 (1) :701-708.Abstract

In this article we apply some graph-theoretic results to the study of coalescence in a structured population with migration. The graph is the pattern of migration among subpopulations, or demes, and we use the theory of random walks on graphs to characterize the ease with which ancestral lineages can traverse the habitat in a series of migration events. We identify conditions under which the coalescent process in populations with restricted migration, such that individuals cannot traverse the habitat freely in a single migration event, nonetheless becomes identical to the coalescent process in the island migration model in the limit as the number of demes tends to infinity. Specifically, we first note that a sequence of symmetric graphs with Diaconis-Stroock constant bounded above has an unstructured Kingman-type coalescent in the limit for a sample of size two from two different demes. We then show that circular and toroidal models with long-range but restricted migration have an upper bound on this constant and so have an unstructured-migration coalescent in the limit. We investigate the rate of convergence to this limit using simulations.

(pdf)
Wakeley J, Lessard S. Corridors for migration between large subdivided populations, and the structured coalescent. Theoret. Pop. Biol. 2006;70 (4) :412-420.Abstract

We study the ancestral genetic process for samples from two large, subdivided populations that are connected by migration to, from, and within a small set of subpopulations, or demes. We consider convergence to an ancestral limit process as the numbers of demes in the two large, subdivided populations tend to infinity. We show that the ancestral limit process for a sample includes a recent instantaneous adjustment to the sample size and structure followed by a more ancient process that is identical to the usual structured coalescent, but with different scaled parameters. This justifies the application of a modified structured coalescent to some hierarchically structured populations.

(pdf)
Slade P, Wakeley J. The structured ancestral selection graph and the many-demes limit. Genetics. 2005;169 (2) :1117-1131.Abstract

We show that the unstructured ancestral selection graph applies to part of the history of a sample from a population structured by restricted migration among subpopulations, or demes. The result holds in the limit as the number of demes tends to infinity with proportionately weak selection, and we have also made assumptions of island-type migration and that demes are equivalent in size. After an instantaneous sample-size adjustment, this structured ancestral selection graph converges to an unstructured ancestral selection graph with a mutation parameter that depends inversely on the migration rate. In contrast, the selection parameter for the population is independent of the migration rate and is identical to the selection parameter in an unstructured population. We show analytically that estimators of the migration rate, based on pairwise sequence differences, derived under the assumption of neutrality should perform equally well in the presence of weak selection. We also modify an algorithm for simulating genealogies conditional on the frequencies of two selected alleles in a sample. This permits efficient simulation of stronger selection than was previously possible. Using this new algorithm, we simulate gene genealogies under the many-demes ancestral selection graph and identify some situations in which migration has a strong effect on the time to the most recent common ancestor of the sample. We find that a similar effect also increases the sensitivity of the genealogy to selection.

(pdf)
Wakeley J. The limits of theoretical population genetics. Genetics. 2005;169 (1) :1-7. (pdf)
Wakeley J. Recent trends in population genetics: more data! more math! simple models?. J. Hered. 2004;95 (5) :397-405.Abstract

Recent developments in population genetics are reviewed and placed in a historical context. Current and future challenges, both in computational methodology and in analytical theory, are to develop models and techniques to extract the most information possible from multilocus DNA datasets. As an example of the theoretical issues, five limiting forms of the island model of population subdivision with migration are presented in a unified framework. These approximations illustrate the interplay between migration and drift in structuring gene genealogies, and some of them make connections between the fairly complicated island-model genealogical process and the much simpler, unstructured neutral coalescent process which underlies most inferential techniques in population genetics.

(pdf)
Achaz G, Palmer S, Kearney M, et al. A robust measure of HIV-1 population turnover within chronically infected individuals. Mol. Biol. Evol. 2004;21 (10) :1902-1912.Abstract

A simple nonparametric test for population structure was applied to temporally spaced samples of HIV-1 sequences from the gag-pol region within two chronically infected individuals. The results show that temporal structure can be detected for samples separated by about 22 months or more. The performance of the method, which was originally proposed to detect geographic structure, was tested for temporally spaced samples using neutral coalescent simulations. Simulations showed that the method is robust to variation in sample sizes and mutation rates, to the presence/absence of recombination, and that the power to detect temporal structure is high. By comparing levels of temporal structure in simulations to the levels observed in real data, we estimate the effective intra-individual population size of HIV-1 to be between 10³ and 10⁴ viruses, which is in agreement with some previous estimates. Using this estimate and a simple measure of sequence diversity, we estimate an effective neutral mutation rate of about 5 × 10⁻⁶ per site per generation in the gag-pol region. The definition and interpretation of estimates of such “effective” population parameters are discussed.

(pdf)
Lessard S, Wakeley J. The two-locus ancestral graph in a subdivided population: convergence as the number of demes grows in the island model. J. Math. Biol. 2004;48 (3) :275-292.Abstract
We study the ancestral recombination graph for a pair of sites in a geographically structured population. In particular, we consider the limiting behavior of the graph, under Wright’s island model, as the number of subpopulations, or demes, goes to infinity. After an instantaneous sample-size adjustment, the graph becomes identical to the two-locus graph in an unstructured population, but with a time scale that depends on the migration rate and the deme size. Interestingly, when migration is gametic, this rescaling of time increases the population mutation rate but does not affect the population recombination rate. We compare this to the case of a partially-selfing population, in which both mutation and recombination depend on the selfing rate. Our result for gametic migration holds both for finite-sized demes, and in the limit as the deme size goes to infinity. However, when migration occurs during the diploid phase of the life cycle and demes are finite in size, the population recombination rate does depend on the migration rate, in a way that is reminiscent of partial selfing. Simulations imply that convergence to a rescaled panmictic ancestral recombination graph occurs for any number of sites as the number of demes approaches infinity.
(pdf)
Wakeley J, Takahashi T. The many-demes limit for selection and drift in a subdivided population. Theoret. Pop. Biol. 2004;66 (2) :83-91.Abstract

A diffusion approximation is obtained for the frequency of a selected allele in a population comprised of many subpopulations or demes. The form of the diffusion is equivalent to that for an unstructured population, except that it occurs on a longer time scale when migration among demes is restricted. This many-demes diffusion limit relies on the collection of demes always being in statistical equilibrium with respect to migration and drift for a given allele frequency in the total population. Selection is assumed to be weak, in inverse proportion to the number of demes, and the results hold for any deme sizes and migration rates greater than zero. The distribution of allele frequencies among demes is also described.

(pdf)
Wakeley J. Metapopulation models for historical inference. Mol. Ecol. 2004;13 (4) :865-875.Abstract

The genealogical process for a sample from a metapopulation, in which local populations are connected by migration and can undergo extinction and subsequent recolonization, is shown to have a relatively simple structure in the limit as the number of populations in the metapopulation approaches infinity. The result, which is an approximation to the ancestral behaviour of samples from a metapopulation with a large number of populations, is the same as that previously described for other metapopulation models, namely that the genealogical process is closely related to Kingman's unstructured coalescent. The present work considers a more general class of models that includes two kinds of extinction and recolonization, and the possibility that gamete production precedes extinction. In addition, following other recent work, this result for a metapopulation divided into many populations is shown to hold both for finite population sizes and in the usual diffusion limit, which assumes that population sizes are large. Examples illustrate when the usual diffusion limit is appropriate and when it is not. Some shortcomings and extensions of the model are considered, and the relevance of such models to understanding human history is discussed.

(pdf)
Wakeley J, Takahashi T. Gene genealogies when the sample size exceeds the effective size of the population. Mol. Biol. Evol. 2003;20 (2) :208-213.Abstract
We study the properties of gene genealogies for large samples using a continuous approximation introduced by R. A. Fisher. We show that the major effect of large sample size, relative to the effective size of the population, is to increase the proportion of polymorphisms at which the mutant type is found in a single copy in the sample. We derive analytical expressions for the expected number of these singleton polymorphisms and for the total number of polymorphic, or segregating, sites that are valid even when the sample size is much greater than the effective size of the population. We use simulations to assess the accuracy of these predictions and to investigate other aspects of large-sample genealogies. Lastly, we apply our results to some data from Pacific oysters sampled from British Columbia. This illustrates that, when large samples are available, it is possible to estimate the mutation rate and the effective population size separately, in contrast to the case of small samples in which only the product of the mutation rate and the effective population size can be estimated.
(pdf)