Journal Article
Maruvka YE, Shnerb NM, Bar-Yam Y, Wakeley J. Recovering population parameters from a single gene genealogy: an unbiased estimator of the growth rate. Mol. Biol. Evol. 2011;28 (5) :1617-1631.Abstract
We show that the number of lineages ancestral to a sample, as a function of time back into the past, which we call the number
of lineages as a function of time (NLFT), is a nearly deterministic property of large-sample gene genealogies. We obtain analytic
expressions for the NLFT for both constant-sized and exponentially growing populations. The low level of stochastic variation
associated with the NLFT of a large sample suggests using the NLFT to make estimates of population parameters. Based on
this, we develop a new computational method of inferring the size and growth rate of a population from a large sample of
DNA sequences at a single locus. We apply our method first to a sample of 1,212 mitochondrial DNA (mtDNA) sequences
from China, confirming a pattern of recent population growth previously identified using other techniques, but with much
smaller confidence intervals for past population sizes due to
the low variation of the NLFT. We further analyze a set of 63
mtDNA sequences from blue whales (BWs), concluding that the population grew in the past. This calls for reevaluation of
previous studies that were based on the assumption that the BW population was fixed.
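To illustrate why the NLFT of a large sample is nearly deterministic, here is a minimal Python sketch for the constant-size case only. It assumes the standard Kingman coalescent with time measured in coalescent units, and it compares simulated lineage counts with the solution of the simple ODE dn/dt = -n(n-1)/2. This ODE approximation is an assumption made for illustration and is not necessarily the analytic expression derived in the paper; the function names are hypothetical.

```python
import math
import random


def nlft_deterministic(n0, t):
    """Solution of dn/dt = -n(n-1)/2 with n(0) = n0 (time in coalescent units)."""
    return n0 / (n0 - (n0 - 1) * math.exp(-t / 2.0))


def nlft_simulated(n0, t, rng):
    """Lineages remaining at time t in one simulated Kingman genealogy."""
    n, elapsed = n0, 0.0
    while n > 1:
        # With n lineages, the waiting time to the next coalescence is
        # exponential with rate n(n-1)/2.
        elapsed += rng.expovariate(n * (n - 1) / 2.0)
        if elapsed > t:
            break
        n -= 1
    return n


if __name__ == "__main__":
    rng = random.Random(2011)
    n0, t = 1000, 0.1
    draws = [nlft_simulated(n0, t, rng) for _ in range(200)]
    print("simulated mean:", sum(draws) / len(draws))
    print("ODE approximation:", nlft_deterministic(n0, t))
```

Across replicates the simulated counts should cluster closely around the deterministic curve, which is the property the abstract exploits for estimation.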
Watson RA, Weinreich D, Wakeley J. Genome structure and the benefit of sex. Evolution. 2010;65-28 :523-536.Abstract
We examine the behavior of sexual and asexual populations in modular multipeaked fitness landscapes and show that sexuals
can systematically reach different, higher fitness adaptive peaks than asexuals. Whereas asexuals must move against selection
to escape local optima, sexuals reach higher fitness peaks reliably because they create specific genetic variants that “skip over”
fitness valleys, moving from peak to peak in the fitness landscape. This occurs because recombination can supply combinations
of mutations in functional composites or “modules,” that may include individually deleterious mutations. Thus when a beneficial
module is substituted for another less-fit module by sexual recombination it provides a genetic variant that would require either
several specific simultaneous mutations in an asexual population or a sequence of individual mutations some of which would be
selected against. This effect requires modular genomes, such that subsets of strongly epistatic mutations are tightly physically
linked. We argue that such a structure is provided simply by virtue of the fact that genomes contain many genes each containing
many strongly epistatic nucleotides. We briefly discuss the connections with “building blocks” in the evolutionary computation
literature. We conclude that there are conditions in which sexuals can systematically evolve high-fitness genotypes that are
essentially unevolvable for asexuals.
RoyChoudhury A, Wakeley J. Sufficiency of the number of segregating sites in the limit under finite-sites mutation. Theoret. Pop. Biol. 2010;78 (2) :118-122.Abstract
We show that the number of segregating sites is a sufficient statistic for the scaled mutation parameter θ in the limit as the number of sites tends to infinity and there is free recombination between sites. We assume that the mutation parameter at each site tends to zero such that the total mutation parameter θ is constant in the limit. Our results show that Watterson’s estimator is the maximum likelihood estimator in this case, but that it estimates a composite parameter which is different for different mutation models. Some of our results hold when recombination is limited, because Watterson’s estimator is an unbiased, method-of-moments estimator regardless of the recombination rate. The quantity it estimates depends on the details of how mutations occur at each site.
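For readers unfamiliar with Watterson's estimator, the following minimal Python sketch computes it in its standard textbook form: the number of segregating sites S divided by the harmonic sum a_n = 1 + 1/2 + ... + 1/(n-1) for a sample of n sequences. The scaled mutation parameter is conventionally written θ. The function name is hypothetical, and the sketch does not reproduce the finite-sites analysis of the paper.

```python
def watterson_theta(num_segregating_sites, sample_size):
    """Watterson's estimator: S divided by the harmonic sum a_n."""
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    # a_n = sum_{i=1}^{n-1} 1/i
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return num_segregating_sites / a_n


if __name__ == "__main__":
    # Example: 10 segregating sites in a sample of 5 sequences.
    print(watterson_theta(10, 5))
```

As the abstract notes, what this quantity estimates under finite-sites mutation is a composite parameter that depends on the mutation model.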
Garrigan D, Lewontin R, Wakeley J. Measuring the sensitivity of single-locus "neutrality tests" using a direct perturbation approach. Molecular Biology and Evolution. 2010;27 (1) :73-89.Abstract

A large number of statistical tests have been proposed to detect natural selection based on a sample of variation at a single genetic locus. These tests measure the deviation of the allelic frequency distribution observed within populations from the distribution expected under a set of assumptions that includes both neutral evolution and equilibrium population demography. The present study considers a new way to assess the statistical properties of these tests of selection, by their behavior in response to direct perturbations of the steady-state allelic frequency distribution, unconstrained by any particular nonequilibrium demographic scenario. Results from Monte Carlo computer simulations indicate that most tests of selection are more sensitive to perturbations of the allele frequency distribution that increase the variance in allele frequencies than to perturbations that decrease the variance. Simulations also demonstrate that it requires, on average, 4N generations (N is the diploid effective population size) for tests of selection to relax to their theoretical, steady-state distributions following different perturbations of the allele frequency distribution to its extremes. This relatively long relaxation time highlights the fact that these tests are not robust to violations of the other assumptions of the null model besides neutrality. Lastly, genetic variation arising under an example of a regularly cycling demographic scenario is simulated. Tests of selection performed on this last set of simulated data confirm the confounding nature of these tests for the inference of natural selection, under a demographic scenario that likely holds for many species. The utility of using empirical, genomic distributions of test statistics, instead of the theoretical steady-state distribution, is discussed as an alternative for improving the statistical inference of natural selection.

Cenik C, Wakeley J. Pacific salmon and the coalescent effective population size. PLoS ONE. 2010;5 (9) :e13019, 1-10.Abstract

Pacific salmon include several species that are both commercially important and endangered. Understanding the causes of loss in genetic variation is essential for designing better conservation strategies. Here we use a coalescent approach to analyze a model of the complex life history of salmon, and derive the coalescent effective population size (CES). With the aid of Kronecker products and a convergence theorem for Markov chains with two time scales, we derive a simple formula for the CES and thereby establish its existence. Our results may be used to address important questions regarding salmon biology, in particular about the loss of genetic variation. To illustrate the utility of our approach, we consider the effects of fluctuations in population size over time. Our analysis enables the application of several tools of coalescent theory to the case of salmon.

Shpak M, Wakeley J, Garrigan D, Lewontin RC. A structured coalescent process for seasonally fluctuating populations. Evolution. 2010;64 (5) :1395-1409.Abstract

Many short-lived organisms pass through several generations during favorable growing seasons, separated by inhospitable periods during which only small hibernating or estivating refugia remain. This induces pronounced seasonal fluctuations in population size and metapopulation structure. The first generations in the growing season will be characterized by small, relatively isolated demes whereas the later generations will experience larger deme sizes with more extensive gene flow. Fluctuations of this sort can induce changes in the amount of genetic variation in early season samples compared to late season samples, a classical example being the observations of seasonal variation in allelism in New England Drosophila populations by P. T. Ives. In this article, we study the properties of a structured coalescent process under seasonal fluctuations using numerical analysis of exact state equations, analytical approximations that rely on a separation of timescales between intrademic versus interdemic processes, and individual-based simulations. We show that although an increase in genetic variation during each favorable growing season is observed, it is not as pronounced as in the empirical observations. This suggests that some of the temporal patterns of variation seen by Ives may be due to selection against deleterious lethals rather than neutral processes.

Muirhead C, Wakeley J. Modeling multiallelic selection using a Moran model. Genetics. 2009;182 (4) :1141-1157.Abstract

We present a Moran-model approach to modeling general multiallelic selection in a finite population and show how it may be used to develop theoretical models of biological systems of balancing selection such as plant gametophytic self-incompatibility loci. We propose new expressions for the stationary distribution of allele frequencies under selection and use them to show that the continuous-time Markov chain describing allele frequency change with exchangeable selection and Moran-model reproduction is reversible. We then use the reversibility property to derive the expected allele frequency spectrum in a finite population for several general models of multiallelic selection. Using simulations, we show that our approach is valid over a broader range of parameters than previous analyses of balancing selection based on diffusion approximations to the Wright–Fisher model of reproduction. Our results can be applied to any model of multiallelic selection in which fitness is solely a function of allele frequency.

Eldon B, Wakeley J. Coalescence times and FST under a skewed offspring distribution among individuals in a population. Genetics. 2009;181 (2) :615-629.Abstract
Estimates of gene flow between subpopulations based on FST (or NST) are shown to be confounded by the reproduction parameters of a model of skewed offspring distribution. Genetic evidence of population subdivision can be observed even when gene flow is very high, if the offspring distribution is skewed. A skewed offspring distribution arises when individuals can have very many offspring with some probability. This leads to high probability of identity by descent within subpopulations and results in genetic heterogeneity between subpopulations even when Nm is very large. Thus, we consider a limiting model in which the rates of coalescence and migration can be much higher than for a Wright–Fisher population. We derive the densities of pairwise coalescence times and expressions for FST and other statistics under both the finite island model and a many-demes limit model. The results can explain the observed genetic heterogeneity among subpopulations of certain marine organisms despite substantial gene flow.
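As context for the kind of estimate this abstract calls into question, the sketch below shows the classical Wright island-model approximation FST ≈ 1/(1 + 4Nm) and its inversion to obtain Nm from an observed FST. This is the standard textbook relation for a diploid Wright–Fisher population with many demes and low migration, not one of the expressions derived in the paper, and the function names are hypothetical.

```python
def fst_island(nm):
    """Classical approximation FST = 1 / (1 + 4Nm) under Wright's island model."""
    if nm < 0:
        raise ValueError("Nm must be non-negative")
    return 1.0 / (1.0 + 4.0 * nm)


def nm_from_fst(fst):
    """Invert the classical approximation to get Nm from an observed FST."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    print(fst_island(1.0))    # one migrant per generation
    print(nm_from_fst(0.05))  # Nm implied by FST = 0.05 under this model
```

The abstract's point is that when the offspring distribution is skewed, an inversion of this kind can suggest restricted gene flow even when actual gene flow is high.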

Wakeley J, Sargsyan O. The conditional ancestral selection graph with strong balancing selection. Theoret. Pop. Biol. 2009;75 (4) :355-364.Abstract

Using a heuristic separation-of-time-scales argument, we describe the behavior of the conditional ancestral selection graph with very strong balancing selection between a pair of alleles. In the limit as the strength of selection tends to infinity, we find that the ancestral process converges to a neutral structured coalescent, with two subpopulations representing the two alleles and mutation playing the role of migration. This agrees with a previous result of Kaplan et al., obtained using a different approach. We present the results of computer simulations to support our heuristic mathematical results. We also present a more rigorous demonstration that the neutral conditional ancestral process converges to the Kingman coalescent in the limit as the mutation rate tends to infinity.

Antal T, Ohtsuki H, Wakeley J, Taylor PD, Nowak M. Evolution of cooperation by phenotypic similarity. Proc. Natl. Acad. Sci., USA. 2009;106 :8597-8600.Abstract
The emergence of cooperation in populations of selfish individuals is a fascinating topic that has inspired much work in theoretical biology. Here, we study the evolution of cooperation in a model where individuals are characterized by phenotypic properties that are visible to others. The population is well mixed in the sense that everyone is equally likely to interact with everyone else, but the behavioral strategies can depend on distance in phenotype space. We study the interaction of cooperators and defectors. In our model, cooperators cooperate with those who are similar and defect otherwise. Defectors always defect. Individuals mutate to nearby phenotypes, which generates a random walk of the population in phenotype space. Our analysis brings together ideas from coalescence theory and evolutionary game dynamics. We obtain a precise condition for natural selection to favor cooperators over defectors. Cooperation is favored when the phenotypic mutation rate is large and the strategy mutation rate is small. In the optimal case for cooperators, in a one-dimensional phenotype space and for large population size, the critical benefit-to-cost ratio is given by b/c = 1 + 2/√3. We also derive the fundamental condition for any two-strategy symmetric game and consider high-dimensional phenotype spaces.
Wakeley J, Sargsyan O. Extensions of the coalescent effective population size. Genetics. 2009;181 (1) :341-345.Abstract
We suggest two extensions of the coalescent effective population size of Sjödin et al. (2005) and make a third, practical point. First, to bolster its relevance to data and allow comparisons between models, the coalescent effective size should be recast as a kind of mutation effective size. Second, the requirement that the coalescent effective population size must depend linearly on the actual population size should be lifted. Third, even if the coalescent effective population size does not exist in the mathematical sense, it may be difficult to reject Kingman’s coalescent using genetic data.
Wakeley J. Complex speciation of humans and chimpanzees. Nature. 2008;452 :E4-E5.Abstract

Genetic data from two or more species provide information about the process of speciation. In their analysis of DNA from humans, chimpanzees, gorillas, orangutans and macaques (HCGOM), Patterson et al.1 suggest that the apparently short divergence time between humans and chimpanzees on the X chromosome is explained by a massive interspecific hybridization event in the ancestry of these two species. However, Patterson et al.1 do not statistically test their own null model of simple speciation before concluding that speciation was complex, and—even if the null model could be rejected—they do not consider other explanations of a short divergence time on the X chromosome. These include natural selection on the X chromosome in the common ancestor of humans and chimpanzees, changes in the ratio of male-to-female mutation rates over time, and less extreme versions of divergence with gene flow (see ref. 2, for example). I therefore believe that their claim of hybridization is unwarranted.

Wakeley J. Conditional gene genealogies under strong purifying selection. Mol. Biol. Evol. 2008;25 (12) :2615-2626.Abstract

The ancestral selection graph, conditioned on the allelic types in the sample, is used to obtain a limiting gene genealogical process under strong selection. In an equilibrium, two-allele system with strong selection, neutral gene genealogies are predicted for random samples and for samples containing at most one unfavorable allele. Samples containing more than one unfavorable allele have gene genealogies that differ greatly from neutral predictions. However, they are related to neutral gene genealogies via the well-known Ewens sampling formula. Simulations show rapid convergence to limiting analytical predictions as the strength of selection increases. These results extend the idea of a soft selective sweep to deleterious alleles and have implications for the interpretation of polymorphism among disease-causing alleles in humans.

Jones D, Wakeley J. The influence of gene conversion on linkage disequilibrium around a selective sweep. Genetics. 2008;180 (2) :1251-1259.Abstract

In a 2007 article, McVean studied the effect of recombination on linkage disequilibrium (LD) between two neutral loci located near a third locus that has undergone a selective sweep. The results demonstrated that two loci on the same side of a selected locus might show substantial LD, whereas the expected LD for two loci on opposite sides of a selected locus is zero. In this article, we extend McVean’s model to include gene conversion. We show that one of the conclusions is strongly affected by gene conversion: when gene conversion is present, there may be substantial LD between two loci on opposite sides of a selective sweep.

Eldon B, Wakeley J. Linkage disequilibrium under skewed offspring distribution among individuals in a population. Genetics. 2008;178 (3) :1517-1532.Abstract

Correlations in coalescence times between two loci are derived under selectively neutral population models in which the offspring of an individual can number on the order of the population size. The correlations depend on the rates of recombination and random drift and are shown to be functions of the parameters controlling the size and frequency of these large reproduction events. Since a prediction of linkage disequilibrium can be written in terms of correlations in coalescence times, it follows that the prediction of linkage disequilibrium is a function not only of the rate of recombination but also of the reproduction parameters. Low linkage disequilibrium is predicted if the offspring of a single individual frequently replace almost the entire population. However, high linkage disequilibrium can be predicted if the offspring of a single individual replace an intermediate fraction of the population. In some cases the model reproduces the standard Wright–Fisher predictions. Contrary to common intuition, high linkage disequilibrium can be predicted despite frequent recombination, and low linkage disequilibrium under infrequent recombination. Simulations support the analytical results but show that the variance of linkage disequilibrium is very large.

Ramachandran S, Rosenberg NA, Feldman MW, Wakeley J. Population differentiation and migration: Coalescence times in a two-sex island model for autosomal and X-linked loci. Theoret. Pop. Biol. 2008;74 :291-301.Abstract

Evolutionists have debated whether population-genetic parameters, such as effective population size and migration rate, differ between males and females. In humans, most analyses of this problem have focused on the Y chromosome and the mitochondrial genome, while the X chromosome has largely been omitted from the discussion. Past studies have compared F(ST) values for the Y chromosome and mitochondrion under a model with migration rates that differ between the sexes but with equal male and female population sizes. In this study we investigate rates of coalescence for X-linked and autosomal lineages in an island model with different population sizes and migration rates for males and females, obtaining the mean time to coalescence for pairs of lineages from the same deme and for pairs of lineages from different demes. We apply our results to microsatellite data from the Human Genome Diversity Panel, and we examine the male and female migration rates implied by observed F(ST) values.

Jones D, Wakeley J. Recombination, gene conversion, and identity by descent at three loci. Theoret. Pop. Biol. 2008;73 (2) :264-276.Abstract

We investigate the probabilities of identity-by-descent at three loci in order to find a signature which differentiates between the two types of crossing over events: recombination and gene conversion. We use a Markov chain to model coalescence, recombination, gene conversion and mutation in a sample of size two. Using numerical analysis, we calculate the total probability of identity-by-descent at the three loci, and partition these probabilities based on a partial ordering of coalescent events at the three loci. We use these results to compute the probabilities of four different patterns of conditional identity and non-identity at the three loci under recombination and gene conversion. Although recombination and gene conversion do make different predictions, the differences are not likely to be useful in distinguishing between them using three locus patterns between pairs of DNA sequences. This implies that measures of genetic identity in larger samples will be needed to distinguish between gene conversion and recombination.

Sargsyan O, Wakeley J. A coalescent process with simultaneous multiple mergers for approximating the gene genealogies of many marine organisms. Theoret. Pop. Biol. 2008;74 :104-114.Abstract

We describe a forward-time haploid reproduction model with a constant population size that includes life history characteristics common to many marine organisms. We develop coalescent approximations for sample gene genealogies under this model and use these to predict patterns of genetic variation. Depending on the behavior of the underlying parameters of the model, the approximations are coalescent processes with simultaneous multiple mergers or Kingman's coalescent. Using simulations, we apply our model to data from the Pacific oyster and show that our model predicts the observed data very well. We also show that a fact which holds for Kingman's coalescent and also for general coalescent trees--that the most-frequent allele at a biallelic locus is likely to be the ancestral allele--is not true for our model. Our work suggests that the power to detect a "sweepstakes effect" in a sample of DNA sequences from marine organisms depends on the sample size.

Eldon B, Wakeley J. Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics. 2006;172 (1) :701-708.Abstract

We report a complex set of scaling relationships between mutation and reproduction in a simple model of a population. These follow from a consideration of patterns of genetic diversity in a sample of DNA sequences. Five different possible limit processes, each with a different scaled mutation parameter, can be used to describe genetic diversity in a large population. Only one of these corresponds to the usual population genetic model, and the others make drastically different predictions about genetic diversity. The complexity arises because individuals can potentially have very many offspring. To the extent that this occurs in a given species, our results imply that inferences from genetic data made under the usual assumptions are likely to be wrong. Our results also uncover a fundamental difference between populations in which generations are overlapping and those in which generations are discrete. We choose one of the five limit processes that appears to be appropriate for some marine organisms and use a sample of genetic data from a population of Pacific oysters to infer the parameters of the model. The data suggest the presence of rare reproduction events in which ~8% of the population is replaced by the offspring of a single individual.

Jesus FF, Wilkins JF, Solferini VN, Wakeley J. Expected coalescence times and segregating sites in a model of glacial cycles. Genetics and Molecular Research. 2006;5 (3) :466-474.Abstract
The climatic fluctuations of the Quaternary have influenced the distribution of numerous plant and animal species. Several species suffer population reduction and fragmentation, becoming restricted to refugia during glacial periods and expanding again during interglacials. The reduction in population size may reduce the effective population size, mean coalescence time and genetic variation, whereas an increased subdivision may have the opposite effect. To investigate these two opposing forces, we proposed a model in which a panmictic and a structured phase alternate, corresponding to interglacial and glacial periods. From this model, we derived an expression for the expected coalescence time and number of segregating sites for a pair of genes. We observed that increasing the number of demes or the duration of the structured phases causes an increase in coalescence time and expected levels of genetic variation. We compared numerical results with the ones expected for a panmictic population of constant size, and showed that the mean number of segregating sites can be greater in our model even when population size is much smaller in the structured phases. This points to the importance of population structure in the history of species subject to climatic fluctuations, and helps explain the long gene genealogies observed in several organisms.