We describe an iterated game between two players, in which the payoff is to survive a number of steps. Expected payoffs are probabilities of survival. A key feature of the game is that individuals have to survive on their own if their partner dies. We consider individuals with hardwired, unconditional behaviors or strategies. When both players are present, each step is a symmetric two-player game. The overall survival of the two individuals forms a Markov chain. As the number of iterations tends to infinity, all probabilities of survival decrease to zero. We obtain general, analytical results for n-step payoffs and use these to describe how the game changes as n increases. In order to predict changes in the frequency of a cooperative strategy over time, we embed the survival game in three different models of a large, well-mixed population. Two of these models are deterministic and one is stochastic. Offspring receive their parent’s type without modification and fitnesses are determined by the game. Increasing the number of iterations changes the prospects for cooperation. All models become neutral in the limit (n → ∞). Further, if pairs of cooperative individuals survive together with high probability, specifically higher than for any other pair and for either type when it is alone, then cooperation becomes favored if the number of iterations is large enough. This holds regardless of the structure of pairwise interactions in a single step. Even if the single-step interaction is a Prisoner’s Dilemma, the cooperative type becomes favored. Enhanced survival is crucial in these iterated evolutionary games: if players in pairs start the game with a fitness deficit relative to lone individuals, the prospects for cooperation can become even worse than in the case of a single-step game.
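As a rough illustration of how n-step survival probabilities behave in a chain of this kind, the sketch below iterates a simple recursion. It is a minimal sketch under stated assumptions, not the model analyzed in the paper: the function name and the three per-step parameters (`p_both`, `p_focal_only`, `s_alone`) are hypothetical, and a focal individual whose partner has died is assumed to survive each remaining step independently with probability `s_alone`.

```python
def focal_survival(n, p_both, p_focal_only, s_alone):
    """Probability that a focal individual, starting in a pair, survives n steps.

    Hypothetical per-step parameters (assumptions, not taken from the paper):
      p_both       -- both partners survive the step
      p_focal_only -- the focal individual survives but its partner dies
      s_alone      -- a lone focal individual survives one step
    """
    u = 1.0  # u_0: surviving zero steps is certain
    for k in range(1, n + 1):
        # After the first of k steps the focal individual is either still
        # paired (k - 1 steps left) or alone (k - 1 steps left on its own).
        u = p_both * u + p_focal_only * s_alone ** (k - 1)
    return u


if __name__ == "__main__":
    # With any chance of death per step, survival decays toward zero as n grows.
    for n in (1, 10, 100):
        print(n, focal_survival(n, p_both=0.9, p_focal_only=0.05, s_alone=0.8))
```

Comparing the output for two hypothetical pair types with different `p_both` values shows how an advantage in joint survival compounds as the number of steps increases.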

%B Theoretical Population Biology
%V 125
%P 38–55
%G eng
%0 Journal Article
%J Theoretical Population Biology
%D 2018
%T A non-zero variance of Tajima’s estimator for two sequences even for infinitely many unlinked loci
%A King, Léandra
%A Wakeley, John
%A Carmi, Shai
%X The population-scaled mutation rate, θ, is informative on the effective population size and is thus widely used in population genetics. We show that for two sequences and n unlinked loci, the variance of Tajima’s estimator (θ̂), which is the average number of pairwise differences, does not vanish even as n → ∞. The non-zero variance of θ̂ results from a (weak) correlation between coalescence times even at unlinked loci, which, in turn, is due to the underlying fixed pedigree shared by gene genealogies at all loci. We derive the correlation coefficient under a diploid, discrete-time, Wright–Fisher model, and we also derive a simple, closed-form lower bound. We also obtain empirical estimates of the correlation of coalescence times under demographic models inspired by large-scale human genealogies. While the effect we describe is small (Var[θ̂]/θ² ≈ O(N_e⁻¹)), it is important to recognize this feature of statistical population genetics, which runs counter to commonly held notions about unlinked loci.

%B Theoretical Population Biology
%V 122
%P 22-29
%G eng
%0 Journal Article
%J Theoretical Population Biology
%D 2018
%T Public goods games in populations with fluctuating size
%A McAvoy, Alex
%A Fraiman, Nicolas
%A Hauert, Christoph
%A Wakeley, John
%A Nowak, Martin A.
%X Many mathematical frameworks of evolutionary game dynamics assume that the total population size is constant and that selection affects only the relative frequency of strategies. Here, we consider evolutionary game dynamics in an extended Wright–Fisher process with variable population size. In such a scenario, it is possible that the entire population becomes extinct.
Survival of the population may depend on which strategy prevails in the game dynamics. Studying cooperative dilemmas, it is a natural feature of such a model that cooperators enable survival, while defectors drive extinction. Although defectors are favored for any mixed population, random drift could lead to their elimination and the resulting pure-cooperator population could survive. On the other hand, if the defectors remain, then the population will quickly go extinct because the frequency of cooperators steadily declines and defectors alone cannot survive. In a mutation–selection model, we find that (i) a steady supply of cooperators can enable long-term population survival, provided selection is sufficiently strong, and (ii) selection can increase the abundance of cooperators but reduce their relative frequency. Thus, evolutionary game dynamics in populations with variable size generate a multifaceted notion of what constitutes a trait’s long-term success.

%B Theoretical Population Biology
%V 121
%P 72-84
%G eng
%0 Journal Article
%J Theoretical Population Biology
%D 2017
%T Population structure and coalescence in pedigrees: Comparisons to the structured coalescent and a framework for inference
%A Wilton, Peter R.
%A Pierre Baduel
%A Matthieu M. Landon
%A Wakeley, John
%X Contrary to what is often assumed in population genetics, independently segregating loci do not have completely independent ancestries, since all loci are inherited through a single, shared population pedigree. Previous work has shown that the non-independence between gene genealogies of independently segregating loci created by the population pedigree is weak in panmictic populations, and predictions made from standard coalescent theory are accurate for populations that are at least moderately sized. Here, we investigate patterns of coalescence in pedigrees of structured populations.
We find that the pedigree creates deviations away from the predictions of the structured coalescent that persist on a longer timescale than in the case of panmictic populations. Nevertheless, we find that the structured coalescent provides a reasonable approximation for the coalescent process in structured population pedigrees so long as migration events are moderately frequent and there are no migration events in the recent pedigree of the sample. When there are migration events in the recent sample pedigree, we find that distributions of coalescence in the sample can be modeled as a mixture of distributions from different initial sample configurations. We use this observation to motivate a maximum-likelihood approach for inferring migration rates and mutation rates jointly with features of the pedigree such as recent migrant ancestry and recent relatedness. Using simulation, we show that our inference framework accurately recovers long-term migration rates in the presence of recent migration events in the sample pedigree.

%B Theoretical Population Biology
%V 115
%P 1-12
%G eng
%0 Journal Article
%J Genetics
%D 2016
%T Empirical Bayes estimation of coalescence times from nucleotide sequence data
%A King, Leandra
%A Wakeley, John
%X We demonstrate the advantages of using information at many unlinked loci to better calibrate estimates of the time to the most recent common ancestor (TMRCA) at a given locus. To this end, we apply a simple empirical Bayes method to estimate the TMRCA. This method is both asymptotically optimal, in the sense that the estimator converges to the true value when the number of unlinked loci for which we have information is large, and has the advantage of not making any assumptions about demographic history. The algorithm works as follows: we first split the sample at each locus into inferred left and right clades to obtain many estimates of the TMRCA, which we can average to obtain an initial estimate of the TMRCA. We then use nucleotide sequence data from other unlinked loci to form an empirical distribution that we can use to improve this initial estimate.

%B Genetics
%V 204
%P 249-257
%8 September 2016
%G eng
%0 Journal Article
%J Genetics
%D 2016
%T Taking exception to human eugenics
%A Roth, Frederick P.
%A Wakeley, John
%B Genetics
%V 204
%P 821-823
%8 October 2016
%G eng
%0 Journal Article
%J PNAS
%D 2016
%T Effects of the population pedigree on genetic signatures of historical demographic events
%A Wakeley, J.
%A L. King
%A P. R. Wilton
%X Genetic variation among loci in the genomes of diploid biparental organisms is the result of mutation and genetic transmission through the genealogy, or population pedigree, of the species. We explore the consequences of this for patterns of variation at unlinked loci for two kinds of demographic events: the occurrence of a very large family or a strong selective sweep that occurred in the recent past. The results indicate that only rather extreme versions of such events can be expected to structure population pedigrees in such a way that unlinked loci will show deviations from the standard predictions of population genetics, which average over population pedigrees. The results also suggest that large samples of individuals and loci increase the chance of picking up signatures of these events, and that very large families may have a unique signature in terms of sample distributions of mutant alleles.

%B PNAS
%V 113
%P 7994-8001
%G eng
%N 29
%0 Journal Article
%J Am. J. Hum. Genet.
%D 2015
%T Leveraging distant relatedness to quantify human mutation and gene-conversion rates
%A Palamara, Pier Francesco
%A Francioli, Laurent C.
%A Wilton, Peter R.
%A Genovese, Giulio
%A Gusev, Alexander
%A Finucane, Hilary K.
%A Sankararaman, Sriram
%A Sunyaev, Shamil R.
%A de Bakker, Paul I. W.
%A Wakeley, John
%A Pe'er, Itsik
%A Price, Alkes L.
%A Genome Netherlands Consortium
%X The rate at which human genomes mutate is a central biological parameter that has many implications for our ability to understand demographic and evolutionary phenomena. We present a method for inferring mutation and gene-conversion rates by using the number of sequence differences observed in identical-by-descent (IBD) segments together with a reconstructed model of recent population-size history. This approach is robust to, and can quantify, the presence of substantial genotyping error, as validated in coalescent simulations. We applied the method to 498 trio-phased sequenced Dutch individuals and inferred a point mutation rate of 1.66 × 10⁻⁸ per base per generation and a rate of 1.26 × 10⁻⁹ for <20 bp indels. By quantifying how estimates varied as a function of allele frequency, we inferred the probability that a site is involved in non-crossover gene conversion as 5.99 × 10⁻⁶. We found that recombination does not have observable mutagenic effects after gene conversion is accounted for and that local gene-conversion rates reflect recombination rates. We detected a strong enrichment of recent deleterious variation among mismatching variants found within IBD regions and observed summary statistics of local sharing of IBD segments to closely match previously proposed metrics of background selection; however, we found no significant effects of selection on our mutation-rate estimates.
We detected no evidence of strong variation of mutation rates in a number of genomic annotations obtained from several recent studies. Our analysis suggests that a mutation-rate estimate higher than that reported by recent pedigree-based studies should be adopted in the context of DNA-based demographic reconstruction.

%B Am. J. Hum. Genet.
%I CELL PRESS
%C 600 TECHNOLOGY SQUARE, 5TH FLOOR, CAMBRIDGE, MA 02139 USA
%V 97
%P 775-789
%8 DEC
%G eng
%N 6
%9 Article
%R 10.1016/j.ajhg.2015.10.006
%0 Journal Article
%J Genetics
%D 2015
%T Bayesian nonparametric inference of population size changes from sequential genealogies
%A Palacios, JA
%A J. Wakeley
%A S. Ramachandran
%X Sophisticated inferential tools coupled with the coalescent model have recently emerged for estimating past population sizes from genomic data. Recent methods that model recombination require small sample sizes, make constraining assumptions about population size changes, and do not report measures of uncertainty for estimates. Here, we develop a Gaussian process-based Bayesian nonparametric method coupled with a sequentially Markov coalescent model that allows accurate inference of population sizes over time from a set of genealogies. In contrast to current methods, our approach considers a broad class of recombination events, including those that do not change local genealogies. We show that our method outperforms recent likelihood-based methods that rely on discretization of the parameter space. We illustrate the application of our method to multiple demographic histories, including population bottlenecks and exponential growth. In simulation, our Bayesian approach produces point estimates four times more accurate than maximum-likelihood estimation (based on the sum of absolute differences between the truth and the estimated values). Further, our method's credible intervals for population size as a function of time cover 90% of true values across multiple demographic scenarios, enabling formal hypothesis testing about population size differences over time. Using genealogies estimated with *ARG*weaver, we apply our method to European and Yoruban samples from the 1000 Genomes Project and confirm key known aspects of population size history over the past 150,000 years.

A long genomic segment inherited by a pair of individuals from a single, recent common ancestor is said to be *identical-by-descent* (IBD). Shared IBD segments have numerous applications in genetics, from demographic inference to phasing, imputation, pedigree reconstruction, and disease mapping. Here, we provide a theoretical analysis of IBD sharing under Markovian approximations of the coalescent with recombination. We describe a general framework for the IBD process along the chromosome under the Markovian models (SMC/SMC’), as well as introduce and justify a new model, which we term the *renewal approximation*, under which lengths of successive segments are independent. Then, considering the infinite-chromosome limit of the IBD process, we recover previous results (for SMC) and derive new results (for SMC’) for the mean number of shared segments longer than a cutoff and the fraction of the chromosome found in such segments. We then use renewal theory to derive an expression (in Laplace space) for the distribution of the number of shared segments and demonstrate implications for demographic inference. We also compute (again, in Laplace space) the distribution of the fraction of the chromosome in shared segments, from which we obtain explicit expressions for the first two moments. Finally, we generalize all results to populations with a variable effective size.

The evolution of drug resistance in HIV occurs by the fixation of specific, well-known, drug-resistance mutations, but the underlying population genetic processes are not well understood. By analyzing within-patient longitudinal sequence data, we make four observations that shed light on the underlying processes and allow us to infer the short-term effective population size of the viral population in a patient. Our first observation is that the evolution of drug resistance usually occurs by the fixation of one drug-resistance mutation at a time, as opposed to several changes simultaneously. Second, we find that these fixation events are accompanied by a reduction in genetic diversity in the region surrounding the fixed drug-resistance mutation, due to the hitchhiking effect. Third, we observe that the fixation of drug-resistance mutations involves both hard and soft selective sweeps. In a hard sweep, a resistance mutation arises in a single viral particle and drives all linked mutations with it when it spreads in the viral population, which dramatically reduces genetic diversity. On the other hand, in a soft sweep, a resistance mutation occurs multiple times on different genetic backgrounds, and the reduction of diversity is weak. Using the frequency of occurrence of hard and soft sweeps we estimate the effective population size of HIV to be 1.5 × 10⁵ (95% confidence interval [0.8 × 10⁵, 4.8 × 10⁵]). This number is much lower than the actual number of infected cells, but much larger than previous population size estimates based on synonymous diversity. We propose several explanations for the observed discrepancies. Finally, our fourth observation is that genetic diversity at non-synonymous sites recovers to its pre-fixation value within 18 months, whereas diversity at synonymous sites remains depressed after this time period. These results improve our understanding of HIV evolution and have potential implications for treatment strategies.

Citation: Pennings PS, Kryazhimskiy S, Wakeley J (2014) Loss and Recovery of Genetic Diversity in Adapting Populations of HIV. PLoS Genet 10(1): e1004000.

doi:10.1371/journal.pgen.1004000

Editor: Christophe Fraser, Imperial College London, United Kingdom

Received April 19, 2013; Accepted October 19, 2013; Published January 23, 2014

Copyright: 2014 Pennings et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: SK was supported by a Career Award at Scientific Interface from the Burroughs Wellcome Fund (http://www.bwfund.org/). PSP was supported by a long-term postdoctoral fellowship of the Human Frontier Science Program (http://www.hfsp.org/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: pleuni@stanford.edu

%B PLoS Genet
%V 10(1): e1004000
%G eng
%0 Journal Article
%J Theoret. Pop. Biol.
%D 2013
%T Coalescent theory has many new branches
%A J. Wakeley
%B Theoret. Pop. Biol.
%V 87
%P 1-4
%G eng
%0 Journal Article
%J Genetics
%D 2012
%T Extending coalescent theory to autotetraploids
%A Arnold, B.
%A K. Bomblies
%A J. Wakeley
%X We develop coalescent models for autotetraploid species with tetrasomic inheritance. We show that the ancestral genetic process in a large population without recombination may be approximated using Kingman’s standard coalescent, with a coalescent effective population size 4*N*. Numerical results suggest that this approximation is accurate for population sizes on the order of hundreds of individuals. Therefore, existing coalescent simulation programs can be adapted to study population history in autotetraploids simply by interpreting the timescale in units of 4*N* generations. We also consider the possibility of double reduction, a phenomenon unique to polysomic inheritance, and show that its effects on gene genealogies are similar to partial self-fertilization.

We address a conceptual flaw in the backward-time approach to population genetics called coalescent theory as it is applied to diploid biparental organisms. Specifically, the way random models of reproduction are used in coalescent theory is not justified. Instead, the population pedigree for diploid organisms (that is, the set of all family relationships among members of the population), although unknown, should be treated as a fixed parameter, not as a random quantity. Gene genealogical models should describe the outcome of the percolation of genetic lineages through the population pedigree according to Mendelian inheritance. Using simulated pedigrees, some of which are based on family data from 19th century Sweden, we show that in many cases the (conceptually wrong) standard coalescent model is difficult to reject statistically and in this sense may provide a surprisingly accurate description of gene genealogies on a fixed pedigree. We study the differences between the fixed-pedigree coalescent and the standard coalescent by analysis and simulations. Differences are apparent in the recent past, within roughly log₂(N) generations or fewer, but then disappear as genetic lineages are traced into the more distant past.

%B Genetics
%V 190
%P 1433-1445
%G eng
%N 4
%0 Journal Article
%J Mol. Biol. Evol.
%D 2011
%T Recovering population parameters from a single gene genealogy: an unbiased estimator of the growth rate
%A Y. E. Maruvka
%A N. M. Shnerb
%A Y. Bar-Yam
%A J. Wakeley
%X We show that the number of lineages ancestral to a sample, as a function of time back into the past, which we call the number of lineages as a function of time (NLFT), is a nearly deterministic property of large-sample gene genealogies. We obtain analytic expressions for the NLFT for both constant-sized and exponentially growing populations. The low level of stochastic variation associated with the NLFT of a large sample suggests using the NLFT to make estimates of population parameters. Based on this, we develop a new computational method of inferring the size and growth rate of a population from a large sample of DNA sequences at a single locus. We apply our method first to a sample of 1,212 mitochondrial DNA (mtDNA) sequences from China, confirming a pattern of recent population growth previously identified using other techniques, but with much smaller confidence intervals for past population sizes due to the low variation of the NLFT. We further analyze a set of 63 mtDNA sequences from blue whales (BWs), concluding that the population grew in the past. This calls for reevaluation of previous studies that were based on the assumption that the BW population was fixed.

%B Mol. Biol. Evol.
%V 28
%P 1617-1631
%G eng
%N 5
%0 Journal Article
%J Evolution
%D 2010
%T Genome structure and the benefit of sex
%A Watson, R.A.
%A Weinreich, D.
%A J. Wakeley
%X We examine the behavior of sexual and asexual populations in modular multipeaked fitness landscapes and show that sexuals can systematically reach different, higher fitness adaptive peaks than asexuals. Whereas asexuals must move against selection to escape local optima, sexuals reach higher fitness peaks reliably because they create specific genetic variants that “skip over” fitness valleys, moving from peak to peak in the fitness landscape. This occurs because recombination can supply combinations of mutations in functional composites or “modules,” that may include individually deleterious mutations. Thus when a beneficial module is substituted for another less-fit module by sexual recombination it provides a genetic variant that would require either several specific simultaneous mutations in an asexual population or a sequence of individual mutations some of which would be selected against. This effect requires modular genomes, such that subsets of strongly epistatic mutations are tightly physically linked. We argue that such a structure is provided simply by virtue of the fact that genomes contain many genes each containing many strongly epistatic nucleotides. We briefly discuss the connections with “building blocks” in the evolutionary computation literature. We conclude that there are conditions in which sexuals can systematically evolve high-fitness genotypes that are essentially unevolvable for asexuals.

%B Evolution
%V 65-28
%P 523-536
%G eng
%0 Journal Article
%J Theoret. Pop. Biol.
%D 2010
%T Sufficiency of the number of segregating sites in the limit under finite-sites mutation
%A RoyChoudhury, A.
%A Wakeley, J.
%X We show that the number of segregating sites is a sufficient statistic for the scaled mutation parameter in the limit as the number of sites tends to infinity and there is free recombination between sites. We assume that the mutation parameter at each site tends to zero such that the total mutation parameter is constant in the limit. Our results show that Watterson’s estimator is the maximum likelihood estimator in this case, but that it estimates a composite parameter which is different for different mutation models. Some of our results hold when recombination is limited, because Watterson’s estimator is an unbiased, method-of-moments estimator regardless of the recombination rate. The quantity it estimates depends on the details of how mutations occur at each site.
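
For readers unfamiliar with Watterson's estimator, a minimal sketch of its standard textbook form follows. The formula θ_W = S / a_n with a_n = 1 + 1/2 + ... + 1/(n - 1) is the usual definition rather than something stated in the abstract; here n denotes the number of sampled sequences (not the number of sites), and the function name is hypothetical.

```python
def watterson_theta(num_segregating_sites, sample_size):
    """Watterson's estimator theta_W = S / a_n (standard textbook form).

    num_segregating_sites -- S, the count of polymorphic sites in the sample
    sample_size           -- n, the number of sampled sequences (n >= 2)
    """
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    # a_n is the (n - 1)th harmonic number: 1 + 1/2 + ... + 1/(n - 1)
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return num_segregating_sites / a_n


if __name__ == "__main__":
    # Illustrative values only: 10 segregating sites in a sample of 2 sequences.
    print(watterson_theta(10, 2))
```

Because the estimator depends on the data only through S, it is the natural candidate whenever S is a sufficient statistic for the mutation parameter.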

%B Theoret. Pop. Biol.
%V 78
%P 118-122
%G eng
%N 2
%0 Journal Article
%J Molecular Biology and Evolution
%D 2010
%T Measuring the sensitivity of single-locus "neutrality tests" using a direct perturbation approach
%A D Garrigan
%A R Lewontin
%A J. Wakeley
%K Evolution
%K Genetics
%K Techniques
%X A large number of statistical tests have been proposed to detect natural selection based on a sample of variation at a single genetic locus. These tests measure the deviation of the allelic frequency distribution observed within populations from the distribution expected under a set of assumptions that includes both neutral evolution and equilibrium population demography. The present study considers a new way to assess the statistical properties of these tests of selection, by their behavior in response to direct perturbations of the steady-state allelic frequency distribution, unconstrained by any particular nonequilibrium demographic scenario. Results from Monte Carlo computer simulations indicate that most tests of selection are more sensitive to perturbations of the allele frequency distribution that increase the variance in allele frequencies than to perturbations that decrease the variance. Simulations also demonstrate that it requires, on average, 4N generations (N is the diploid effective population size) for tests of selection to relax to their theoretical, steady-state distributions following different perturbations of the allele frequency distribution to its extremes. This relatively long relaxation time highlights the fact that these tests are not robust to violations of the other assumptions of the null model besides neutrality. Lastly, genetic variation arising under an example of a regularly cycling demographic scenario is simulated. Tests of selection performed on this last set of simulated data confirm the confounding nature of these tests for the inference of natural selection, under a demographic scenario that likely holds for many species. The utility of using empirical, genomic distributions of test statistics, instead of the theoretical steady-state distribution, is discussed as an alternative for improving the statistical inference of natural selection.

%B Molecular Biology and Evolution
%V 27
%P 73-89
%G English
%N 1
%M Zoorec:Zoor14606038610
%0 Journal Article
%J PLoS ONE
%D 2010
%T Pacific salmon and the coalescent effective population size
%A C Cenik
%A Wakeley, J.
%K Chordates
%K Conservation
%K Ecology
%K Evolution
%K Fish
%K Genetics
%K Life cycle and development
%K Population dynamics
%K Techniques
%K Vertebrates
%X Pacific salmon include several species that are both commercially important and endangered. Understanding the causes of loss in genetic variation is essential for designing better conservation strategies. Here we use a coalescent approach to analyze a model of the complex life history of salmon, and derive the coalescent effective population size (CES). With the aid of Kronecker products and a convergence theorem for Markov chains with two time scales, we derive a simple formula for the CES and thereby establish its existence. Our results may be used to address important questions regarding salmon biology, in particular about the loss of genetic variation. To illustrate the utility of our approach, we consider the effects of fluctuations in population size over time. Our analysis enables the application of several tools of coalescent theory to the case of salmon.

%B PLoS ONE
%V 5
%P e13019, 1-10
%G English
%N 9
%M Zoorec:Zoor14703019944
%0 Journal Article
%J Evolution
%D 2010
%T A structured coalescent process for seasonally fluctuating populations
%A M Shpak
%A J. Wakeley
%A D Garrigan
%A Lewontin, R. C.
%K Arthropods
%K Behaviour
%K Ecology
%K Genetic techniques
%K Genetics
%K Insects
%K Invertebrates
%K Land zones
%K Nearctic region
%K North America
%K Techniques
%K True Flies
%X Many short-lived organisms pass through several generations during favorable growing seasons, separated by inhospitable periods during which only small hibernating or estivating refugia remain. This induces pronounced seasonal fluctuations in population size and metapopulation structure. The first generations in the growing season will be characterized by small, relatively isolated demes whereas the later generations will experience larger deme sizes with more extensive gene flow. Fluctuations of this sort can induce changes in the amount of genetic variation in early season samples compared to late season samples, a classical example being the observations of seasonal variation in allelism in New England Drosophila populations by P. T. Ives. In this article, we study the properties of a structured coalescent process under seasonal fluctuations using numerical analysis of exact state equations, analytical approximations that rely on a separation of timescales between intrademic versus interdemic processes, and individual-based simulations. We show that although an increase in genetic variation during each favorable growing season is observed, it is not as pronounced as in the empirical observations. This suggests that some of the temporal patterns of variation seen by Ives may be due to selection against deleterious lethals rather than neutral processes.

%B Evolution
%V 64
%P 1395-1409
%G English
%N 5
%M Zoorec:Zoor14610066503
%0 Journal Article
%J Genetics
%D 2009
%T Modeling multiallelic selection using a Moran model.
%A Muirhead, C.
%A Wakeley, J.
%X We present a Moran-model approach to modeling general multiallelic selection in a finite population and show how it may be used to develop theoretical models of biological systems of balancing selection such as plant gametophytic self-incompatibility loci. We propose new expressions for the stationary distribution of allele frequencies under selection and use them to show that the continuous-time Markov chain describing allele frequency change with exchangeable selection and Moran-model reproduction is reversible. We then use the reversibility property to derive the expected allele frequency spectrum in a finite population for several general models of multiallelic selection. Using simulations, we show that our approach is valid over a broader range of parameters than previous analyses of balancing selection based on diffusion approximations to the Wright–Fisher model of reproduction. Our results can be applied to any model of multiallelic selection in which fitness is solely a function of allele frequency.

Estimates of gene flow between subpopulations based on F_ST (or N_ST) are shown to be confounded by the reproduction parameters of a model of skewed offspring distribution. Genetic evidence of population subdivision can be observed even when gene flow is very high, if the offspring distribution is skewed. A skewed offspring distribution arises when individuals can have very many offspring with some probability. This leads to high probability of identity by descent within subpopulations and results in genetic heterogeneity between subpopulations even when Nm is very large. Thus, we consider a limiting model in which the rates of coalescence and migration can be much higher than for a Wright–Fisher population. We derive the densities of pairwise coalescence times and expressions for F_ST and other statistics under both the finite island model and a many-demes limit model. The results can explain the observed genetic heterogeneity among subpopulations of certain marine organisms despite substantial gene flow.

%B Genetics
%V 181
%P 615-629
%G eng
%N 2
%0 Journal Article
%J Theoret. Pop. Biol.
%D 2009
%T The conditional ancestral selection graph with strong balancing selection.
%A J. Wakeley
%A Sargsyan, O.
%X Using a heuristic separation-of-time-scales argument, we describe the behavior of the conditional ancestral selection graph with very strong balancing selection between a pair of alleles. In the limit as the strength of selection tends to infinity, we find that the ancestral process converges to a neutral structured coalescent, with two subpopulations representing the two alleles and mutation playing the role of migration. This agrees with a previous result of Kaplan et al., obtained using a different approach. We present the results of computer simulations to support our heuristic mathematical results. We also present a more rigorous demonstration that the neutral conditional ancestral process converges to the Kingman coalescent in the limit as the mutation rate tends to infinity.

%B Theoret. Pop. Biol. %V 75 %P 355-364 %G eng %N 4 %0 Journal Article %J Proc. Natl. Acad. Sci., USA %D 2009 %T Evolution of cooperation by phenotypic similarity. %A Antal, T. %A Ohtsuki, H. %A J. Wakeley %A Taylor, P.D. %A Nowak, M. %X The emergence of cooperation in populations of selfish individuals is a fascinating topic that has inspired much work in theoretical biology. Here, we study the evolution of cooperation in a model where individuals are characterized by phenotypic properties that are visible to others. The population is well mixed in the sense that everyone is equally likely to interact with everyone else, but the behavioral strategies can depend on distance in phenotype space. We study the interaction of cooperators and defectors. In our model, cooperators cooperate with those who are similar and defect otherwise. Defectors always defect. Individuals mutate to nearby phenotypes, which generates a random walk of the population in phenotype space. Our analysis brings together ideas from coalescence theory and evolutionary game dynamics. We obtain a precise condition for natural selection to favor cooperators over defectors. Cooperation is favored when the phenotypic mutation rate is large and the strategy mutation rate is small. In the optimal case for cooperators, in a one-dimensional phenotype space and for large population size, the critical benefit-to-cost ratio is given by b/c = 1 + 2/√3. We also derive the fundamental condition for any two-strategy symmetric game and consider high-dimensional phenotype spaces.

%B Proc. Natl. Acad. Sci., USA
%V 106
%P 8597-8600
%G eng
%0 Journal Article
%J Genetics
%D 2009
%T Extensions of the coalescent effective population size.
%A J. Wakeley
%A Sargsyan, O.
%X We suggest two extensions of the coalescent effective population size of Sjödin et al. (2005) and make a third, practical point. First, to bolster its relevance to data and allow comparisons between models, the coalescent effective size should be recast as a kind of mutation effective size. Second, the requirement that the coalescent effective population size must depend linearly on the actual population size should be lifted. Third, even if the coalescent effective population size does not exist in the mathematical sense, it may be difficult to reject Kingman’s coalescent using genetic data.

%B Genetics
%V 181
%P 341-345
%G eng
%N 1
%0 Journal Article
%J Nature
%D 2008
%T Complex speciation of humans and chimpanzees.
%A Wakeley, J.
%X Genetic data from two or more species provide information about the process of speciation. In their analysis of DNA from humans, chimpanzees, gorillas, orangutans and macaques (HCGOM), Patterson *et al.*^{1} suggest that the apparently short divergence time between humans and chimpanzees on the X chromosome is explained by a massive interspecific hybridization event in the ancestry of these two species. However, Patterson *et al.*^{1} do not statistically test their own null model of simple speciation before concluding that speciation was complex, and—even if the null model could be rejected—they do not consider other explanations of a short divergence time on the X chromosome. These include natural selection on the X chromosome in the common ancestor of humans and chimpanzees, changes in the ratio of male-to-female mutation rates over time, and less extreme versions of divergence with gene flow (see ref. 2, for example). I therefore believe that their claim of hybridization is unwarranted.

The ancestral selection graph, conditioned on the allelic types in the sample, is used to obtain a limiting gene genealogical process under strong selection. In an equilibrium, two-allele system with strong selection, neutral gene genealogies are predicted for random samples and for samples containing at most one unfavorable allele. Samples containing more than one unfavorable allele have gene genealogies that differ greatly from neutral predictions. However, they are related to neutral gene genealogies via the well-known Ewens sampling formula. Simulations show rapid convergence to limiting analytical predictions as the strength of selection increases. These results extend the idea of a soft selective sweep to deleterious alleles and have implications for the interpretation of polymorphism among disease-causing alleles in humans.

In a 2007 article, McVean studied the effect of recombination on linkage disequilibrium (LD) between two neutral loci located near a third locus that has undergone a selective sweep. The results demonstrated that two loci on the same side of a selected locus might show substantial LD, whereas the expected LD for two loci on opposite sides of a selected locus is zero. In this article, we extend McVean’s model to include gene conversion. We show that one of the conclusions is strongly affected by gene conversion: when gene conversion is present, there may be substantial LD between two loci on opposite sides of a selective sweep.

Correlations in coalescence times between two loci are derived under selectively neutral population models in which the offspring of an individual can number on the order of the population size. The correlations depend on the rates of recombination and random drift and are shown to be functions of the parameters controlling the size and frequency of these large reproduction events. Since a prediction of linkage disequilibrium can be written in terms of correlations in coalescence times, it follows that the prediction of linkage disequilibrium is a function not only of the rate of recombination but also of the reproduction parameters. Low linkage disequilibrium is predicted if the offspring of a single individual frequently replace almost the entire population. However, high linkage disequilibrium can be predicted if the offspring of a single individual replace an intermediate fraction of the population. In some cases the model reproduces the standard Wright–Fisher predictions. Contrary to common intuition, high linkage disequilibrium can be predicted despite frequent recombination, and low linkage disequilibrium under infrequent recombination. Simulations support the analytical results but show that the variance of linkage disequilibrium is very large.

%B Genetics %V 178 %P 1517-1532 %G eng %N 3 %0 Journal Article %J Theoret. Pop. Biol. %D 2008 %T Population differentiation and migration: Coalescence times in a two-sex island model for autosomal and X-linked loci %A S. Ramachandran %A Rosenberg, N.A. %A Feldman, MW %A Wakeley, J. %XEvolutionists have debated whether population-genetic parameters, such as effective population size and migration rate, differ between males and females. In humans, most analyses of this problem have focused on the Y chromosome and the mitochondrial genome, while the X chromosome has largely been omitted from the discussion. Past studies have compared F(ST) values for the Y chromosome and mitochondrion under a model with migration rates that differ between the sexes but with equal male and female population sizes. In this study we investigate rates of coalescence for X-linked and autosomal lineages in an island model with different population sizes and migration rates for males and females, obtaining the mean time to coalescence for pairs of lineages from the same deme and for pairs of lineages from different demes. We apply our results to microsatellite data from the Human Genome Diversity Panel, and we examine the male and female migration rates implied by observed F(ST) values.

%B Theoret. Pop. Biol. %V 74 %P 291-301 %G eng %0 Journal Article %J Theoret. Pop. Biol. %D 2008 %T Recombination, gene conversion, and identity by descent at three loci %A Jones, D. %A J. Wakeley %XWe investigate the probabilities of identity-by-descent at three loci in order to find a signature which differentiates between the two types of crossing over events: recombination and gene conversion. We use a Markov chain to model coalescence, recombination, gene conversion and mutation in a sample of size two. Using numerical analysis, we calculate the total probability of identity-by-descent at the three loci, and partition these probabilities based on a partial ordering of coalescent events at the three loci. We use these results to compute the probabilities of four different patterns of conditional identity and non-identity at the three loci under recombination and gene conversion. Although recombination and gene conversion do make different predictions, the differences are not likely to be useful in distinguishing between them using three locus patterns between pairs of DNA sequences. This implies that measures of genetic identity in larger samples will be needed to distinguish between gene conversion and recombination.

%B Theoret. Pop. Biol. %V 73 %P 264-276 %G eng %N 2 %0 Journal Article %J Theoret. Pop. Biol. %D 2008 %T A coalescent process with simultaneous multiple mergers for approximating the gene genealogies of many marine organisms %A Sargsyan, O. %A J. Wakeley %K Genetics %K Invertebrates %K Life cycle and development %K Marine zones %K Molluscs %K Pacific Ocean %K Techniques %XWe describe a forward-time haploid reproduction model with a constant population size that includes life history characteristics common to many marine organisms. We develop coalescent approximations for sample gene genealogies under this model and use these to predict patterns of genetic variation. Depending on the behavior of the underlying parameters of the model, the approximations are coalescent processes with simultaneous multiple mergers or Kingman's coalescent. Using simulations, we apply our model to data from the Pacific oyster and show that our model predicts the observed data very well. We also show that a fact which holds for Kingman's coalescent and also for general coalescent trees--that the most-frequent allele at a biallelic locus is likely to be the ancestral allele--is not true for our model. Our work suggests that the power to detect a "sweepstakes effect" in a sample of DNA sequences from marine organisms depends on the sample size.

%B Theoret. Pop. Biol. %V 74 %P 104-114 %G English %M Zoorec:Zoor14412071968 %0 Journal Article %J Genetics %D 2006 %T Coalescent processes when the distribution of offspring number among individuals is highly skewed. %A Eldon, B. %A J. Wakeley %X The climatic fluctuations of the Quaternary have influenced the distribution of numerous plant and animal species. Several species suffer population reduction and fragmentation, becoming restricted to refugia during glacial periods and expanding again during interglacials. The reduction in population size may reduce the effective population size, mean coalescence time and genetic variation, whereas an increased subdivision may have the opposite effect. To investigate these two opposing forces, we proposed a model in which a panmictic and a structured phase alternate, corresponding to interglacial and glacial periods. From this model, we derived an expression for the expected coalescence time and number of segregating sites for a pair of genes. We observed that increasing the number of demes or the duration of the structured phases causes an increase in coalescence time and expected levels of genetic variation. We compared numerical results with the ones expected for a panmictic population of constant size, and showed that the mean number of segregating sites can be greater in our model even when population size is much smaller in the structured phases. This points to the importance of population structure in the history of species subject to climatic fluctuations, and helps explain the long gene genealogies observed in several organisms.

%B Genetics and Molecular Research
%V 5
%P 466-474
%G eng
%N 3
%0 Journal Article
%J Genetics
%D 2006
%T Convergence to the island-model coalescent process in populations with restricted migration
%A Matsen, FA
%A J. Wakeley
%K Behaviour
%K Genetic techniques
%K Genetics
%K Techniques
%X In this article we apply some graph-theoretic results to the study of coalescence in a structured population with migration. The graph is the pattern of migration among subpopulations, or demes, and we use the theory of random walks on graphs to characterize the ease with which ancestral lineages can traverse the habitat in a series of migration events. We identify conditions under which the coalescent process in populations with restricted migration, such that individuals cannot traverse the habitat freely in a single migration event, nonetheless becomes identical to the coalescent process in the island migration model in the limit as the number of demes tends to infinity. Specifically, we first note that a sequence of symmetric graphs with Diaconis-Stroock constant bounded above has an unstructured Kingman-type coalescent in the limit for a sample of size two from two different demes. We then show that circular and toroidal models with long-range but restricted migration have an upper bound on this constant and so have an unstructured-migration coalescent in the limit. We investigate the rate of convergence to this limit using simulations.

%B Genetics %V 172 %P 701-708 %G English %N 1 %M Zoorec:Zoor14206034270 %0 Journal Article %J Theoret. Pop. Biol. %D 2006 %T Corridors for migration between large subdivided populations, and the structured coalescent %A J. Wakeley %A Lessard, S. %K Ecology %K Genetic techniques %K Genetics %K Techniques %XWe study the ancestral genetic process for samples from two large, subdivided populations that are connected by migration to, from, and within a small set of subpopulations, or demes. We consider convergence to an ancestral limit process as the numbers of demes in the two large, subdivided populations tend to infinity. We show that the ancestral limit process for a sample includes a recent instantaneous adjustment to the sample size and structure followed by a more ancient process that is identical to the usual structured coalescent, but with different scaled parameters. This justifies the application of a modified structured coalescent to some hierarchically structured populations.

%B Theoret. Pop. Biol. %V 70 %P 412-420 %G English %N 4 %M Zoorec:Zoor14309059099 %0 Journal Article %J Genetics %D 2005 %T The structured ancestral selection graph and the many-demes limit %A Slade, P. %A Wakeley, J. %X We show that the unstructured ancestral selection graph applies to part of the history of a sample from a population structured by restricted migration among subpopulations, or demes. The result holds in the limit as the number of demes tends to infinity with proportionately weak selection, and we have also made assumptions of island-type migration and that demes are equivalent in size. After an instantaneous sample-size adjustment, this structured ancestral selection graph converges to an unstructured ancestral selection graph with a mutation parameter that depends inversely on the migration rate. In contrast, the selection parameter for the population is independent of the migration rate and is identical to the selection parameter in an unstructured population. We show analytically that estimators of the migration rate, based on pairwise sequence differences, derived under the assumption of neutrality should perform equally well in the presence of weak selection. We also modify an algorithm for simulating genealogies conditional on the frequencies of two selected alleles in a sample. This permits efficient simulation of stronger selection than was previously possible. Using this new algorithm, we simulate gene genealogies under the many-demes ancestral selection graph and identify some situations in which migration has a strong effect on the time to the most recent common ancestor of the sample. We find that a similar effect also increases the sensitivity of the genealogy to selection.

%B Genetics %V 169 %P 1117-1131 %G eng %N 2 %0 Journal Article %J Genetics %D 2005 %T The limits of theoretical population genetics %A Wakeley, John %K Genetics %B Genetics %V 169 %P 1-7 %G English %N 1 %M Zoorec:Zoor14106033682 %0 Journal Article %J J. Hered. %D 2004 %T Recent trends in population genetics: more data! more math! simple models? %A Wakeley, J. %XRecent developments in population genetics are reviewed and placed in a historical context. Current and future challenges, both in computational methodology and in analytical theory, are to develop models and techniques to extract the most information possible from multilocus DNA datasets. As an example of the theoretical issues, five limiting forms of the island model of population subdivision with migration are presented in a unified framework. These approximations illustrate the interplay between migration and drift in structuring gene genealogies, and some of them make connections between the fairly complicated island-model genealogical process and the much simpler, unstructured neutral coalescent process which underlies most inferential techniques in population genetics.

%B J. Hered. %V 95 %P 397-405 %G eng %N 5 %0 Journal Article %J Mol. Biol. Evol. %D 2004 %T A robust measure of HIV-1 population turnover within chronically infected individuals. %A Achaz, G. %A Palmer, S. %A Kearny, M. %A Maldarelli, F. %A Mellors, J.W. %A Coffin, J.M. %A Wakeley, J. %X A simple nonparametric test for population structure was applied to temporally spaced samples of HIV-1 sequences from the gag-pol region within two chronically infected individuals. The results show that temporal structure can be detected for samples separated by about 22 months or more. The performance of the method, which was originally proposed to detect geographic structure, was tested for temporally spaced samples using neutral coalescent simulations. Simulations showed that the method is robust to variation in sample sizes and mutation rates, to the presence/absence of recombination, and that the power to detect temporal structure is high. By comparing levels of temporal structure in simulations to the levels observed in real data, we estimate the effective intra-individual population size of HIV-1 to be between 10^{3} and 10^{4} viruses, which is in agreement with some previous estimates. Using this estimate and a simple measure of sequence diversity, we estimate an effective neutral mutation rate of about 5 × 10^{-6} per site per generation in the gag-pol region. The definition and interpretation of estimates of such “effective” population parameters are discussed.

We study the ancestral recombination graph for a pair of sites in a geographically structured population. In particular, we consider the limiting behavior of the graph, under Wright’s island model, as the number of subpopulations, or demes, goes to infinity. After an instantaneous sample-size adjustment, the graph becomes identical to the two-locus graph in an unstructured population, but with a time scale that depends on the migration rate and the deme size. Interestingly, when migration is gametic, this rescaling of time increases the population mutation rate but does not affect the population recombination rate. We compare this to the case of a partially-selfing population, in which both mutation and recombination depend on the selfing rate. Our result for gametic migration holds both for finite-sized demes, and in the limit as the deme size goes to infinity. However, when migration occurs during the diploid phase of the life cycle and demes are finite in size, the population recombination rate does depend on the migration rate, in a way that is reminiscent of partial selfing. Simulations imply that convergence to a rescaled panmictic ancestral recombination graph occurs for any number of sites as the number of demes approaches infinity.

%B J. Math. Biol.
%V 48
%P 275-292
%G eng
%N 3
%0 Journal Article
%J Theoret. Pop. Biol.
%D 2004
%T The many-demes limit for selection and drift in a subdivided population.
%A Wakeley, John
%A Takahashi, Tsuyoshi
%K Evolution
%K Genetics
%K Population genetics
%X A diffusion approximation is obtained for the frequency of a selected allele in a population comprised of many subpopulations or demes. The form of the diffusion is equivalent to that for an unstructured population, except that it occurs on a longer time scale when migration among demes is restricted. This many-demes diffusion limit relies on the collection of demes always being in statistical equilibrium with respect to migration and drift for a given allele frequency in the total population. Selection is assumed to be weak, in inverse proportion to the number of demes, and the results hold for any deme sizes and migration rates greater than zero. The distribution of allele frequencies among demes is also described.

%B Theoret. Pop. Biol. %V 66 %P 83-91 %G English %N 2 %M Zoorec:Zoor14012068935 %0 Journal Article %J Mol. Ecol. %D 2004 %T Metapopulation models for historical inference. %A Wakeley, John %K Genetic techniques %K Techniques %XThe genealogical process for a sample from a metapopulation, in which local populations are connected by migration and can undergo extinction and subsequent recolonization, is shown to have a relatively simple structure in the limit as the number of populations in the metapopulation approaches infinity. The result, which is an approximation to the ancestral behaviour of samples from a metapopulation with a large number of populations, is the same as that previously described for other metapopulation models, namely that the genealogical process is closely related to Kingman's unstructured coalescent. The present work considers a more general class of models that includes two kinds of extinction and recolonization, and the possibility that gamete production precedes extinction. In addition, following other recent work, this result for a metapopulation divided into many populations is shown to hold both for finite population sizes and in the usual diffusion limit, which assumes that population sizes are large. Examples illustrate when the usual diffusion limit is appropriate and when it is not. Some shortcomings and extensions of the model are considered, and the relevance of such models to understanding human history is discussed.

%B Mol. Ecol. %V 13 %P 865-875 %G English %N 4 %M Zoorec:Zoor14009056778 %0 Journal Article %J Mol. Biol. Evol. %D 2003 %T Gene genealogies when the sample size exceeds the effective size of the population. %A Wakeley, J. %A Takahashi, T. %X We study the properties of gene genealogies for large samples using a continuous approximation introduced by R. A. Fisher. We show that the major effect of large sample size, relative to the effective size of the population, is to increase the proportion of polymorphisms at which the mutant type is found in a single copy in the sample. We derive analytical expressions for the expected number of these singleton polymorphisms and for the total number of polymorphic, or segregating, sites that are valid even when the sample size is much greater than the effective size of the population. We use simulations to assess the accuracy of these predictions and to investigate other aspects of large-sample genealogies. Lastly, we apply our results to some data from Pacific oysters sampled from British Columbia. This illustrates that, when large samples are available, it is possible to estimate the mutation rate and the effective population size separately, in contrast to the case of small samples in which only the product of the mutation rate and the effective population size can be estimated.

%B Mol. Biol. Evol.
%V 20
%P 208-213
%G eng
%N 2
%0 Journal Article
%J Genetics
%D 2003
%T Theory of the effects of population structure and sampling on patterns of linkage disequilibrium applied to genomic data from humans.
%A Wakeley, J.
%A Lessard, S.
%X We develop predictions for the correlation of heterozygosity and for linkage disequilibrium between loci using a simple model of population structure that includes migration among local populations, or demes. We compare the results for a sample of size two from the same deme (single-deme sample) to those for a sample of size two from two different demes (a scattered sample). The correlation in heterozygosity for a scattered sample is surprisingly insensitive to both the migration rate and the number of demes. In contrast, the correlation in heterozygosity for a single-deme sample is sensitive to both, and the effect of an increase in the number of demes is qualitatively similar to that of a decrease in the migration rate: both increase the correlation in heterozygosity. These same conclusions hold for a commonly used measure of the linkage disequilibrium (r^{2}). We compare the predictions of the theory to genomic data from humans and show that subdivision might account for a substantial portion of the genetic associations observed within the human genome, even though migration rates among local populations of humans are relatively large. Because correlations due to subdivision rather than to physical linkage can be large even in a single-deme sample, then if long-term migration has been important in shaping patterns of human polymorphism, the common practice of disease mapping using linkage disequilibrium in “isolated” local populations may be subject to error.

%B Genetics
%V 164
%P 1043-1053
%G eng
%0 Journal Article
%J Genetics
%D 2003
%T A diffusion approximation for selection and drift in a subdivided population
%A Cherry, Joshua L.
%A Wakeley, John
%K Evolution
%K Genetic techniques
%K Genetics
%K Population genetics
%K Techniques
%X The population-genetic consequences of population structure are of great interest and have been studied extensively. An area of particular interest is the interaction among population structure, natural selection, and genetic drift. At first glance, different results in this area give very different impressions of the effect of population subdivision on effective population size (N_{e}), suggesting that no single value of N_{e} can completely characterize a structured population. Results presented here show that a population conforming to Wright's island model of subdivision with genic selection can be related to an idealized panmictic population (a Wright-Fisher population). This equivalent panmictic population has a larger size than the actual population; i.e., N_{e} is larger than the actual population size, as expected from many results for this type of population structure. The selection coefficient in the equivalent panmictic population, referred to here as the effective selection coefficient (s_{e}), is smaller than the actual selection coefficient (s). This explains how the fixation probability of a selected allele can be unaffected by population subdivision despite the fact that subdivision increases N_{e}, for the product N_{e}s_{e} is not altered by subdivision.

%B Genetics %V 163 %P 421-428 %G English %N 1 %M Zoorec:Zoor13900024601 %0 Journal Article %J Genetics %D 2003 %T Polymorphism and divergence for island-model species. %A Wakeley, John %K Evolution %K Evolutionary adaptation %K Genetic techniques %K Genetics %K Techniques %K Variation %X Estimates of the scaled selection coefficient, γ of Sawyer and Hartl, are shown to be remarkably robust to population subdivision. Estimates of mutation parameters and divergence times, in contrast, are very sensitive to subdivision. These results follow from an analysis of natural selection and genetic drift in the island model of subdivision in the limit of a very large number of subpopulations, or demes. In particular, a diffusion process is shown to hold for the average allele frequency among demes in which the level of subdivision sets the timescale of drift and selection and determines the dynamic equilibrium of allele frequencies among demes. This provides a framework for inference about mutation, selection, divergence, and migration when data are available from a number of unlinked nucleotide sites. The effects of subdivision on parameter estimates depend on the distribution of samples among demes. If samples are taken singly from different demes, the only effect of subdivision is in the rescaling of mutation and divergence-time parameters. If multiple samples are taken from one or more demes, high levels of within-deme relatedness lead to low levels of intraspecies polymorphism and increase the number of fixed differences between samples from two species. If subdivision is ignored, mutation parameters are underestimated and the species divergence time is overestimated, sometimes quite drastically. Estimates of the strength of selection are much less strongly affected and always in a conservative direction.

%B Genetics %V 163 %P 411-420 %G English %N 1 %M Zoorec:Zoor13900024600 %0 Journal Article %J Proc. Natl. Acad. Sci., USA %D 2003 %T The solitary wave of asexual evolution %A Rouzine, Igor M. %A Wakeley, John %A Coffin, John M. %K Genetics %K Reproduction %XUsing a previously undescribed approach, we develop an analytic model that predicts whether an asexual population accumulates advantageous or deleterious mutations over time and the rate at which either process occurs. The model considers a large number of linked identical loci, or nucleotide sites; assumes that the selection coefficient per site is much less than the mutation rate per genome; and includes back and compensating mutations. Using analysis and Monte Carlo simulations, we demonstrate the accuracy of our results over almost the entire range of population sizes. Two limiting cases of our results, when either deleterious or advantageous mutations can be neglected, correspond to the Fisher-Muller effect and Muller's ratchet, respectively. By comparing predictions of our model (no recombination) to those of simple single-locus models (strong recombination), we show that the accumulation of advantageous mutations is slowed by linkage over a broad, finite range of population size. This supports the view of Fisher and Muller, who argued in the 1930s that progressive evolution of organisms is slowed because loci at which beneficial mutations can occur are often linked together on the same chromosome. These results follow from our main finding, that distribution of sequences over the mutation number evolves as a traveling wave whose speed and width depend on population size and other parameters. The model explains a logarithmic dependence of steady-state fitness on the population size reported recently for an RNA virus.

%B Proc. Natl. Acad. Sci., USA %V 100 %P 587-592 %G English %N 2 %M Zoorec:Zoor13900029859 %0 Journal Article %J Genetics %D 2002 %T The coalescent in a continuous, finite, linear population %A Wilkins, Jon F. %A Wakeley, John %K Genetic techniques %K Genetics %K Techniques %XIn this article we present a model for analyzing patterns of genetic diversity in a continuous, finite, linear habitat with restricted gene flow. The distribution of coalescent times and locations is derived for a pair of sequences sampled from arbitrary locations along the habitat. The results for mean time to coalescence are compared to simulated data. As expected, mean time to common ancestry increases with the distance separating the two sequences. Additionally, this mean time is greater near the center of the habitat than near the ends. In the distant past, lineages that have not undergone coalescence are more likely to have been at opposite ends of the population range, whereas coalescent events in the distant past are biased toward the center. All of these effects are more pronounced when gene flow is more limited. The pattern of pairwise nucleotide differences predicted by the model is compared to data collected from sardine populations. The sardine data are used to illustrate how demographic parameters can be estimated using the model.

%B Genetics %V 161 %P 873-888 %G English %M Zoorec:Zoor13800049251 %0 Journal Article %J Annu. Rev. Ecol. Syst. %D 2002 %T Estimating divergence times from molecular data on phylogenetic and population genetic timescales %A Arbogast, Brian S. %A Edwards, Scott V. %A Wakeley, John %A Beerli, Peter %A Slowinski, Joseph B. %K Documentation %K Evolution %K Evolutionary adaptation %K Genetics %K Publications %K Systematics %X Molecular clocks have profoundly influenced modern views on the timing of important events in evolutionary history. We review recent advances in estimating divergence times from molecular data, emphasizing the continuum between processes at the phylogenetic and population genetic scales. On the phylogenetic scale, we address the complexities of DNA sequence evolution as they relate to estimating divergences, focusing on models of nucleotide substitution and problems associated with among-site and among-lineage rate variation. On the population genetic scale, we review advances in the incorporation of ancestral population processes into the estimation of divergence times between recently separated species. Throughout the review we emphasize new statistical methods and the importance of model testing during the process of divergence time estimation.

%B Annu. Rev. Ecol. Syst. %V 33 %P 707-740 %G English %M Zoorec:Zoor13900015077 %0 Journal Article %J Genetics %D 2001 %T Directional selection and the site-frequency spectrum. %A Bustamante, C. D. %A Wakeley, J. %A Sawyer, S. %A Hartl, D. L. %X In this article we explore statistical properties of the maximum-likelihood estimates (MLEs) of the selection and mutation parameters in a Poisson random field population genetics model of directional selection at DNA sites. We derive the asymptotic variances and covariance of the MLEs and explore the power of the likelihood ratio tests (LRT) of neutrality for varying levels of mutation and selection as well as the robustness of the LRT to deviations from the assumption of free recombination among sites. We also discuss the coverage of confidence intervals on the basis of two standard-likelihood methods. We find that the LRT has high power to detect deviations from neutrality and that the maximum-likelihood estimation performs very well when the ancestral states of all mutations in the sample are known. When the ancestral states are not known, the test has high power to detect deviations from neutrality for negative selection but not for positive selection. We also find that the LRT is not robust to deviations from the assumption of independence among sites. %B Genetics %V 159 %P 1779-1788 %G eng %0 Journal Article %J Am. J. Hum. Genet. %D 2001 %T The discovery of single-nucleotide polymorphisms--and inferences about human demographic history. %A Wakeley, J. %A Nielsen, R. %A Liu-Cordero, S.N. %A Ardlie, K. %X A method of historical inference that accounts for ascertainment bias is developed and applied to single-nucleotide polymorphism (SNP) data in humans. The data consist of 84 short fragments of the genome that were selected, from three recent SNP surveys, to contain at least two polymorphisms in their respective ascertainment samples and that were then fully resequenced in 47 globally distributed individuals. 
Ascertainment bias is the deviation, from what would be observed in a random sample, caused either by discovery of polymorphisms in small samples or by locus selection based on levels or patterns of polymorphism. The three SNP surveys from which the present data were derived differ both in their protocols for ascertainment and in the size of the samples used for discovery. We implemented a Monte Carlo maximum-likelihood method to fit a subdivided-population model that includes a possible change in effective size at some time in the past. Incorrectly assuming that ascertainment bias does not exist causes errors in inference, affecting both estimates of migration rates and historical changes in size. Migration rates are overestimated when ascertainment bias is ignored. However, the direction of error in inferences about changes in effective population size (whether the population is inferred to be shrinking or growing) depends on whether either the numbers of SNPs per fragment or the SNP-allele frequencies are analyzed. We use the abbreviation “SDL,” for “SNP-discovered locus,” in recognition of the genomic-discovery context of SNPs. When ascertainment bias is modeled fully, both the number of SNPs per SDL and their allele frequencies support a scenario of growth in effective size in the context of a subdivided population. If subdivision is ignored, however, the hypothesis of constant effective population size cannot be rejected. An important conclusion of this work is that, in demographic or other studies, SNP data are useful only to the extent that their ascertainment can be modeled. %B Am. J. Hum. Genet. %V 69 %P 1332-1347 %G eng %0 Journal Article %J Genetics %D 2001 %T Distinguishing migration from isolation: a Markov chain Monte Carlo approach. %A Nielsen, R. %A Wakeley, J. 
%XA Markov chain Monte Carlo method for estimating the relative effects of migration and isolation on genetic diversity in a pair of populations from DNA sequence data is developed and tested using simulations. The two populations are assumed to be descended from a panmictic ancestral population at some time in the past and may (or may not) after that be connected by migration. The use of a Markov chain Monte Carlo method allows the joint estimation of multiple demographic parameters in either a Bayesian or a likelihood framework. The parameters estimated include the migration rate for each population, the time since the two populations diverged from a common ancestral population, and the relative size of each of the two current populations and of the common ancestral population. The results show that even a single nonrecombining genetic locus can provide substantial power to test the hypothesis of no ongoing migration and/or to test models of symmetric migration between the two populations. The use of the method is illustrated in an application to mitochondrial DNA sequence data from a fish species: the threespine stickleback (*Gasterosteus aculeatus*).

A simple genealogical structure is found for a general finite island model of population subdivision. The model allows for variation in the sizes of demes, in contributions to the migrant pool, and in the fraction of each deme that is replaced by migrants every generation. The ancestry of a sample of non-recombining DNA sequences has a simple structure when the sample size is much smaller than the total number of demes in the population. This allows an expression for the probability distribution of the number of segregating sites in the sample to be derived under the infinite-sites mutation model. It also yields easily computed estimators of the migration parameter for each deme in a multi-deme sample. The genealogical process is such that the lineages ancestral to the sample tend to accumulate in demes with low migration rates and/or which contribute disproportionately to the migrant pool. In addition, common ancestor or coalescent events tend to occur in demes of small size. This provides a framework for understanding the determinants of the effective size of the population, and leads to an expression for the probability that the root of a genealogy occurs in a particular geographic region, or among a particular set of demes.

%B Theoret. Pop. Biol. %V 59 %P 133-144 %G English %N 2 %M Zoorec:Zoor13700038071 %0 Journal Article %J Evolution %D 2000 %T The effects of subdivision on the genetic divergence of populations and species %A Wakeley, John %K Ecology %K Evolution %K Evolutionary adaptation %K Genetics %K Population dynamics %K Recruitment %X An island model of migration is used to study the effects of subdivision within populations and species on sample genealogies and on between-population or between-species measures of genetic variation. The model assumes that the number of demes within each population or species is large. When populations (or species), connected either by gene flow or historical association, are themselves subdivided into demes, changes in the migration rate among demes alter both the structure of genealogies and the time scale of the coalescent process. The time scale of the coalescent is related to the effective size of the population, which depends on the migration rate among demes. When the migration rate among demes within populations is low, isolation (or speciation) events seem more recent and migration rates among populations seem higher because the effective size of each population is increased. This affects the probability of reciprocal monophyly of two samples, the chance that a gene tree of a sample matches the species tree, and relative likelihoods of different types of polymorphic sites. It can also have a profound effect on the estimation of divergence times. %B Evolution %V 54 %P 1092-1101 %G English %N 4 %M Zoorec:Zoor13700000570 %0 Journal Article %J Genetics %D 2000 %T The population genetics of the origin and divergence of the Drosophila simulans complex species. %A Kliman, Richard M. %A Andolfatto, Peter %A Coyne, Jerry A. %A Depaulis, Frantz %A Kreitman, Martin %A Berry, Andrew J. 
%A McCarter, James %A Wakeley, John %A Hey, Jody %K Arthropods %K Biochemistry %K Evolution %K Genetics %K Insects %K Invertebrates %K True Flies %K Variation %XThe origins and divergence of Drosophila simulans and close relatives D. mauritiana and D. sechellia were examined using the patterns of DNA sequence variation found within and between species at 14 different genes. D. sechellia consistently revealed low levels of polymorphism, and genes from D. sechellia have accumulated mutations at a rate that is approximately 50% higher than the same genes from D. simulans. At synonymous sites, D. sechellia has experienced a significant excess of unpreferred codon substitutions. Together these observations suggest that D. sechellia has had a reduced effective population size for some time, and that it is accumulating slightly deleterious mutations as a result. D. simulans and D. mauritiana are both highly polymorphic and the two species share many polymorphisms, probably since the time of common ancestry. A simple isolation speciation model, with zero gene flow following incipient species separation, was fitted to both the simulans/mauritiana divergence and the simulans/sechellia divergence. In both cases the model fit the data quite well, and the analyses revealed little evidence of gene flow between the species. The exception is one gene copy at one locus in D. sechellia, which closely resembled other D. simulans sequences. The overall picture is of two allopatric speciation events that occurred quite near one another in time.

%B Genetics %V 156 %P 1913-1931 %G English %N 4 %M Zoorec:Zoor13700042817 %0 Journal Article %J Genetics %D 1999 %T Non-equilibrium migration in human history. %A Wakeley, J. %XA nonequilibrium migration model is proposed and applied to genetic data from humans. The model assumes symmetric migration among all possible pairs of demes and that the number of demes is large. With these assumptions it is straightforward to allow for changes in demography, and here a single abrupt change is considered. Under the model this change is identical to a change in the ancestral effective population size and might be caused by changes in deme size, in the number of demes, or in the migration rate. Expressions for the expected numbers of sites segregating at particular frequencies in a multideme sample are derived. A maximum-likelihood analysis of independent polymorphic restriction sites in humans reveals a decrease in effective size. This is consistent with a change in the rates of migration among human subpopulations from ancient low levels to present high ones.

%B Genetics %V 153 %P 1836-1871 %G eng %N 4 %0 Journal Article %J Biological Bulletin (Woods Hole) %D 1999 %T Genes and other samples of DNA sequence data for phylogenetic inference %A Cummings, Michael P. %A Otto, Sarah P. %A Wakeley, John %K Biochemistry %K Chordates %K Cytology %K Evolution %K Genetics %K Organelles %K Protoplasm %K Variation %K Vertebrates %B Biological Bulletin (Woods Hole) %V 196 %P 345-350 %G English %M Zoorec:Zoor13600020777 %0 Journal Article %J Theoret. Pop. Biol. %D 1998 %T Segregating sites in Wright's island model. %A Wakeley, J. %XExpressions for the expectation and variance of the number of segregating sites in samples from an island model of population subdivision are derived. For small samples, an arbitrary number of demes can be accommodated. Results for larger samples are derived under the assumption of an infinite number of demes. However, simulations indicate that the latter results will hold quite well for the finite-island model in many cases. A new estimator of the population migration rate is proposed and is shown to outperform the widely used pairwise method.

%B Theoret. Pop. Biol. %V 53 %P 166-174 %G eng %N 2 %0 Journal Article %J Genetics %D 1997 %T A coalescent estimator of the population recombination rate. %A Hey, J. %A Wakeley, J. %XPopulation genetic models often use a population recombination parameter 4Nc, where N is the effective population size and c is the recombination rate per generation. In many ways 4Nc is comparable to 4Nu, the population mutation rate. Both combine genome level and population level processes, and together they describe the rate of production of genetic variation in a population. However, 4Nc is more difficult to estimate. For a population sample of DNA sequences, historical recombination can only be detected if polymorphisms exist, and even then most recombination events are not detectable. This paper describes an estimator of 4Nc, hereafter designated gamma (gamma), that was developed using a coalescent model for a sample of four DNA sequences with recombination. The reliability of gamma was assessed using multiple coalescent simulations. In general gamma has low to moderate bias, and the reliability of gamma is comparable, though less, than that for a widely used estimator of 4Nu. If there exists an independent estimate of the recombination rate (per generation, per base pair), gamma can be used to estimate the effective population size or the neutral mutation rate.

The expected numbers of different categories of polymorphic sites are derived for two related models of population history: the isolation model, in which an ancestral population splits into two descendants, and the size-change model, in which a single population undergoes an instantaneous change in size. For the isolation model, the observed numbers of shared, fixed, and exclusive polymorphic sites are used to estimate the relative sizes of the three populations, ancestral plus two descendant, as well as the time of the split. For the size-change model, the numbers of sites segregating at particular frequencies in the sample are used to estimate the relative sizes of the ancestral and descendant populations plus the time the change took place. Parameters are estimated by choosing values that most closely equate expectations with observations. Computer simulations show that current and historical population parameters can be estimated accurately. The methods are applied to DNA data from two species of Drosophila and to some human mitochondrial DNA sequences.

%B Genetics %V 145 %P 847-855 %G eng %N 3 %0 Journal Article %J Genet. Res., Camb. %D 1997 %T Using the variance of pairwise differences to estimate the recombination rate. %A Wakeley, J. %X A new estimator is proposed for the parameter C = 4*Nc*, where N is the population size and *c* is the recombination rate in a finite population model without selection. The estimator is an improved version of Hudson's (1987) estimator, which takes advantage of some recent theoretical developments. The improvement is slight, but the smaller bias and standard error of the new estimator support its use. The variance of the average number of pairwise differences is also derived, and is important in the formulation of the new estimator.

The divergence of Drosophila pseudoobscura and close relatives D. persimilis and D. pseudoobscura bogotana has been studied using comparative DNA sequence data from multiple nuclear loci. New data from the Hsp82 and Adh regions, in conjunction with existing data from Adh and the Period locus, are examined in the light of various models of speciation. The principal finding is that the three loci present very different histories, with Adh indicating large amounts of recent gene flow among the taxa, while little or no gene flow is apparent in the data from the other loci. The data were compared with predictions from several isolation models of divergence. These models include no gene flow, and they were found to be incompatible with the data. Instead the DNA data, taken together with other evidence, seem consistent with divergence models in which natural selection acts against gene flow at some loci more than at others. This family of models includes some sympatric and parapatric speciation models, as well as models of secondary contact and subsequent reinforcement of sexual isolation.

%B Genetics %V 147 %P 1091-1106 %G English %M Zoorec:Zoor13400040467 %0 Journal Article %J Trends in Ecology and Evolution %D 1996 %T The excess of transitions among nucleotide substitutions: new methods of estimating transition bias underscore its significance. %A Wakeley, J. %X Estimates of transition bias provide insight into the process of nucleotide substitution, and are required in some commonly used phylogenetic methods. Transitions are favored over transversions among spontaneous mutations, and the direction and strength of selection on proteins and RNA appears to depend on mutation type. As the complexity of the nucleotide-substitution process has become apparent, problems with classical methods of estimating transition bias have been recognized. These problems arise because there is a fundamental difference between ratios of numbers of differences among sequences and ratios of rates, and because classical methods are not easily generalized. Several new methods are now available.

%B Trends in Ecology and Evolution %V 11 %P 158-163 %G eng %0 Journal Article %J J. Genet. %D 1996 %T Pairwise differences under a general model of population subdivision. %A Wakeley, J. %X A number of different migration and isolation models of population subdivision have been studied. In this paper I analyse a general model of two populations derived from a common ancestral population at some time in the past. The two populations may exchange migrants, but they may also be completely isolated from each other. I derive the expectation and variance of the number of differences between two sequences sampled from the two populations. These are then compared to the corresponding results from two other much-used models: equilibrium migration and complete isolation. %B J. Genet. %V 75 %P 81-89 %G eng %0 Journal Article %J Theoret. Pop. Biol. %D 1996 %T Distinguishing migration from isolation using the variance of pairwise differences %A Wakeley, John %K Behaviour %K Biochemistry %K Evolution %K Evolutionary adaptation %K Genetic techniques %K Genetics %K Speciation %K Techniques %K Zoogeography %X Two demographic scenarios are considered: two populations with migration and two populations that have been completely isolated from each other for some period of time. The variance of the number of differences between pairs of sequences in a single sample is studied and forms the basis of a test of the isolation model. The migration model is one possible alternative to isolation. The isolation model is rejected when the proposed test statistic, which involves the variances of pairwise difference within and between populations, is larger than … The power and realized significance of the test are investigated using simulations, and an example using mitochondrial DNA illustrates its application.

%B Theoret. Pop. Biol. %V 49 %P 369-386 %G English %N 3 %M Zoorec:Zoor13300015712 %0 Journal Article %J Theoretical Population Biology %D 1996 %T The variance of pairwise nucleotide differences in two populations with migration %A Wakeley, John %K Behaviour %K Biochemistry %K Birds %K Chordates %K Ecology %K Evolution %K Evolutionary adaptation %K Genetic techniques %K Genetics %K Speciation %K Techniques %K Vertebrates %X The variances of three measures of pairwise difference are derived for the case of two populations that exchange migrants. The resulting expressions can be used to place standard errors on estimates of population genetic parameters. The three measures considered are the average number of intrapopulation nucleotide differences, the average number of interpopulation nucleotide differences, and the net number of nucleotide differences between the two populations. The expectations of these statistics are previously known and suggest that they might be used to quantify the divergence between populations. However, the standard errors of all three statistics are shown to be quite large relative to their expectations. Thus, our ability to quantify divergence between populations with them is limited, at least using available data. An analysis of mitochondrial DNA sequences from grey-crowned babblers illustrates the application of the theory. The variances derived here for migration are compared to previously published results for two populations that have been completely isolated from one another for some length of time. All three variances are greater under migration than under isolation, suggesting that a test to distinguish these two demographic situations could be developed.

%B Theoretical Population Biology %V 49 %P 39-57 %G English %M Zoorec:Zoor13300015700 %0 Journal Article %J Molecular Biology and Evolution %D 1995 %T Sampling properties of DNA sequence data in phylogenetic analysis %A Cummings, Michael P. %A Otto, Sarah P. %A Wakeley, John %K Biochemistry %K Chordates %K Evolution %K Genetics %K Techniques %K Vertebrates %XWe inferred phylogenetic trees from individual genes and random samples of nucleotides from the mitochondrial genomes of 10 vertebrates and compared the results to those obtained by analyzing the whole genomes. Individual genes are poor samples in that they infrequently lead to the whole-genome tree. A large number of nucleotide sites is needed to exactly determine the whole-genome tree. A relatively small number of sites, however, often results in a tree close to the whole-genome tree. We found that blocks of contiguous sites were less likely to lead to the whole-genome tree than samples composed of sites drawn individually from throughout the genome. Samples of contiguous sites are not representative of the entire genome, a condition that violates a basic assumption of the bootstrap method as it is applied in phylogenetic studies.

%B Molecular Biology and Evolution %V 12 %P 814-822 %G English %N 5 %M Zoorec:Zoor13200039236 %0 Journal Article %J Mol. Biol. Evol. %D 1994 %T Substitution-rate variation among sites and the estimation of transition bias. %A J. Wakeley %XSubstitution-rate variation among sites and differences in the probabilities of change among the four nucleotides are conflated in DNA sequence comparisons. When variation in rate exists among sites but is ignored, biases in the rates of change among nucleotides are underestimated. This paper provides a quantification of this effect when the observed proportions of transitions, P, and transversions, Q, between two sequences are used to estimate transition bias. The utility of P/Q as an estimator is examined both with and without rate variation among sites. A gamma-distributed-rates model is used to illustrate the effect that variation among sites has on estimates of transition bias, but it is argued that the basic results should hold for any pattern of rate variation. Naive estimates of the extent of transition bias, those that ignore rate variation when it is present, can seriously underestimate its true value. The extent of this underestimation increases with the amount of rate variation among sites. An example using human mitochondrial DNA shows that a simple comparison of the proportions of transitions and transversions in recently diverged sequences underestimates the level of transition bias by approximately 15%. This does not depend on the use of P/Q to estimate transition bias; maximum-likelihood methods give similar results.

%B Mol. Biol. Evol. %V 11 %P 436-442 %G eng %N 3 %0 Journal Article %J J. Mol. Evol. %D 1993 %T Substitution rate variation among sites in hypervariable region I of human mitochondrial DNA. %A J. Wakeley %XMore than an order of magnitude difference in substitution rate exists among sites within hypervariable region 1 of the control region of human mitochondrial DNA. A two-rate Poisson mixture and a negative binomial distribution are used to describe the distribution of the inferred number of changes per nucleotide site in this region. When three data sets are pooled, however, the two-rate model cannot explain the data. The negative binomial distribution always fits, suggesting that substitution rates are approximately gamma distributed among sites. Simulations presented here provide support for the use of a biased, yet commonly employed, method of examining rate variation. The use of parsimony in the method to infer the number of changes at each site introduces systematic errors into the analysis. These errors preclude an unbiased quantification of variation in substitution rate but make the method conservative overall. The method can be used to distinguish sites with highly elevated rates, and 29 such sites are identified in hypervariable region 1. Variation does not appear to be clustered within this region. Simulations show that biases in rates of substitution among nucleotides and non-uniform base composition can mimic the effects of variation in rate among sites. However, these factors contribute little to the levels of rate variation observed in hypervariable region 1.

%B J. Mol. Evol. %V 37 %P 613-623 %G eng