# Publications

In this article we apply some graph-theoretic results to the study of coalescence in a structured population with migration. The graph is the pattern of migration among subpopulations, or demes, and we use the theory of random walks on graphs to characterize the ease with which ancestral lineages can traverse the habitat in a series of migration events. We identify conditions under which the coalescent process in populations with restricted migration, such that individuals cannot traverse the habitat freely in a single migration event, nonetheless becomes identical to the coalescent process in the island migration model in the limit as the number of demes tends to infinity. Specifically, we first note that a sequence of symmetric graphs with Diaconis-Stroock constant bounded above has an unstructured Kingman-type coalescent in the limit for a sample of size two from two different demes. We then show that circular and toroidal models with long-range but restricted migration have an upper bound on this constant and so have an unstructured-migration coalescent in the limit. We investigate the rate of convergence to this limit using simulations.
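
The contrast between island migration and restricted migration on a circle can be illustrated with a minimal sketch, which is not the method of the article. It assumes a continuous-time structured coalescent with haploid demes of equal size, and it solves a linear system for the expected coalescence time of two lineages. The function names, the rate convention (each lineage migrates at total rate `mig_rate`), and the parameter values are all assumptions made for this illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def pairwise_coalescence_times(mig, deme_size):
    """Expected coalescence time t[i][j] for lineages starting in demes i and j.

    mig[i][k] is the (assumed) backward migration rate from deme i to deme k;
    two lineages in the same deme coalesce at rate 1 / deme_size.
    """
    d = len(mig)

    def idx(i, j):
        return i * d + j

    a = [[0.0] * (d * d) for _ in range(d * d)]
    b = [1.0] * (d * d)
    for i in range(d):
        for j in range(d):
            row = a[idx(i, j)]
            if i == j:
                row[idx(i, j)] += 1.0 / deme_size
            for k in range(d):
                if k != i:  # first lineage moves from i to k
                    row[idx(i, j)] += mig[i][k]
                    row[idx(k, j)] -= mig[i][k]
                if k != j:  # second lineage moves from j to k
                    row[idx(i, j)] += mig[j][k]
                    row[idx(i, k)] -= mig[j][k]
    t = solve_linear(a, b)
    return [[t[idx(i, j)] for j in range(d)] for i in range(d)]


def island_rates(d, mig_rate):
    """Island model: a migrant is equally likely to come from any other deme."""
    return [[0.0 if i == k else mig_rate / (d - 1) for k in range(d)] for i in range(d)]


def circular_rates(d, mig_rate):
    """Circular model: migration only between adjacent demes on a ring."""
    mig = [[0.0] * d for _ in range(d)]
    for i in range(d):
        mig[i][(i + 1) % d] += mig_rate / 2.0
        mig[i][(i - 1) % d] += mig_rate / 2.0
    return mig


# Hypothetical parameter values, chosen only for illustration.
island = pairwise_coalescence_times(island_rates(6, 0.05), 10.0)
circle = pairwise_coalescence_times(circular_rates(6, 0.05), 10.0)
print(island[0][1], circle[0][1], circle[0][3])
```

Under these assumptions, the within-deme time is the same in both models, whereas lineages sampled from opposite sides of the circle take longer to coalesce than lineages from two different islands. The sketch covers only expected pairwise times for a fixed number of demes, not the many-demes limit studied in the article.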

We study the ancestral genetic process for samples from two large, subdivided populations that are connected by migration to, from, and within a small set of subpopulations, or demes. We consider convergence to an ancestral limit process as the numbers of demes in the two large, subdivided populations tend to infinity. We show that the ancestral limit process for a sample includes a recent instantaneous adjustment to the sample size and structure followed by a more ancient process that is identical to the usual structured coalescent, but with different scaled parameters. This justifies the application of a modified structured coalescent to some hierarchically structured populations.

We show that the unstructured ancestral selection graph applies to part of the history of a sample from a population structured by restricted migration among subpopulations, or demes. The result holds in the limit as the number of demes tends to infinity with proportionately weak selection, and we also assume island-type migration and demes of equal size. After an instantaneous sample-size adjustment, this structured ancestral selection graph converges to an unstructured ancestral selection graph with a mutation parameter that depends inversely on the migration rate. In contrast, the selection parameter for the population is independent of the migration rate and is identical to the selection parameter in an unstructured population. We show analytically that estimators of the migration rate, based on pairwise sequence differences, derived under the assumption of neutrality should perform equally well in the presence of weak selection. We also modify an algorithm for simulating genealogies conditional on the frequencies of two selected alleles in a sample. This permits efficient simulation of stronger selection than was previously possible. Using this new algorithm, we simulate gene genealogies under the many-demes ancestral selection graph and identify some situations in which migration has a strong effect on the time to the most recent common ancestor of the sample. We find that a similar effect also increases the sensitivity of the genealogy to selection.

Recent developments in population genetics are reviewed and placed in a historical context. Current and future challenges, both in computational methodology and in analytical theory, are to develop models and techniques to extract the most information possible from multilocus DNA datasets. As an example of the theoretical issues, five limiting forms of the island model of population subdivision with migration are presented in a unified framework. These approximations illustrate the interplay between migration and drift in structuring gene genealogies, and some of them make connections between the fairly complicated island-model genealogical process and the much simpler, unstructured neutral coalescent process which underlies most inferential techniques in population genetics.

A simple nonparametric test for population structure was applied to temporally spaced samples of HIV-1 sequences from the gag-pol region within two chronically infected individuals. The results show that temporal structure can be detected for samples separated by about 22 months or more. The performance of the method, which was originally proposed to detect geographic structure, was tested for temporally spaced samples using neutral coalescent simulations. Simulations showed that the method is robust to variation in sample sizes and mutation rates and to the presence or absence of recombination, and that the power to detect temporal structure is high. By comparing levels of temporal structure in simulations to the levels observed in real data, we estimate the effective intra-individual population size of HIV-1 to be between 10^{3} and 10^{4} viruses, which is in agreement with some previous estimates. Using this estimate and a simple measure of sequence diversity, we estimate an effective neutral mutation rate of about 5 × 10^{-6} per site per generation in the gag-pol region. The definition and interpretation of estimates of such “effective” population parameters are discussed.

A diffusion approximation is obtained for the frequency of a selected allele in a population composed of many subpopulations, or demes. The form of the diffusion is equivalent to that for an unstructured population, except that it occurs on a longer time scale when migration among demes is restricted. This many-demes diffusion limit relies on the collection of demes always being in statistical equilibrium with respect to migration and drift for a given allele frequency in the total population. Selection is assumed to be weak, in inverse proportion to the number of demes, and the results hold for any deme sizes and migration rates greater than zero. The distribution of allele frequencies among demes is also described. © 2004 Elsevier Inc. All rights reserved.

The genealogical process for a sample from a metapopulation, in which local populations are connected by migration and can undergo extinction and subsequent recolonization, is shown to have a relatively simple structure in the limit as the number of populations in the metapopulation approaches infinity. The result, which is an approximation to the ancestral behaviour of samples from a metapopulation with a large number of populations, is the same as that previously described for other metapopulation models, namely that the genealogical process is closely related to Kingman's unstructured coalescent. The present work considers a more general class of models that includes two kinds of extinction and recolonization, and the possibility that gamete production precedes extinction. In addition, following other recent work, this result for a metapopulation divided into many populations is shown to hold both for finite population sizes and in the usual diffusion limit, which assumes that population sizes are large. Examples illustrate when the usual diffusion limit is appropriate and when it is not. Some shortcomings and extensions of the model are considered, and the relevance of such models to understanding human history is discussed.

We compare the predictions of the theory to genomic data from humans and show that subdivision might account for a substantial portion of the genetic associations observed within the human genome, even though migration rates among local populations of humans are relatively large. Because correlations due to subdivision rather than to physical linkage can be large even in a single-deme sample, if long-term migration has been important in shaping patterns of human polymorphism, the common practice of disease mapping using linkage disequilibrium in “isolated” local populations may be subject to error.

The population-genetic consequences of population structure are of great interest and have been studied extensively. An area of particular interest is the interaction among population structure, natural selection, and genetic drift. At first glance, different results in this area give very different impressions of the effect of population subdivision on effective population size (N_{e}), suggesting that no single value of N_{e} can completely characterize a structured population. Results presented here show that a population conforming to Wright's island model of subdivision with genic selection can be related to an idealized panmictic population (a Wright-Fisher population). This equivalent panmictic population has a larger size than the actual population; i.e., N_{e} is larger than the actual population size, as expected from many results for this type of population structure. The selection coefficient in the equivalent panmictic population, referred to here as the effective selection coefficient (s_{e}), is smaller than the actual selection coefficient (s). This explains how the fixation probability of a selected allele can be unaffected by population subdivision despite the fact that subdivision increases N_{e}: the product N_{e}s_{e} is not altered by subdivision.
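
The cancellation described above can be checked numerically with a small sketch. It assumes Kimura's diffusion formula for the fixation probability under genic selection, with haploid-style scaling, and a hypothetical factor `k` by which subdivision inflates the effective size and deflates the effective selection coefficient. Neither the factor `k` nor the parameter values come from the abstract.

```python
import math


def fixation_probability(eff_size, eff_sel, init_freq):
    """Diffusion approximation to the fixation probability (assumed haploid scaling)."""
    a = 2.0 * eff_size * eff_sel
    if abs(a) < 1e-12:
        return init_freq  # neutral limit
    # expm1 keeps precision when a * init_freq is small
    return math.expm1(-a * init_freq) / math.expm1(-a)


# Hypothetical values: total size 1000, s = 0.01, one initial copy.
total_size, sel, init_freq = 1000.0, 0.01, 1.0 / 1000.0
for k in (1.0, 2.0, 5.0):  # k is an illustrative subdivision factor
    eff_size = k * total_size  # larger effective size
    eff_sel = sel / k          # smaller effective selection coefficient
    print(k, fixation_probability(eff_size, eff_sel, init_freq))
```

Because the formula depends on the effective size and effective selection coefficient only through their product, every value of `k` gives the same fixation probability in this sketch, up to floating-point rounding.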

Estimates of the scaled selection coefficient, γ of Sawyer and Hartl, are shown to be remarkably robust to population subdivision. Estimates of mutation parameters and divergence times, in contrast, are very sensitive to subdivision. These results follow from an analysis of natural selection and genetic drift in the island model of subdivision in the limit of a very large number of subpopulations, or demes. In particular, a diffusion process is shown to hold for the average allele frequency among demes in which the level of subdivision sets the timescale of drift and selection and determines the dynamic equilibrium of allele frequencies among demes. This provides a framework for inference about mutation, selection, divergence, and migration when data are available from a number of unlinked nucleotide sites. The effects of subdivision on parameter estimates depend on the distribution of samples among demes. If samples are taken singly from different demes, the only effect of subdivision is in the rescaling of mutation and divergence-time parameters. If multiple samples are taken from one or more demes, high levels of within-deme relatedness lead to low levels of intraspecies polymorphism and increase the number of fixed differences between samples from two species. If subdivision is ignored, mutation parameters are underestimated and the species divergence time is overestimated, sometimes quite drastically. Estimates of the strength of selection are much less strongly affected and always in a conservative direction.

Using a previously undescribed approach, we develop an analytic model that predicts whether an asexual population accumulates advantageous or deleterious mutations over time and the rate at which either process occurs. The model considers a large number of linked identical loci, or nucleotide sites; assumes that the selection coefficient per site is much less than the mutation rate per genome; and includes back and compensating mutations. Using analysis and Monte Carlo simulations, we demonstrate the accuracy of our results over almost the entire range of population sizes. Two limiting cases of our results, when either deleterious or advantageous mutations can be neglected, correspond to the Fisher-Muller effect and Muller's ratchet, respectively. By comparing predictions of our model (no recombination) to those of simple single-locus models (strong recombination), we show that the accumulation of advantageous mutations is slowed by linkage over a broad, finite range of population size. This supports the view of Fisher and Muller, who argued in the 1930s that progressive evolution of organisms is slowed because loci at which beneficial mutations can occur are often linked together on the same chromosome. These results follow from our main finding, that the distribution of sequences over the mutation number evolves as a traveling wave whose speed and width depend on population size and other parameters. The model explains a logarithmic dependence of steady-state fitness on the population size reported recently for an RNA virus.

In this article we present a model for analyzing patterns of genetic diversity in a continuous, finite, linear habitat with restricted gene flow. The distribution of coalescent times and locations is derived for a pair of sequences sampled from arbitrary locations along the habitat. The results for mean time to coalescence are compared to simulated data. As expected, mean time to common ancestry increases with the distance separating the two sequences. Additionally, this mean time is greater near the center of the habitat than near the ends. In the distant past, lineages that have not undergone coalescence are more likely to have been at opposite ends of the population range, whereas coalescent events in the distant past are biased toward the center. All of these effects are more pronounced when gene flow is more limited. The pattern of pairwise nucleotide differences predicted by the model is compared to data collected from sardine populations. The sardine data are used to illustrate how demographic parameters can be estimated using the model.
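
The qualitative patterns described above can be explored with a discrete stand-in for the continuous habitat. The sketch below is not the model of the article: it swaps in a linear stepping-stone arrangement of demes, with lineages that move to each existing neighbour at rate `mig_rate / 2` and coalesce at rate `1 / deme_size` when in the same deme. Expected pairwise coalescence times are found by Gauss-Seidel iteration on the first-step equations. All names and parameter values are assumptions for illustration.

```python
def linear_habitat_times(n_demes, deme_size, mig_rate, tol=1e-12, max_sweeps=1_000_000):
    """Expected pairwise coalescence times on a line of demes (discrete analog)."""
    step = mig_rate / 2.0  # rate of moving to each existing neighbour

    def neighbours(i):
        # End demes have a single neighbour, so lineages are reflected there.
        return [k for k in (i - 1, i + 1) if 0 <= k < n_demes]

    t = [[0.0] * n_demes for _ in range(n_demes)]
    for _ in range(max_sweeps):
        biggest_change = 0.0
        for i in range(n_demes):
            for j in range(n_demes):
                total = 1.0 / deme_size if i == j else 0.0
                inflow = 0.0
                for k in neighbours(i):  # first lineage moves
                    total += step
                    inflow += step * t[k][j]
                for k in neighbours(j):  # second lineage moves
                    total += step
                    inflow += step * t[i][k]
                new = (1.0 + inflow) / total
                biggest_change = max(biggest_change, abs(new - t[i][j]))
                t[i][j] = new
        if biggest_change < tol:
            break
    return t


# Hypothetical values: 7 demes of size 10 with migration rate 0.1.
times = linear_habitat_times(7, 10.0, 0.1)
print("end deme, same deme:    ", times[0][0])
print("centre deme, same deme: ", times[3][3])
print("from one end, by distance:", times[0])
```

Comparing `times[0][0]` with `times[3][3]`, and reading along `times[0]`, gives a rough discrete counterpart to the statements about habitat position and separation distance; lowering `mig_rate` can be used to probe the effect of more limited gene flow.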

Molecular clocks have profoundly influenced modern views on the timing of important events in evolutionary history. We review recent advances in estimating divergence times from molecular data, emphasizing the continuum between processes at the phylogenetic and population genetic scales. On the phylogenetic scale, we address the complexities of DNA sequence evolution as they relate to estimating divergences, focusing on models of nucleotide substitution and problems associated with among-site and among-lineage rate variation. On the population genetic scale, we review advances in the incorporation of ancestral population processes into the estimation of divergence times between recently separated species. Throughout the review we emphasize new statistical methods and the importance of model testing during the process of divergence time estimation.

A Markov chain Monte Carlo method for estimating the relative effects of migration and isolation on genetic diversity in a pair of populations from DNA sequence data is developed and tested using simulations. The two populations are assumed to be descended from a panmictic ancestral population at some time in the past and may (or may not) after that be connected by migration. The use of a Markov chain Monte Carlo method allows the joint estimation of multiple demographic parameters in either a Bayesian or a likelihood framework. The parameters estimated include the migration rate for each population, the time since the two populations diverged from a common ancestral population, and the relative size of each of the two current populations and of the common ancestral population. The results show that even a single nonrecombining genetic locus can provide substantial power to test the hypothesis of no ongoing migration and/or to test models of symmetric migration between the two populations. The use of the method is illustrated in an application to mitochondrial DNA sequence data from a fish species: the threespine stickleback (*Gasterosteus aculeatus*).