We are devoted to the study of theoretical population genetics. The goal of population genetics is to identify and understand the forces that produce and maintain genetic variation in natural populations. These forces include mutation (also recombination and gene conversion), natural selection, various kinds of population structure (e.g. subdivision with migration), and the random fluctuations of gene frequencies through time known as genetic drift. We study these forces mathematically, using both analysis and computation. We also develop statistical methods to make inferences about these forces from DNA sequences or other kinds of genetic data. For more information about specific areas of research, follow the links to lab members.


Palacios, JA, J Wakeley, and S Ramachandran. 2015. “Bayesian nonparametric inference of population size changes from sequential genealogies.” Genetics 201 (1): 281-304.

Sophisticated inferential tools coupled with the coalescent model have recently emerged for estimating past population sizes from genomic data. Recent methods that model recombination require small sample sizes, make constraining assumptions about population size changes, and do not report measures of uncertainty for estimates. Here, we develop a Gaussian process-based Bayesian nonparametric method coupled with a sequentially Markov coalescent model that allows accurate inference of population sizes over time from a set of genealogies. In contrast to current methods, our approach considers a broad class of recombination events, including those that do not change local genealogies. We show that our method outperforms recent likelihood-based methods that rely on discretization of the parameter space. We illustrate the application of our method to multiple demographic histories, including population bottlenecks and exponential growth. In simulation, our Bayesian approach produces point estimates four times more accurate than maximum-likelihood estimation (based on the sum of absolute differences between the truth and the estimated values). Further, our method's credible intervals for population size as a function of time cover 90% of true values across multiple demographic scenarios, enabling formal hypothesis testing about population size differences over time. Using genealogies estimated with ARGweaver, we apply our method to European and Yoruban samples from the 1000 Genomes Project and confirm key known aspects of population size history over the past 150,000 years.
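
As a point of reference for what inferring population size from genealogies means in the simplest case, the sketch below simulates coalescent waiting times under a constant population size and recovers that size by maximum likelihood. It is not the Gaussian process method described in the abstract: it assumes a single constant haploid size `N`, no recombination, and genealogies observed without error. All function names are illustrative.

```python
import random
from math import comb


def simulate_waiting_times(n, N, rng):
    """Return (k, w_k) pairs for k = n, ..., 2 lineages, with w_k in generations.

    While k lineages remain, the time to the next coalescence is exponential
    with rate C(k, 2) / N (constant haploid population size N, an assumption).
    """
    return [(k, rng.expovariate(comb(k, 2) / N)) for k in range(n, 1, -1)]


def mle_constant_size(genealogies):
    """Maximum-likelihood estimate of N from a list of genealogies.

    Each rescaled time C(k, 2) * w_k is exponential with mean N, so the MLE
    is the average of the rescaled times over all coalescent intervals.
    """
    rescaled = [comb(k, 2) * w for g in genealogies for k, w in g]
    return sum(rescaled) / len(rescaled)


rng = random.Random(1)
true_N = 10_000
genealogies = [simulate_waiting_times(20, true_N, rng) for _ in range(200)]
estimate = mle_constant_size(genealogies)
print(f"true N = {true_N}, estimated N = {estimate:.0f}")
```

A constant-size estimator like this cannot detect bottlenecks or growth, which is the gap that methods allowing population size to vary over time are meant to fill.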

Carmi, S, PR Wilton, J Wakeley, and I Pe'er. 2014. “A renewal theory approach to IBD sharing.” Theoret. Pop. Biol. 97: 35-48.

A long genomic segment inherited by a pair of individuals from a single, recent common ancestor is said to be identical-by-descent (IBD). Shared IBD segments have numerous applications in genetics, from demographic inference to phasing, imputation, pedigree reconstruction, and disease mapping. Here, we provide a theoretical analysis of IBD sharing under Markovian approximations of the coalescent with recombination. We describe a general framework for the IBD process along the chromosome under the Markovian models (SMC/SMC’), as well as introduce and justify a new model, which we term the renewal approximation, under which lengths of successive segments are independent. Then, considering the infinite-chromosome limit of the IBD process, we recover previous results (for SMC) and derive new results (for SMC’) for the mean number of shared segments longer than a cutoff and the fraction of the chromosome found in such segments. We then use renewal theory to derive an expression (in Laplace space) for the distribution of the number of shared segments and demonstrate implications for demographic inference. We also compute (again, in Laplace space) the distribution of the fraction of the chromosome in shared segments, from which we obtain explicit expressions for the first two moments. Finally, we generalize all results to populations with a variable effective size.
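
The idea of a renewal approximation, with independent lengths for successive segments, can be sketched generically. The example below tiles a chromosome with independently drawn segment lengths and reports the two summaries named in the abstract: the number of segments longer than a cutoff and the fraction of the chromosome they cover. The exponential length distribution is a placeholder chosen for simplicity, not the SMC or SMC' distribution derived in the paper, and every name and parameter value here is hypothetical.

```python
import math
import random


def simulate_segments(chrom_len, mean_len, rng):
    """Tile a chromosome of length chrom_len with independent segment lengths.

    Lengths are drawn i.i.d. from an exponential distribution with mean
    mean_len (a placeholder distribution). The last segment is truncated
    at the chromosome end.
    """
    segments, remaining = [], chrom_len
    while remaining > 0:
        length = min(rng.expovariate(1.0 / mean_len), remaining)
        segments.append(length)
        remaining -= length
    return segments


def summarize(segments, cutoff):
    """Return (number of segments longer than cutoff, fraction of chromosome they cover)."""
    long_segments = [s for s in segments if s > cutoff]
    return len(long_segments), sum(long_segments) / sum(segments)


rng = random.Random(7)
chrom_len, mean_len, cutoff = 10_000.0, 1.0, 2.0   # arbitrary units, illustrative values
count, fraction = summarize(simulate_segments(chrom_len, mean_len, rng), cutoff)

# For exponential lengths on a long chromosome, the covered fraction tends to
# (1 + c/m) * exp(-c/m); this closed form belongs to the placeholder only.
expected_fraction = (1 + cutoff / mean_len) * math.exp(-cutoff / mean_len)
print(count, round(fraction, 3), round(expected_fraction, 3))
```

Replacing the exponential draw with a sampler for a model-specific length distribution would leave the rest of the sketch unchanged, which is the practical appeal of treating the process as a renewal process.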

Pennings, PS, S Kryazhimskiy, and J Wakeley. 2014. “Loss and recovery of genetic diversity in adapting populations of HIV.” PLoS Genet 10(1): e1004000.

The evolution of drug resistance in HIV occurs by the fixation of specific, well-known, drug-resistance mutations, but the underlying population genetic processes are not well understood. By analyzing within-patient longitudinal sequence data, we make four observations that shed light on the underlying processes and allow us to infer the short-term effective population size of the viral population in a patient. Our first observation is that the evolution of drug resistance usually occurs by the fixation of one drug-resistance mutation at a time, as opposed to several changes simultaneously. Second, we find that these fixation events are accompanied by a reduction in genetic diversity in the region surrounding the fixed drug-resistance mutation, due to the hitchhiking effect. Third, we observe that the fixation of drug-resistance mutations involves both hard and soft selective sweeps. In a hard sweep, a resistance mutation arises in a single viral particle and drives all linked mutations with it when it spreads in the viral population, which dramatically reduces genetic diversity. On the other hand, in a soft sweep, a resistance mutation occurs multiple times on different genetic backgrounds, and the reduction of diversity is weak. Using the frequency of occurrence of hard and soft sweeps we estimate the effective population size of HIV to be 1.5 × 10^5 (95% confidence interval [0.8 × 10^5, 4.8 × 10^5]). This number is much lower than the actual number of infected cells, but much larger than previous population size estimates based on synonymous diversity. We propose several explanations for the observed discrepancies. Finally, our fourth observation is that genetic diversity at non-synonymous sites recovers to its pre-fixation value within 18 months, whereas diversity at synonymous sites remains depressed after this time period. These results improve our understanding of HIV evolution and have potential implications for treatment.
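
To illustrate how the relative frequency of hard and soft sweeps can carry information about effective population size, here is a minimal sketch. It assumes that the number of independent origins of a resistance mutation in a sample of `n` sequences follows the Ewens sampling formula with a population-scaled mutation rate θ = 2·N_e·μ (haploid scaling). The abstract does not spell out the estimation procedure, so this is not a reconstruction of the authors' analysis, and the values of `n`, `hard_fraction`, and `mu` below are hypothetical.

```python
def prob_hard_sweep(n, theta):
    """Probability that all n sampled sequences trace back to a single origin.

    Under the Ewens sampling formula this is prod_{i=1}^{n-1} i / (i + theta).
    """
    p = 1.0
    for i in range(1, n):
        p *= i / (i + theta)
    return p


def theta_from_hard_fraction(n, hard_fraction, lo=0.0, hi=1e3, tol=1e-10):
    """Solve prob_hard_sweep(n, theta) = hard_fraction for theta by bisection.

    prob_hard_sweep decreases monotonically in theta, so bisection applies.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_hard_sweep(n, mid) > hard_fraction:
            lo = mid      # model predicts too many hard sweeps: theta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


n = 10                    # sequences sampled per patient (illustrative)
hard_fraction = 0.5       # hypothetical observed share of hard sweeps
mu = 1e-5                 # hypothetical mutation rate to the resistance allele
theta = theta_from_hard_fraction(n, hard_fraction)
print(f"theta = {theta:.3f}, implied N_e = {theta / (2 * mu):.0f}")
```

The qualitative point is that a mix of hard and soft sweeps implies θ of order one: much smaller values would make soft sweeps rare, and much larger values would make hard sweeps rare.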

Editor: Christophe Fraser, Imperial College London, United Kingdom
Received April 19, 2013; Accepted October 19, 2013; Published January 23, 2014

Copyright: 2014 Pennings et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: SK was supported by a Career Award at Scientific Interface from the Burroughs Wellcome Fund (http://www.bwfund.org/). PSP was supported by a long-term postdoctoral fellowship of the Human Frontier Science Program (http://www.hfsp.org/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.

* E-mail: pleuni@stanford.edu

Wakeley, J, L King, BS Low, and S Ramachandran. 2012. “Gene genealogies within a fixed pedigree, and the robustness of Kingman's coalescent.” Genetics 190 (4): 1433-1445.

We address a conceptual flaw in the backward-time approach to population genetics called coalescent theory as it is applied to diploid biparental organisms. Specifically, the way random models of reproduction are used in coalescent theory is not justified. Instead, the population pedigree for diploid organisms--that is, the set of all family relationships among members of the population--although unknown, should be treated as a fixed parameter, not as a random quantity. Gene genealogical models should describe the outcome of the percolation of genetic lineages through the population pedigree according to Mendelian inheritance. Using simulated pedigrees, some of which are based on family data from 19th century Sweden, we show that in many cases the (conceptually wrong) standard coalescent model is difficult to reject statistically and in this sense may provide a surprisingly accurate description of gene genealogies on a fixed pedigree. We study the differences between the fixed-pedigree coalescent and the standard coalescent by analysis and simulations. Differences are apparent in the recent past, within fewer than approximately log2(N) generations, but then disappear as genetic lineages are traced into the more distant past.
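
One heuristic for why log2(N) is a natural time scale here: a diploid biparental individual has at most 2^g pedigree ancestors g generations back, and that count reaches the population size N after about log2(N) generations. The short calculation below shows how small this number is for typical population sizes. The heuristic and the function name are added for illustration and are not taken from the paper's derivation.

```python
import math


def generations_until_pedigree_saturates(N):
    """Smallest g with 2**g >= N, i.e. when potential ancestors could number N."""
    return math.ceil(math.log2(N))


for N in (1_000, 10_000, 1_000_000):
    print(N, generations_until_pedigree_saturates(N))
```

Even for a population of a million, the window is only about 20 generations, which is short compared with coalescence times on the order of N generations.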