Sophisticated inferential tools coupled with the coalescent model have recently emerged for estimating past population sizes from genomic data. Recent methods that model recombination require small sample sizes, make constraining assumptions about population size changes, and do not report measures of uncertainty for estimates. Here, we develop a Gaussian process-based Bayesian nonparametric method coupled with a sequentially Markov coalescent model that allows accurate inference of population sizes over time from a set of genealogies. In contrast to current methods, our approach considers a broad class of recombination events, including those that do not change local genealogies. We show that our method outperforms recent likelihood-based methods that rely on discretization of the parameter space. We illustrate the application of our method to multiple demographic histories, including population bottlenecks and exponential growth. In simulation, our Bayesian approach produces point estimates four times more accurate than maximum-likelihood estimation (based on the sum of absolute differences between the truth and the estimated values). Further, our method's credible intervals for population size as a function of time cover 90% of true values across multiple demographic scenarios, enabling formal hypothesis testing about population size differences over time. Using genealogies estimated with ARGweaver, we apply our method to European and Yoruban samples from the 1000 Genomes Project and confirm key known aspects of population size history over the past 150,000 years.
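The two evaluation criteria named above, the sum of absolute differences between true and estimated population sizes and the coverage of the credible intervals, can be illustrated with a minimal sketch. It assumes that the true trajectory, the point estimate, and the interval bounds are all evaluated on one shared time grid, and it takes coverage to mean the fraction of grid points at which the truth lies inside the interval. The function names, the grid, and the toy exponential-growth trajectory are hypothetical and are not taken from the method's implementation.

```python
import math


def sum_abs_diff(truth, estimate):
    """Sum of absolute differences between true and estimated N(t) on a shared grid."""
    if len(truth) != len(estimate):
        raise ValueError("truth and estimate must share the same time grid")
    return sum(abs(t - e) for t, e in zip(truth, estimate))


def coverage(truth, lower, upper):
    """Fraction of grid points at which lower <= truth <= upper."""
    if not (len(truth) == len(lower) == len(upper)):
        raise ValueError("truth, lower and upper must share the same time grid")
    inside = sum(1 for t, lo, hi in zip(truth, lower, upper) if lo <= t <= hi)
    return inside / len(truth)


# Toy example (all values hypothetical): exponential growth toward the present,
# so population size decays as we look backward in time.
grid = [0.1 * i for i in range(10)]
truth = [1000.0 * math.exp(-t) for t in grid]

# A made-up point estimate that is 10% too high, with a +/-20% interval around it.
estimate = [1.1 * n for n in truth]
lower = [0.8 * n for n in estimate]
upper = [1.2 * n for n in estimate]

error = sum_abs_diff(truth, estimate)
covered = coverage(truth, lower, upper)
```

Under these assumptions, a smaller `error` indicates a more accurate point estimate, and a `covered` value near the nominal level indicates well-calibrated intervals.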
We are devoted to the study of theoretical population genetics. The goal of population genetics is to identify and understand the forces that produce and maintain genetic variation in natural populations. These forces include mutation (also recombination and gene conversion), natural selection, various kinds of population structure (e.g., subdivision with migration), and the random fluctuations of gene frequencies through time known as genetic drift. We study these forces mathematically, using both analysis and computation. We also develop statistical methods to make inferences about these forces from DNA sequences or other kinds of genetic data. For more information about specific areas of research, follow the links to lab members.