Abstracts

Josh Akey, Excavating ancient DNA from the genomes of modern humans

Anatomically modern humans overlapped and mated with Neanderthals, such that non-African humans inherit ~1 to 3% of their genomes from Neanderthal ancestors. We developed an approach to identify Neanderthal lineages that persist in the DNA of modern humans and applied it to whole-genome sequences from 379 European and 286 Asian individuals. In total, we recovered over 15 Gb of introgressed sequence that spans ~20% of the Neanderthal genome (FDR = 5%). Analyses of surviving archaic lineages suggest that there were fitness costs to hybridization, that admixture occurred both before and after the divergence of non-African modern humans, and that Neanderthals were a source of adaptive variation at loci involved in skin phenotypes, including regulatory sequences of the BNC2 and POU2F3 genes. Our results provide a new avenue for paleogenomic studies, allowing substantial amounts of population-level DNA sequence information to be obtained from extinct groups, even in the absence of fossilized remains.
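The two figures in this abstract measure different things: the >15 Gb total is summed over all individuals, whereas the ~20% is the share of the genome covered by at least one recovered tract. A haplotype carried by many people therefore counts many times in the first figure but only once in the second. The minimal Python sketch below illustrates that distinction with interval merging; the tract coordinates are hypothetical toy values, not data from the study, and the code is not the authors' detection method.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one chromosome


def total_length(per_individual: List[List[Interval]]) -> int:
    """Sum of tract lengths over all individuals; shared bases are counted once per carrier."""
    return sum(end - start for tracts in per_individual for start, end in tracts)


def union_length(per_individual: List[List[Interval]]) -> int:
    """Length of the genome covered by at least one tract in any individual."""
    tracts = sorted(t for ind in per_individual for t in ind)
    covered = 0
    cur_start, cur_end = None, None
    for start, end in tracts:
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


if __name__ == "__main__":
    # Hypothetical tracts for three individuals on a 1,000 bp toy chromosome.
    tracts = [
        [(100, 300), (600, 700)],
        [(150, 350)],
        [(120, 320), (650, 750)],
    ]
    chrom_len = 1_000
    print(total_length(tracts))              # 800
    print(union_length(tracts))              # 400
    print(union_length(tracts) / chrom_len)  # 0.4
```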

Robin Allaby, Using archaeogenomic and computational approaches to unravel the history of local adaptation in crops

Our understanding of the evolution of domestication has changed radically in the past ten years, from a relatively simplistic scenario of rapid origin to one of a protracted, complex process. This complexity has required the adoption of new modeling approaches to better understand the expected patterns of genetic diversity and to integrate diverse factors across several levels of biological organization, up to the genome-wide level. We have approached this through a combination of agent-based modeling and archaeogenomics, the former informing our interpretation of the latter. Archaeogenomics lets us observe genome evolution directly through time, and we have been applying it to understand how domesticated crops have evolved and adapted to their environments. A limiting factor is that our domesticated crops largely originated in low-latitude areas where temperatures are high and conditions are dry. In NGS proof-of-principle studies we established that under hot, dry conditions DNA has a half-life of around 350 years, enabling us to readily probe biomolecular information back 3-4 thousand years and to gain some surprising insights into the rate of genome architecture evolution in cotton. DNA capture approaches have allowed us to survey a roster of barley genes in an investigation of adaptation for drought resistance. We investigated the presence of archaeological RNA under these hot, dry conditions and found that its rate of decay is drastically reduced, being only 2-4-fold higher than that of DNA. This allowed us to investigate the small RNA profiles of archaeological barleys, which show interesting and significant differences from modern barleys that relate to environmental conditions. We further retrieved an entire viral genome, which gives us insight into its origin and its probable association with an intensification of agriculture.
Finally, we have investigated the stability of methylation in archaeological barley remains; methylation is detectable and shows a stress-associated peak around 800 years ago. In conclusion, the evidence supports the notion that the time is right for the archaeogenomic reconstruction of plants. A strong reason to pursue this is the ability to detect, more powerfully, signatures of selection that have become too faint over time to detect any other way.
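A 350-year half-life implies, under a simple first-order decay model, that the fraction of DNA surviving after t years is 0.5^(t/350). The Python sketch below assumes that constant-rate model (real degradation varies with burial conditions) and interprets a 2-4-fold higher RNA decay rate as an RNA half-life of 350/4 to 350/2 years; the function and constant names are illustrative.

```python
def surviving_fraction(age_years: float, half_life_years: float) -> float:
    """Fraction of original molecules remaining under first-order (exponential) decay."""
    return 0.5 ** (age_years / half_life_years)


DNA_HALF_LIFE = 350.0  # years under hot, dry conditions (value from the abstract)

for age in (1_000, 3_000, 4_000):
    dna = surviving_fraction(age, DNA_HALF_LIFE)
    # Assumption: a k-fold higher decay rate means a half-life of DNA_HALF_LIFE / k.
    rna_fast = surviving_fraction(age, DNA_HALF_LIFE / 4)
    rna_slow = surviving_fraction(age, DNA_HALF_LIFE / 2)
    print(f"{age:>5} y  DNA {dna:.2e}  RNA {rna_fast:.2e} to {rna_slow:.2e}")
```

At 3,500 years, ten DNA half-lives have elapsed, leaving roughly 1/1024 (about 0.1%) of the starting molecules under this model.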

Cecile Ben, Using patterns of genetic variation within modern plant accessions to reveal ancestral genome admixture and to infer population adaptation in the wild legume species Medicago truncatula

The Mediterranean Basin is a heterogeneous zone where patterns of plant distribution are complex to understand, being driven by an entangled interplay of drift, gene flow and local adaptation to biotic and abiotic factors via selection. The Mediterranean-native wild species Medicago truncatula is a particularly well-suited plant model to address this question, as large, high-quality genomic resources based on whole-genome resequencing of more than 260 accessions have recently been developed, allowing extensive studies of population structure in this legume. Using 240,000 randomly selected intergenic SNPs and an admixture-based method, we inferred putative ancestral genomes and their patterns of distribution. Our study showed that local adaptation with subsequent admixture may be responsible for the Mediterranean-wide distribution of M. truncatula populations, with Algeria and Tunisia the likely center of diversification of the species. Using the recently described Geographic Population Structure (GPS) algorithm, adapted to predominantly selfing plant species, we were able to predict the geographical origin of several undocumented M. truncatula accessions, including Jemalong-A17, the accession used for the species' reference genome. A leave-one-out procedure showed 75% accuracy in correct assignment. We observed high population differentiation and heterogeneous levels of admixture among populations. This raises the question of whether local adaptation prevents population admixture in some particular native ranges. Wide dispersal of some ancestral genomes, putatively associated with anthropic disturbances or with recent, strong demographic expansion, was also evident. Lastly, population structure was shown to be correlated with major bioclimatic or geographic variables. Some relationships with biological traits, such as disease resistance and certain symbiotic traits, were also revealed.
In the future, combining information on modern patterns of genetic variation with data from ancient remains could help decipher the population history and evolution of M. truncatula in much greater detail. This methodology may also serve as a basis for analyses of other plant species.
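A leave-one-out accuracy such as the 75% reported here is obtained by holding out each accession of known origin in turn, predicting its origin from the remaining accessions, and counting correct assignments. The Python sketch below shows only that evaluation loop. As a stand-in predictor it assigns the region of the nearest reference accession in admixture-proportion space, which is an assumption for illustration and not the GPS algorithm; the toy proportions and region labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (admixture proportions, known region label)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_region(query: Sequence[float], reference: List[Sample]) -> str:
    """Stand-in predictor: return the region of the closest reference accession."""
    _, label = min(reference, key=lambda s: euclidean(query, s[0]))
    return label


def leave_one_out_accuracy(samples: List[Sample]) -> float:
    """Hold out each sample in turn, predict it from the rest, return fraction correct."""
    correct = 0
    for i, (props, true_label) in enumerate(samples):
        reference = samples[:i] + samples[i + 1:]
        if predict_region(props, reference) == true_label:
            correct += 1
    return correct / len(samples)


if __name__ == "__main__":
    # Hypothetical accessions with proportions of K = 3 ancestral genomes.
    toy = [
        ((0.90, 0.05, 0.05), "Algeria"),
        ((0.85, 0.10, 0.05), "Algeria"),
        ((0.10, 0.80, 0.10), "Spain"),
        ((0.15, 0.75, 0.10), "Spain"),
        ((0.05, 0.10, 0.85), "Greece"),
        ((0.10, 0.05, 0.85), "Greece"),
    ]
    print(leave_one_out_accuracy(toy))
```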

Lounes Chikhi, On some issues related to population structure, the Neolithic transition and ancient DNA

Ancient DNA (aDNA) is increasingly used to complement modern DNA data and to infer demographic scenarios for various species, including humans. While aDNA is indeed providing new data that are likely to improve our understanding of the history of species, it also creates new challenges, because adding aDNA samples from various periods requires explicit assumptions about the past. I will discuss how and why population structure crucially affects population genetic inference, and will give an example related to the Neolithic transition in Europe, where considering or ignoring population structure can lead to different conclusions.

Inna Dubchak, Mapping, Alignment and Visualization of aDNA

VISTA (http://genome.lbl.gov/vista/index.shtml) is a comprehensive suite of programs and databases for comparative analysis of genomic sequences. The VISTA framework (Frazer, Pachter et al. 2004) for whole-genome DNA alignments has built-in algorithms that combine both global and local alignment methods. The constructed genome-wide pairwise and multiple alignments are accessible for browsing and various types of analysis through several visualization tools, which allow comparative data to be explored at different levels of resolution. We are planning to refine these comparative tools for the analysis of newly sequenced aDNA. The methods to implement will depend on the quality and other characteristics of the sequences.
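VISTA-style conservation plots summarize a pairwise alignment as percent identity within a sliding window, and conserved regions are stretches where windows stay above an identity cutoff. The Python sketch below computes such a profile from two already-aligned sequences; it is not VISTA's code, and the default window (100 columns) and cutoff (70%) are illustrative assumptions.

```python
from typing import List, Tuple


def window_identity(seq_a: str, seq_b: str, window: int = 100,
                    step: int = 1) -> List[Tuple[int, float]]:
    """Percent identity of two equal-length aligned sequences in sliding windows.

    Gap characters ('-') count as mismatches. Returns (window start, % identity) pairs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [a == b and a != "-" for a, b in zip(seq_a.upper(), seq_b.upper())]
    profile = []
    for start in range(0, len(matches) - window + 1, step):
        ident = 100.0 * sum(matches[start:start + window]) / window
        profile.append((start, ident))
    return profile


def conserved_regions(profile: List[Tuple[int, float]], threshold: float = 70.0,
                      window: int = 100) -> List[Tuple[int, int]]:
    """Merge overlapping windows at or above `threshold` into (start, end) alignment intervals."""
    regions: List[Tuple[int, int]] = []
    for start, ident in profile:
        if ident < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

Coordinates here are alignment columns; mapping them back to positions in either genome would require skipping that genome's gap characters.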

Eran Elhaik, Geographic Population Structure Prediction Algorithm

The search for a method that utilizes biological information to predict humans’ place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data to address this question with limited success. Biogeographical algorithms using next-generation sequencing data achieved an accuracy of 700 km in Europe but were inaccurate elsewhere. Here we develop the Geographic Population Structure (GPS) algorithm, which improves on this accuracy, and demonstrate it on three datasets of 40,000-130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. The accuracy and power of GPS to infer the biogeography of worldwide individuals down to their country or, in some cases, village of origin underscore the promise of admixture-based methods for biogeography and have ramifications for genetic ancestry testing. Applications for ancient DNA will be discussed.
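The abstract does not spell out how GPS converts genetic data into coordinates, so the Python sketch below uses a deliberately simple stand-in: it averages the coordinates of the k reference populations closest in admixture-proportion space, weighted by inverse admixture distance. It then scores predictions by great-circle (haversine) distance, which is how statements such as "within 50 km" can be evaluated. The function names, the weighting scheme and k = 3 are assumptions, not the published GPS procedure.

```python
import math
from typing import List, Sequence, Tuple

Reference = Tuple[Sequence[float], float, float]  # (admixture proportions, lat, lon)

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def predict_location(query: Sequence[float], refs: List[Reference],
                     k: int = 3) -> Tuple[float, float]:
    """Inverse-distance-weighted mean position of the k references closest in admixture space.

    Averaging raw lat/lon is only reasonable over small regions away from the antimeridian.
    """
    def admix_dist(r: Reference) -> float:
        return math.sqrt(sum((q - p) ** 2 for q, p in zip(query, r[0])))

    nearest = sorted(refs, key=admix_dist)[:k]
    weights = [1.0 / (admix_dist(r) + 1e-9) for r in nearest]  # epsilon avoids division by zero
    total = sum(weights)
    lat = sum(w * r[1] for w, r in zip(weights, nearest)) / total
    lon = sum(w * r[2] for w, r in zip(weights, nearest)) / total
    return lat, lon


def fraction_within(predicted: List[Tuple[float, float]],
                    observed: List[Tuple[float, float]], radius_km: float) -> float:
    """Share of individuals whose predicted location lies within radius_km of the known one."""
    hits = sum(haversine_km(*p, *o) <= radius_km for p, o in zip(predicted, observed))
    return hits / len(observed)
```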

G. Arun Kumar, Ancient migrations and peopling of India - A Genographic perspective

Indian populations are characterised by enormous geographic, cultural, linguistic and genetic diversity, and how India was peopled has long been a conundrum. Despite prehistoric human habitation in the region, ancient DNA from paleontological samples is scarce for various reasons. Studies on the peopling of India have therefore focused primarily on the genetics of present-day populations, with ages calibrated by coalescent estimates. The Indian centre of the Genographic Project aimed to decipher the population structure and migratory patterns of Indian populations. Under its aegis, a total of 12,040 males were sampled across India. All samples were genotyped for a battery of 42 Y-SNPs, 17 Y-STRs and 22 mtDNA SNPs, and were sequenced for mtDNA HVS-1. Analysis of these uniparental markers revealed a distinct correlation between linguistic variation and Y-chromosomal diversity, but not mtDNA diversity. The mtDNA showed a fluid distribution, suggesting a shared maternal gene pool among Indian populations. The presence of deep-rooting Y-haplogroups such as C*-M130, C5- M and F*-M89, and of mtDNA haplogroups M4, M31, M42, M64 and R30, indicates Middle Palaeolithic (150-50 kya) human habitation in the region. Population expansions during the Neolithic (10 kya) were marked by agricultural expansion and the spread of various language families across the subcontinent. Analysis of Indian samples using the Genochip revealed a strong correlation between Y-chromosomal variation and genome-wide SNPs, suggesting that predominantly male-mediated migrations shaped the Indian gene pool.
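Coalescent age calibrations for uniparental haplogroups are often made with the ρ (rho) statistic: the mean number of mutations separating sampled haplotypes from the inferred founder haplotype, multiplied by a calibrated number of years per mutation. The abstract does not say which estimator was used, so the Python sketch below only illustrates ρ dating. Haplotypes are hypothetical sets of variant positions, and the calibration constant is deliberately left as a parameter rather than asserted.

```python
from typing import List, Set


def rho(haplotypes: List[Set[int]], founder: Set[int]) -> float:
    """Mean number of differences between each sampled haplotype and the founder.

    Haplotypes and founder are sets of variant positions relative to a reference;
    the symmetric difference counts mutations gained (or reverted) since the founder.
    """
    return sum(len(h ^ founder) for h in haplotypes) / len(haplotypes)


def age_years(rho_value: float, years_per_mutation: float) -> float:
    """Convert rho to an age using an externally calibrated mutation rate for the marker set."""
    return rho_value * years_per_mutation


# Hypothetical HVS-1 haplotypes within one haplogroup; the founder carries no extra variants.
sample = [{16129}, {16129, 16223}, set(), {16311}]
print(rho(sample, founder=set()))  # mean of 1, 2, 0, 1 differences
```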

Hideaki Kanzawa-Kiriyama, Nuclear Genome Analysis of Ancient Japanese Archipelago Humans

The Jomon period, characterized by cord-marked pottery, lasted from ~16,000 to <3,000 years before present (YBP), and abundant human skeletal remains have been excavated from shell mounds and other sites throughout the Japanese Archipelago. However, the genetic origins of the Jomon people and their relationships with modern populations are largely unknown. Here we determined 10% and 80% of the genome sequences of two Jomon individuals, excavated at the Yugura cave site (Nagano) and the Shitsukariabe cave site (Aomori), respectively, and compared them with those of worldwide populations. We found that the Jomon people occupy a unique genetic position, having diverged before the diversification of most present-day East Eurasian populations, including East Eurasian Islanders. This indicates that the Jomon were a basal population in East Eurasia and were genetically isolated from other East Eurasians for a long time. However, their genetic affinities to modern East Eurasians are uneven, and this heterogeneity may offer clues to human migration and gene flow in East Eurasia after the divergence of the Jomon ancestors.
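If the 10% and 80% figures are read as the fraction of the genome covered by at least one read, the idealized Lander-Waterman (Poisson) model relates them to mean sequencing depth c through f = 1 - e^(-c), i.e. c = -ln(1 - f). The abstract does not report depth, and real ancient-DNA coverage is uneven, so the Python sketch below gives only a rough translation under that assumption.

```python
import math


def covered_fraction(mean_depth: float) -> float:
    """Expected fraction of bases covered at least once under a Poisson read model."""
    return 1.0 - math.exp(-mean_depth)


def depth_for_fraction(fraction: float) -> float:
    """Mean depth needed for a given expected covered fraction (inverse of covered_fraction)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


for f in (0.10, 0.80):
    print(f"{f:.0%} of genome covered -> mean depth ~{depth_for_fraction(f):.2f}x")
```

Under these assumptions it prints mean depths of roughly 0.11x and 1.61x for the two individuals.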

Ryosuke Kimura, Genomic anthropology on the Ryukyuans

The goal of ancient DNA studies is to understand the evolutionary processes of organisms. Ancient DNA can reveal not only phylogenetic relationships between ancient and present-day species but also the physical and physiological characteristics of ancient species. To reconstruct the appearance of ancient species from DNA, however, we need to know how genetic factors are associated with phenotypes. In my research group, we are conducting genome-wide association studies of physical traits such as facial, dental and hair morphology and skin color. The results of these studies will help to reconstruct the characteristics of ancient hominoid species and ancient human populations.
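At its core, a GWAS of a quantitative trait such as a facial or dental measurement tests each SNP separately by regressing the phenotype on allele dosage (0, 1 or 2 copies of the effect allele). The minimal pure-Python sketch below shows that per-SNP test. It omits the covariates (ancestry components, sex, age and so on) and the multiple-testing correction that real studies require, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def snp_association(dosages: Sequence[float], phenotype: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares regression of phenotype on allele dosage (0/1/2).

    Returns (effect size beta, t statistic). No covariates are modelled.
    """
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("monomorphic SNP: no genotype variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx
    intercept = my - beta * mx
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    if se == 0:
        t = math.copysign(math.inf, beta) if beta != 0 else 0.0
    else:
        t = beta / se
    return beta, t


def scan(genotypes: List[Sequence[float]], phenotype: Sequence[float]) -> List[Tuple[float, float]]:
    """Test every SNP (one row of dosages per SNP) against the same phenotype."""
    return [snp_association(g, phenotype) for g in genotypes]
```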

Sasha Mikheyev, A first look at nanopore sequencing

I will review the performance of an early iteration of Oxford Nanopore's handheld sequencer, based on my lab's experience re-sequencing the lambda phage genome and amplicons from a snake venom gland transcriptome. The early version of the handheld MinION sequencer has relatively low yield and many insertion/deletion errors, and it sometimes produces sequence with no apparent similarity to the template. In the course of a 36-hour run on the MinION, it was possible to re-sequence the 48 kb lambda phage reference at 16x coverage. Substantially larger projects would not currently be feasible using the MinION. Without increases in accuracy, which would allow applications such as genome scaffolding and phasing, the current utility of the MinION may be limited. Library preparation requires access to a molecular lab and is of similar complexity and cost to that of other next-generation sequencing platforms. The MinION is an exciting step in a new direction for single-molecule sequencing, though it will require dramatic decreases in error rates before it lives up to its promise. More broadly, I will compare the advantages and disadvantages of nanopore sequencing technology with those of other chemistries.
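The run figures in this abstract imply the MinION's effective yield: 48 kb at 16x coverage is about 768 kb of sequence in 36 hours, or roughly 21 kb per hour. The short Python sketch below reproduces that arithmetic and, purely as a hypothetical extrapolation that assumes throughput stays constant, estimates the run time for a 100 Mb genome at the same coverage.

```python
def run_hours(genome_bp: float, coverage: float, bases_per_hour: float) -> float:
    """Hours needed to reach `coverage` of a genome at a fixed sequencing throughput."""
    return genome_bp * coverage / bases_per_hour


LAMBDA_BP = 48_000            # lambda phage genome size as rounded in the abstract
yield_bp = LAMBDA_BP * 16     # 16x coverage -> 768,000 bases
throughput = yield_bp / 36    # ~21,333 bases per hour over the 36 h run

# Hypothetical extrapolation: a 100 Mb genome at the same 16x coverage.
print(f"{run_hours(100e6, 16, throughput):,.0f} hours")
```

At that rate the hypothetical 100 Mb project would need about 75,000 hours, more than eight years of continuous sequencing, which illustrates why substantially larger projects were not feasible.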

Sasha Mikheyev, Museum samples reveal population genomic changes associated with a rapid evolutionary response by wild honey bees (Apis mellifera) to a novel parasite

Understanding genetic changes caused by novel parasites can reveal mechanisms of adaptation and genetic robustness. Here we examine population genomic changes in wild colonies of honey bees (Apis mellifera) following exposure to the ectoparasitic mite Varroa destructor, the main causative agent of high levels of colony mortality in Europe and North America. Using a novel PCR-free library preparation method, we sequenced whole genomes from museum specimens collected in 1977 from a population of wild, mite-free colonies, and directly compared their allele frequencies with those of specimens collected from the same population in 2010, more than a decade after the arrival of the mites. Although colony density in this population is the same today as in the past, suggesting tolerance to the parasite, the comparison reveals evidence of a drastic mitochondrial bottleneck that resulted in the loss of virtually all haplotypic diversity. In contrast, nuclear genetic diversity is unchanged. At least 232 genes, spread throughout the genome, show signs of selection, but there is no evidence of ‘hard’ selective sweeps or corresponding islands of reduced genetic diversity, suggesting a complex polygenic response. These data show the robustness of genetically diverse honey bee populations, which can rapidly evolve tolerance to introduced diseases while maintaining much of their standing genetic variation.
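A loss of haplotypic diversity is commonly quantified with Nei's unbiased haplotype diversity, H = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n sequences. The abstract does not state which statistic was used; the Python sketch below simply illustrates H on hypothetical mitochondrial haplotype labels for the two sampling years.

```python
from collections import Counter
from typing import Sequence


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


# Hypothetical mitochondrial haplotype labels for the two sampling years.
samples_1977 = ["A1", "A4", "C1", "C2", "M4", "A1", "C1", "M7"]
samples_2010 = ["C1"] * 7 + ["C2"]
for year, haps in (("1977", samples_1977), ("2010", samples_2010)):
    print(year, round(haplotype_diversity(haps), 3))
```

For these toy labels H falls from about 0.93 to 0.25.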

Morten Rasmussen, Ancient human genomes, a look into our past

With the advent of next-generation sequencing, many aspects of molecular biology have been revolutionized, and this is also true for the field of ancient DNA, which has moved from studies focused on the hypervariable region of mitochondrial DNA to complete mitochondrial sequences and, ultimately, complete ancient genomes. Although generating ancient genomes is expensive and laborious, protocols are readily available, from guidelines on lab design, extraction and library building all the way to sequencing. For our own species alone, several studies have described the migrations of our ancestors and their historical relationships. As an example, I will show what ancient genomics has told us about the peopling of Australia and the Americas, using two ancient genomes we have generated. A ~100-year-old lock of hair from an Aboriginal Australian yielded a complete genome and revealed a deep split from Eurasian populations, indicating an early dispersal into Asia dating back >60 kya, before Eurasian populations diverged. Some of the oldest skeletal remains from the Americas were found at the Anzick Clovis site in Montana. They date to >12.5 kya, and sequencing reveals an equal relationship to all Southern Native Americans, suggesting that the individual belonged to a population ancestral to all contemporary Native Americans.

Victor Solovyev, Advanced algorithms for next generation data processing

Dozens of new algorithms have been developed for next-generation sequencing (NGS) data processing, and many of them are actively applied in frontier cancer genome research. Collaborative competitions (such as Assemblathon, Alignathon and RGASP) have assessed the state of the art in genome assembly, read mapping and discovery of alternative transcripts. They demonstrated a lack of consistency between software tools, both across data sets and across the various metrics used to evaluate the quality of results, suggesting that there is still much room for improvement. We present here an advanced suite of robust bioinformatics tools for efficient analysis of large-scale NGS data. It includes 1) OligoZip, a de novo NGS read assembly pipeline; 2) ReadsMap, a tool for mapping spliced and non-spliced RNA-Seq reads and identifying SNPs; 3) TransSeq, a program for de novo assembly of alternative transcripts from short reads and quantification of gene expression; and 4) GenomeMatch, a tool to compare genome sequences. The OligoZip assembler provides effective solutions to three tasks: 1) de novo reconstruction of genomic sequence; 2) reconstruction of sequence using a reference genome from the same or a closely related organism; and 3) mutation profiling and SNP discovery. The pipeline includes Adapter_trim and Quality_trim modules that remove adapter sequences and low-quality read ends from paired-end (PE) and mate-pair (MP) reads. A new iterative procedure, ReadsClean, removes errors from the read set and separates “clean” reads from “dirty” reads that cannot be corrected for lack of relevant information. ReadsClean can correct more than 99% of errors in Illumina-generated reads. The assembly algorithm benefits greatly from using the set of clean reads. It consists of the following modules: seed sequence creation, contig extension, iterative contig joining (scaffold building) and a “holes” patching module.
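The abstract does not describe how Quality_trim decides where to cut. One widely used approach, shown here as a swapped-in technique rather than the module's actual implementation, is the running-sum 3' trimming used by BWA and cutadapt: scanning from the 3' end, accumulate (threshold - Q) and cut where that sum peaks. The Python sketch assumes Phred+33 quality encoding and a threshold of 20.

```python
from typing import Tuple


def quality_trim_3prime(seq: str, qual: str, threshold: int = 20,
                        offset: int = 33) -> Tuple[str, str]:
    """Trim low-quality bases from the 3' end using a BWA/cutadapt-style running sum.

    Scanning from the 3' end, accumulate (threshold - Q); the read is cut where this
    sum is largest, and scanning stops as soon as the sum drops below zero.
    """
    scores = [ord(c) - offset for c in qual]  # decode Phred+33 characters to integers
    running, best, cut = 0, 0, len(scores)
    for i in range(len(scores) - 1, -1, -1):
        running += threshold - scores[i]
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return seq[:cut], qual[:cut]


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(quality_trim_3prime("ACGTAC", "IIII##"))
```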
Using PE and MP reads, OligoZip can assemble bacterial genomes into one or a few contigs. The OligoZip pipeline can build eukaryotic chromosome sequences in a dozen contigs using sets of simulated PE and MP reads with less than 1% errors. The ReadsMap pipeline has been significantly improved in data-processing speed, and its accuracy in aligning RNA-Seq reads to the reference genome has reached a sensitivity of 0.99 and a specificity of 0.96. TransSeq assembles RNA-Seq data into unique transcript sequences and generates full-length transcripts for a set of alternatively spliced isoforms. The program achieves a sensitivity of 0.97 and a specificity of 0.96 in identifying known RNA transcripts on C. elegans test data. We incorporated this program into our Fgenesh++ gene prediction pipeline and demonstrated the benefits of using publicly available rice transcriptome RNA-Seq data to re-annotate the rice genome chromosome sequences. We produced a new rice genome annotation that includes alternatively spliced isoforms as well as non-coding 5’ and 3’ transcript regions that were absent from ab initio predicted genes.
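In gene-prediction and transcript-assembly benchmarks, sensitivity is usually TP/(TP + FN) and "specificity" is usually TP/(TP + FP), the quantity machine learning calls precision, rather than TN/(TN + FP). The abstract does not define its metrics, so the Python sketch below assumes that convention and an exact-match criterion, with hypothetical transcripts keyed by their intron chains.

```python
from typing import Hashable, Set, Tuple


def sn_sp(predicted: Set[Hashable], known: Set[Hashable]) -> Tuple[float, float]:
    """Sensitivity TP/(TP+FN) and gene-prediction-style specificity TP/(TP+FP).

    Items are compared by exact identity, e.g. each transcript keyed by its intron chain.
    """
    tp = len(predicted & known)
    sensitivity = tp / len(known) if known else 0.0
    specificity = tp / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical transcripts represented as tuples of (donor, acceptor) intron coordinates.
known = {((100, 200), (300, 400)), ((100, 200), (350, 400)), ((500, 600),)}
predicted = {((100, 200), (300, 400)), ((500, 600),), ((700, 800),)}
print(sn_sp(predicted, known))  # two of three known found; two of three predictions correct
```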