Algorithms for Ecological and Evolutionary Genomics (Gene Myers)
Unit Research Overview
Our unit is essential a computer science group focused on developing fundamental algorithmic methods for core problems in bioinformatics that we use in software applications we build for our collaborative explorations into topics in genomics. We are currently galvanized by the technological goal of perfect, “telomere to telomere” genome reconstructions and the development of a library of such genomes for every species of life extant on the planet. The Myers group is closely affiliated with the OIST units of Nick Luscombe and Dan Rokshar that have deep expertise and skill in similar areas.
1. Fundamental Algorithms for Bioinformatics
One might think that the best algorithms for core problems such as sorting or aligning sequences leave no room for improvement, but this is not so especially when one considers that one can take advantage of the context in which the core problem is being solved. We create new, faster, more memory efficient methods for problems such as sequence alignment, graph search, and sorting and searching sequences to name a few central tasks in bioinformatics.
2. DNA Sequencing & Assembly
Reconstructing the sequence of a genome from raw data sequence reads, called DNA assembly, has been a long standing problem in bioinformatics. It is a multi-faceted problem involving all of the core problems mentioned above. The creative design possibilities also depend heavily on the characteristics of the source data and the availability and nature of additional long range information. For example, an assembler for short sequencing reads with a .5% error rate is generally very different from one using long reads with a 10% error rate, and often one also has available restriction maps or Hi-C data that give information about the genome at the scale of 100’s of kilobases. The problem of building a “perfect” assembler thus involves not only the mathematical problem of best leveraging the input information, but also the design of what information to collect and how much of each type to collect. We are exploring both the space of protocol combinations and assembly algorithms to bring us closer to the goal of economically producing perfect assemblies.
3. Eco & Evo Sequencing “Driver Projects”
To test our methods and develop robust applications, we seek to work with biologists at OIST on what we call “driver projects” that entail producing state-of-the-art reconstructions for a coherent collection of species. For example, with the Satoh group, we hope to sequence a collection of starfish and the corals they feed on to understand why one of the starfish is destroying the reefs it inhabits. With the Economo group, we aim to sequence a collection of ants that will help elucidate the genomic basis of the evolution of their mandible which has been adapted to carry out a wide range of very specific functions. With the Bourguignon group, we hope to contribute to producing a reference set of 50 genomes covering the sub-clade of termites that consists of approximately 3000 species.
4. External Collaborations
In addition, our group is involved in several international consortia to produce even larger collections of genomes. In particular, the Myers unit is a partner in the Bat1K Project (1.4K bats), Vertebrate Genome Project (70K vertebrates), and the Darwin Tree of Life project (the flora and fauna of the British isles, 70K species). For the Bat1K, we are the group sequencing and assembling a representative from each of the 21 families of bats. For the VGP, we are sequencing about 15% of the representatives from each of the 260 orders of vertebrates. For the DToL, Myers is an adjunct faculty member working close with members of the DToL team including Richard Durbin, Mara Lawniczak, and Mark Blaxter.