Speaker: Dr. Katherine S. Pollard (University of California, San Francisco)
Title: Computational Challenges in a Densely Sequenced Tree of Life
Abstract: Genome sequencing and assembly have exploded since 2015. Today, many lineages contain closely related species, as well as species with multiple diverse
genome sequences. Having more genomes seems like a good thing for studying ecology and evolution across the tree of life. However, the workhorse algorithm
for genomic studies, sequence alignment, is breaking down in terms of both
computational efficiency and accuracy. This challenge has been particularly
evident to us in our work using metagenomic sequencing to study the genetics of
bacterial species in complex communities, such as the human gut. I will present
results showing that genome redundancy, reference bias, and cross-mapping are
prevalent sources of error in this context. Then, I will discuss actionable
and aspirational solutions, including several new bioinformatics tools from our
lab. This work demonstrates that efficient algorithms and data structures are
essential both to keep genomic and metagenomic data science accessible to
researchers without massive high-performance computing resources and to
ensure that read mapping remains accurate on a densely sequenced tree of life.