SMBP Group Meeting: Luke Evans

America/New_York
3rd Floor Classroom (162 5th Avenue)

Description
Speaker: Luke Evans 

Topic: Why Counting Particles Can Give Wrong Probabilities in Cryo-EM

Abstract: This is a research update closely related to the FI-developed techniques of https://pubs.acs.org/doi/full/10.1021/acs.jpcb.3c01087 and to the ethos of the Flatiron Institute Heterogeneity Challenge.

This will likely be more informal, but I'm including the title (above) and abstract (below) as wishful thinking for future presentations of this work.
 
A dataset of cryo-EM single-particle images can be used not only to estimate conformational changes but also to determine probability distributions (i.e., the populations of the states of a biomolecule). These quantitative probability estimates are crucial for understanding biological function, as they enable us to estimate free-energy differences between states and activation barriers under different environments.
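To illustrate why accurate populations matter, here is a minimal sketch of the standard Boltzmann relation between two state populations and their free-energy difference, G_B - G_A = -k_B T ln(p_B / p_A). The function name, the temperature, and the example populations are hypothetical and are not taken from the work presented.

```python
import math

# Boltzmann constant in kcal/(mol*K)
K_B = 0.0019872041


def free_energy_difference(p_a, p_b, temperature=300.0):
    """Return G_B - G_A in kcal/mol from the populations of states A and B.

    Uses the Boltzmann relation G_B - G_A = -k_B * T * ln(p_B / p_A).
    The default temperature of 300 K is an assumption for illustration.
    """
    if p_a <= 0 or p_b <= 0:
        raise ValueError("populations must be strictly positive")
    return -K_B * temperature * math.log(p_b / p_a)


# Hypothetical populations: 80% in state A, 20% in state B.
delta_g = free_energy_difference(0.8, 0.2)
print(f"G_B - G_A = {delta_g:.2f} kcal/mol")
```

Because the free-energy difference depends on the logarithm of the population ratio, a bias in the estimated populations carries over directly into the estimated energetics.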
Recovering the true probability distribution of conformations is an inverse problem: given a dataset of cryo-EM particles (a probability distribution over possible images), find the conformational probability distribution that generated the dataset. This inverse problem is extremely challenging and typically ill-posed, as the cryo-EM experiment involves numerous sources of uncertainty, such as unknown conformations, unknown projection directions, and extremely high levels of noise. Although the inverse problem concerns matching distributions ("global" properties of the data), most methods for solving it, and for related statistics problems, rely at least in part on assigning each individual image y to a value z in a latent space. This could be a discrete latent space of possible "classes" (e.g., 3D classification techniques) or a continuous latent space (e.g., variational autoencoder-like techniques such as cryoDRGN).

We first show that, under realistic scenarios, counting particles (histogramming) in cryo-EM can lead to inaccurate population estimates for discrete classification, and highlight the many scenarios where particle counting can fail. Then, we show that, also under realistic conditions, latent-space distributions obtained by state-of-the-art methods do not always coincide with the true underlying conformational probability distribution in the case of continuous heterogeneity (such as that reported by the Flatiron Institute Heterogeneity Challenge). In both cases, we demonstrate that explicitly inferring the source probability distribution, rather than relying solely on particle counts, resolves these issues.
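The contrast between counting and inferring can be sketched with a deliberately simple toy model. This is not the method presented in the talk: it assumes a one-dimensional "image" observable, two known states with Gaussian noise of known width, and a plain expectation-maximization update for the population weight. All names and parameter values are hypothetical.

```python
import numpy as np


def simulate(n, w_true, mu=(-1.0, 1.0), sigma=2.0, seed=0):
    """Draw n noisy 1D observations; state A has population w_true."""
    rng = np.random.default_rng(seed)
    in_state_b = rng.random(n) >= w_true
    means = np.where(in_state_b, mu[1], mu[0])
    return means + sigma * rng.normal(size=n)


def count_estimate(y, mu=(-1.0, 1.0)):
    """Assign each observation to its nearest state, then count (histogram)."""
    assigned_a = np.abs(y - mu[0]) < np.abs(y - mu[1])
    return assigned_a.mean()


def infer_population(y, mu=(-1.0, 1.0), sigma=2.0, n_iter=200):
    """Maximum-likelihood population of state A via EM on the weight only.

    The state means and noise level are assumed known (toy assumption),
    so the shared Gaussian normalization constant cancels.
    """
    lik_a = np.exp(-0.5 * ((y - mu[0]) / sigma) ** 2)
    lik_b = np.exp(-0.5 * ((y - mu[1]) / sigma) ** 2)
    w = 0.5
    for _ in range(n_iter):
        # Soft assignment of each observation to state A under the current w
        resp_a = w * lik_a / (w * lik_a + (1.0 - w) * lik_b)
        # Updated population is the average soft assignment
        w = resp_a.mean()
    return w


y = simulate(n=20000, w_true=0.8)
print("true population of A:     0.8")
print("counting estimate:       ", round(count_estimate(y), 3))
print("inferred (EM) population:", round(infer_population(y), 3))
```

In this toy setting the expected counting estimate is 0.8 Φ(0.5) + 0.2 Φ(-0.5) ≈ 0.61, where Φ is the standard normal CDF, because noisy observations from the majority state are misassigned more often than they are compensated by the minority state. Estimating the weight directly from the likelihood of the whole dataset avoids committing to a per-image assignment.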