Data-driven insights into cancer: from machine learning methods to biological discoveries.
The recent deluge of genomics data has transformed biological research from a field limited by data acquisition into one limited by data interpretation. The potential of many rich but heterogeneous genotype and phenotype data sources remains relatively untapped, largely because we lack computational tools to appropriately combine and integrate data across studies. I will present two computational frameworks that fill this critical gap. The first, PLATYPUS, is a semi-supervised machine learning framework that uses multiple views to jointly predict outcomes, incorporating disparate data sources and biological priors. I apply PLATYPUS to drug sensitivity prediction in a large cancer cell line database to highlight the strengths of this learning approach, both in predictive performance and in identifying drivers of drug response. The second, FREYA, is a statistical framework for comparative genomics that models disease across multiple species. I performed RNA profiling of canine mammary tumors to model molecular changes in human breast cancer and, using FREYA, demonstrate that dog-derived cancer signatures are predictive of survival in human breast cancer patients.