Dr. Chaofan Chen, University of Maine School of Computing and Information Science:
“Interpretable Case-based Deep Learning for DNA Classification”
The use of deep neural networks has become increasingly popular, especially for computer vision and natural language processing tasks. In the genomics domain, deep learning has also received increasing attention and has been used for classifying DNA sequences, predicting the effects of non-coding regions, and associating genes with phenotypes. Despite the impressive accuracy achieved by deep learning models on genomics tasks, these models are often called “black boxes” because they generally lack interpretability: we do not know how or why they reach their predictions. In this talk, I will discuss my research group’s recent effort to develop an interpretable, case-based deep learning model for DNA classification. Our approach is based on the prototypical part network (ProtoPNet), which was originally designed for image classification. A ProtoPNet learns a set of prototypical image features for the various classes during training, and classifies an image at inference time by relating parts of the image to the learned prototypical features of those classes. In a similar vein, our interpretable deep model for DNA classification learns a set of prototypical subsequences that characterize the species in the training set, and classifies a DNA sequence by relating parts of the sequence to the learned prototypical subsequences of the various species. In this way, our model can explain why a DNA sequence is classified into a particular species: the sequence contains subsequences similar to those prototypical of that species. Our experiments on the task of classifying 12S metabarcoding sequences show that our model achieves accuracy competitive with “black-box” deep learning models and with traditional machine learning models trained on k-mer counts.
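For readers curious how a prototype layer of this kind might look in code, the sketch below (PyTorch) illustrates the general idea only: the encoder, the number of prototypes, the similarity activation, and all names and sizes are my own illustrative assumptions, not the speaker’s actual implementation.

```python
# Minimal sketch of a ProtoPNet-style model adapted to DNA sequences.
# Everything here (layer sizes, prototype counts, similarity function) is
# an illustrative assumption, not the authors' code.
import torch
import torch.nn as nn


class ProtoDNASketch(nn.Module):
    def __init__(self, n_classes=3, protos_per_class=2, proto_dim=64):
        super().__init__()
        n_protos = n_classes * protos_per_class
        # 1-D convolutional encoder over one-hot DNA (4 channels: A, C, G, T).
        self.encoder = nn.Sequential(
            nn.Conv1d(4, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(64, proto_dim, kernel_size=9, padding=4), nn.ReLU(),
        )
        # Learned prototype vectors; each stands in for a prototypical
        # subsequence characteristic of one species.
        self.prototypes = nn.Parameter(torch.randn(n_protos, proto_dim))
        # Linear layer mapping prototype similarities to species logits.
        self.classifier = nn.Linear(n_protos, n_classes, bias=False)

    def forward(self, x):
        # x: (batch, 4, seq_len) one-hot encoded DNA sequences
        z = self.encoder(x).transpose(1, 2)        # (batch, seq_len, proto_dim)
        protos = self.prototypes.unsqueeze(0).expand(z.size(0), -1, -1)
        dists = torch.cdist(z, protos)             # (batch, seq_len, n_protos)
        min_d, where = dists.min(dim=1)            # closest position per prototype
        # Similarity activation: large when some position in the sequence
        # lies close to the prototype in latent space.
        sim = torch.log((min_d + 1.0) / (min_d + 1e-4))
        logits = self.classifier(sim)
        # `where` records which part of the sequence matched each prototype,
        # which is what makes a prediction explainable.
        return logits, sim, where


def one_hot(seq):
    """One-hot encode a DNA string into a (1, 4, len) tensor."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    t = torch.zeros(1, 4, len(seq))
    for i, base in enumerate(seq):
        t[0, idx[base], i] = 1.0
    return t


if __name__ == "__main__":
    model = ProtoDNASketch()
    logits, sim, where = model(one_hot("ACGTACGTGGCCTTAA"))
    print(logits.shape, sim.shape, where.shape)
```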
Dr. Chaofan Chen is an Assistant Professor of Computer Science at the University of Maine. His research focuses on the design of interpretable machine learning models that can be understood (“interpreted”) by human beings. In particular, Chen is interested in developing new techniques that enhance the interpretability and transparency of machine learning models, especially deep learning models, and in applying these techniques to healthcare, finance, and other domains where high-stakes decisions are made and interpretability is key to trusting a model’s predictions.
Co-hosted by MCGE (Maine Center for Genetics in the Environment) and SBE’s Mike Kinnison