Workshop: Deep Kernel Learning for Information Extraction from Cancer Pathology Reports
Abstract: Cancer pathology reports are a rich source of data for surveilling cancer incidence and tracking cancer trends across the United States. Cancer registries manually extract key pieces of information from these reports, including tumor site, histology, laterality, behavior, grade, and metastatic status. Automating this task is critical for an efficient and scalable pipeline for processing these reports. Deep neural networks have recently been shown to perform well on this information extraction task by casting it as a document classification problem. However, neural networks are prone to overfitting in low-sample regimes and are unable to quantify their own uncertainty. Deep kernel learning (DKL) has recently emerged as a simple and scalable paradigm to hybridize deep neural networks and Bayesian models, which may help to remedy some of these shortcomings of neural networks. A DKL model is obtained by feeding the output of a neural network feature extractor into a Gaussian process (GP) classifier and training the resulting model with gradient descent in a variational inference framework. In this project, we build a DKL model with a shallow-wide convolutional neural network (CNN) feature extractor and use it to extract primary tumor site information from a dataset of de-identified cancer pathology reports. As far as we are aware, this marks the first application of DKL to document classification. Our DKL model outperforms the state-of-the-art CNN on this dataset. We also show that pretraining a CNN with the weights of a DKL model boosts performance, suggesting that DKL is beneficial not just because of GP inference at test time but also because DKL is able to extract better feature representations from the pathology reports through Bayesian training.
We conclude that DKL has the potential to boost the performance of neural networks for information extraction on pathology reports while requiring little modification of the original network architecture, and that DKL can offer a path forward to develop scalable deep Bayesian models for such tasks.
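
To make the construction concrete, the following is a minimal Python sketch of the "deep kernel" at the core of a DKL model: a GP kernel evaluated on the output of a neural feature extractor rather than on the raw input. It is an illustration under stated assumptions, not the model used in this work. The toy documents, the embedding size, the filter widths, the RBF base kernel, and all function names are hypothetical, and the weights are random rather than trained. The variational training of the GP classifier is omitted entirely.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
EMBED_DIM = 4           # assumed toy embedding size

# Hypothetical toy "reports"; real pathology reports are far longer.
docs = [
    ["invasive", "ductal", "carcinoma", "left", "breast"],
    ["ductal", "carcinoma", "in", "situ", "right", "breast"],
    ["adenocarcinoma", "of", "the", "sigmoid", "colon"],
]

# Random embeddings and filters stand in for trained parameters.
vocab = sorted({tok for doc in docs for tok in doc})
embed = {tok: [rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)] for tok in vocab}

# One convolutional layer with several filter widths ("shallow-wide").
# filters[width] holds flat weight vectors of length width * EMBED_DIM.
filters = {
    width: [[rng.uniform(-0.5, 0.5) for _ in range(width * EMBED_DIM)]
            for _ in range(3)]
    for width in (2, 3)
}


def conv_max_features(tokens, embed, filters):
    """Map a token list to a fixed-length vector: conv, ReLU, max-over-time."""
    vecs = [embed[tok] for tok in tokens]
    feats = []
    for width, weight_sets in filters.items():
        for w in weight_sets:
            best = 0.0  # starting at 0 applies ReLU before the max pooling
            for i in range(len(vecs) - width + 1):
                window = [x for v in vecs[i:i + width] for x in v]
                best = max(best, sum(a * b for a, b in zip(w, window)))
            feats.append(best)
    return feats


def rbf(u, v, lengthscale=1.0):
    """Standard RBF (squared-exponential) kernel on feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def deep_kernel(doc_a, doc_b, embed, filters):
    """Deep kernel: base kernel applied to the extractor's outputs."""
    return rbf(conv_max_features(doc_a, embed, filters),
               conv_max_features(doc_b, embed, filters))


# Gram matrix over the toy documents; a GP classifier would consume this.
gram = [[deep_kernel(a, b, embed, filters) for b in docs] for a in docs]
for row in gram:
    print([round(k, 3) for k in row])
```

In a full DKL model, the extractor weights and the kernel hyperparameters (here only `lengthscale`) would be learned jointly by gradient descent on a variational objective, which is what distinguishes DKL from applying a GP to fixed, pretrained features.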