The computations required for deep learning research have been doubling every few months, resulting in an estimated 300,000x increase from 2012 to 2018. These computations have a surprisingly large carbon footprint. Moreover, the financial cost of the computations can make it difficult for researchers, in particular those from emerging economies, to engage in deep learning research, as well as for customers to use this technology for their applications.
In the first part of this talk I will demonstrate that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best; I will show that simply pouring more resources into hyperparameter and/or random seed tuning can lead to massive improvements, e.g., making BERT performance comparable to models like XLNet and RoBERTa. I will then present a novel technique for improved reporting: expected validation performance as a function of computation budget (e.g., the number of hyperparameter search trials). Our approach supports a fairer comparison across models, and allows us to estimate the amount of computation required to obtain a given accuracy.
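To make the idea of "expected validation performance as a function of computation budget" concrete, here is a minimal sketch of one possible estimator. It assumes that the budget is a number of hyperparameter search trials and that trials are independent draws (with replacement) from the empirical distribution of the observed validation scores; the function name and the toy scores are hypothetical, and the estimator presented in the talk may differ in its details.

```python
def expected_max_performance(scores, budget):
    """Expected best validation score among `budget` trials.

    Assumption: each trial is an i.i.d. draw from the empirical
    distribution of `scores`. With the scores sorted as
    v_1 <= ... <= v_N, the maximum of `budget` draws equals v_i with
    probability (i/N)**budget - ((i-1)/N)**budget.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if budget < 1:
        raise ValueError("budget must be at least 1")
    ordered = sorted(scores)
    n = len(ordered)
    total = 0.0
    for i, value in enumerate(ordered, start=1):
        prob_is_max = (i / n) ** budget - ((i - 1) / n) ** budget
        total += value * prob_is_max
    return total


# Hypothetical validation accuracies from five search trials.
observed = [0.71, 0.74, 0.78, 0.80, 0.83]
curve = [expected_max_performance(observed, b) for b in range(1, 6)]
```

With a budget of one trial the estimate reduces to the mean of the observed scores, and it rises toward the best observed score as the budget grows, which is what lets two models be compared at equal budgets.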
In the second part I will present a method to substantially reduce the inference cost of NLP models. Our method modifies the BERT fine-tuning process, and allows, during inference, for early (and fast) “exit” from neural network calculations for simple instances and late (and accurate) exit for hard instances. Our method presents a favorable speed/accuracy tradeoff on several datasets, producing models which are up to four times faster than the state of the art, while preserving their accuracy. Moreover, our method requires no additional training resources (in either time or parameters) compared to the baseline BERT model.
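The early-exit idea can be illustrated with a toy sketch. It assumes that a small classifier is attached after each layer and that inference stops as soon as the classifier's top probability reaches a confidence threshold; this exit criterion, the function names, and the scalar "layers" below are illustrative assumptions, not the method's actual implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(x, layers, classifiers, threshold=0.9):
    """Return (predicted_label, layers_used).

    Runs the layers in order and exits at the first layer whose
    classifier is confident enough; the last layer always answers.
    """
    if not layers or len(layers) != len(classifiers):
        raise ValueError("need one classifier per layer")
    hidden = x
    for depth, (layer, classifier) in enumerate(zip(layers, classifiers), start=1):
        hidden = layer(hidden)
        probs = softmax(classifier(hidden))
        confidence = max(probs)
        if confidence >= threshold or depth == len(layers):
            return probs.index(confidence), depth


# Toy stand-ins: each "layer" doubles a scalar feature, and each
# "classifier" maps it to two logits (positive favors label 0).
toy_layers = [lambda h: 2 * h] * 3
toy_classifiers = [lambda h: [h, -h]] * 3

easy = early_exit_predict(2.0, toy_layers, toy_classifiers)
hard = early_exit_predict(0.1, toy_layers, toy_classifiers)
```

In this toy setup the strongly positive input exits after the first layer, while the near-zero input runs through all three, which mirrors the "fast for simple instances, accurate for hard instances" tradeoff described above.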
This is joint work with Dallas Card, Jesse Dodge, Ali Farhadi, Suchin Gururangan, Hannaneh Hajishirzi, Gabriel Ilharco, Oren Etzioni, Noah A. Smith, Gabi Stanovsky and Swabha Swayamdipta.
Roy Schwartz is a research scientist at the Allen Institute for AI and the University of Washington. Roy's research focuses on improving deep learning models for natural language processing, as well as making them more efficient, by gaining mathematical and linguistic understanding of these models. He received his Ph.D. in Computer Science from the Hebrew University of Jerusalem. He will be rejoining the School of Computer Science at the Hebrew University as an assistant professor in the fall of 2020.