John Hewitt
Visiting Researcher, Google DeepMind.
Assistant Professor of Computer Science, Columbia University.
(Fall 2025–)
johnhew [at] stanford.edu
I am a researcher interested in developing neural language systems, deeply understanding them, and precisely controlling them, for the sake of people’s access to information and useful tools. Feel free to look me up on Google Scholar or Twitter, or take my CV.
Join my lab @ Columbia
I am joining Columbia Computer Science as an assistant professor. So, I’m hiring my first 1-2 Computer Science PhD students this cycle!
The goal of my lab is to deeply understand neural systems in order to ensure their efficacy and safety.
Our Research
Being opportunistic. I’ve tried to distill the directions below from my interests, but if you look at my published work, you’ll notice that it’s tough to find a common theme. I am much more interested in working on problems that are deeply interesting, even if that means meandering across topics, than in sticking to a pre-specified set of directions. We don’t know how to understand neural systems, and I think as a lab we will err more toward exploring new or understudied directions than toward exploiting known promising ones. So, if you want to work on something and it doesn’t fit the topics below, it’s quite possible we can work on it!
Discovering and leveraging structure for control. We hypothesize that existing large-scale neural systems (large language models, other foundation models) have learned high-level structures about the world. We further hypothesize that discovering and leveraging these structures can provide fine-grained and reliable control of these systems in ways that are hard to achieve via, e.g., finetuning.
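As a purely illustrative sketch of what “discovering and leveraging structure” can mean in the simplest linear case (this is a toy, not a method from any paper above): fit a probe that finds a direction associated with an attribute in hidden states, then nudge an activation along that direction and watch the attribute score change. The hidden states and numbers below are synthetic stand-ins.

```python
# Toy illustration only: "discover" a linear direction with a probe, then
# "leverage" it by steering an activation along that direction.
# The activations are synthetic stand-ins for real model hidden states.
import numpy as np

rng = np.random.default_rng(0)
d = 64                                          # hidden dimension
true_direction = rng.normal(size=d)
true_direction /= np.linalg.norm(true_direction)

# Synthetic "hidden states": half encode an attribute, half do not.
n = 500
base = rng.normal(size=(2 * n, d))
labels = np.array([1.0] * n + [0.0] * n)
acts = base + 2.0 * labels[:, None] * true_direction

# Discover: fit a linear probe (least squares, for simplicity).
w, *_ = np.linalg.lstsq(acts, labels, rcond=None)
probe_direction = w / np.linalg.norm(w)
print("cosine(probe, true):", float(probe_direction @ true_direction))

# Leverage: push a fresh activation along the discovered direction and
# check that the probe's attribute score increases.
x = rng.normal(size=d)
steered = x + 3.0 * probe_direction
print("score before:", float(x @ w), "after:", float(steered @ w))
```

With a real model, the interesting work is in choosing where to probe, what data defines the attribute, and verifying that the intervention changes behavior rather than just the probe’s score.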
Behavioral characterization. The best way to begin to understand a system is to interact with it in as many ways as possible: to observe and characterize its behaviors. We ask questions like: what happens when I finetune models like X on distributions like Y? How do the resulting models behave? Or, in what ways do models fail to make coherent use of long contexts?
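To make the long-context question concrete, here is a toy version of the sort of behavioral probe one might run; `query_model` is a hypothetical stand-in for whichever model is being characterized, and everything else is made up for illustration.

```python
# Toy behavioral probe; `query_model` is a hypothetical stand-in for the
# model under study, and the prompts/numbers are made up.
from typing import Callable, Dict

def position_sweep(query_model: Callable[[str], str], n_filler: int = 20) -> Dict[int, bool]:
    """Place a key fact at different positions in a long context and record
    whether the model recovers it, to characterize long-context behavior."""
    filler = [f"Irrelevant sentence number {i}." for i in range(n_filler)]
    fact = "The access code is 4261."
    results = {}
    for pos in range(0, n_filler + 1, 5):
        context = filler[:pos] + [fact] + filler[pos:]
        prompt = " ".join(context) + " Question: What is the access code?"
        results[pos] = "4261" in query_model(prompt)
    return results

# A real run would pass a function that calls an actual language model;
# this stub just echoes the prompt, so every position trivially "succeeds".
print(position_sweep(lambda prompt: prompt))
```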
Unsupervised training for modular systems. All systems of sufficient complexity develop modular, specialized structure in order to operate efficiently — from organisms to organizations. Modular structure is a key condition for one’s ability to make surgical, targeted changes to a system. The specific modular structures of each system are the result of optimization pressures and available resources — that is, they are learned ‘unsupervisedly’ from interaction in their environments. We aim to develop algorithms for improved unsupervised modularization in neural systems like language models.
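As one familiar (and not lab-specific) illustration of modular structure in neural networks, here is a toy top-1 mixture-of-experts layer: which expert handles which input is decided by a learned router, with no labels saying how the experts should specialize. All names and shapes below are invented for the sketch.

```python
# Toy top-1 mixture-of-experts layer: the router learns (in a real system,
# via ordinary training) which expert to send each input to, with no
# supervision about how experts should specialize. Shapes are invented.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 32, 4
router_W = 0.1 * rng.normal(size=(d, n_experts))            # routing weights
experts = [0.1 * rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x (shape (d,)) to its top-1 expert."""
    logits = x @ router_W
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    k = int(np.argmax(probs))                                # chosen expert
    return probs[k] * (x @ experts[k])                       # gate by router prob

x = rng.normal(size=d)
print(moe_forward(x).shape)                                  # -> (32,)
```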
Advising Philosophy
Every student needs something different from an advisor. Some students want more guidance; others want to be left more to their own devices. All students struggle in different ways. My goal as your advisor is to engage deeply with your research and career goals and help you develop into an excellent scientist while maintaining a healthy lifestyle. Here are some kinds of interaction you can expect if you are one of my PhD students:
- Weekly meetings. We will have a ~1hr meeting every week. We can split it into two 30min meetings. We’ll discuss your research and future plans, talk about technical blockers, how you’re feeling about how things are going, whatever’s useful.
- Whiteboard time. Weekly meetings aren’t the best way to have in-depth technical discussions about, e.g., how to formalize a problem, how to make progress on a tricky bit of math. I’ll make myself available for longer discussions (often in front of a whiteboard) where we try to nail down details and make technical progress.
- Paper feedback. All advisors give paper feedback. But I think it’s some of the most crucial feedback you receive as a PhD student. The rest of the research community largely understands you and your work through your papers. I think the single most important thing I learned from my PhD is what makes for strong technical writing — not just for the papers themselves, but for how that writing helps shape how I think about research. I’ll give in-depth comments on your papers that require you to go in and make the edits yourself.
- Lab culture. From lab socials to lab snacks, to helping foster positive interactions between labmates, to making sure your lab space works for you.
- Amplification. You need to develop professional connections in your PhD: to professors, industry scientists, and other PhD students. You need introductions and chances to speak about your work, and I’ll help you with these things to the extent that I can.
What kind of students am I looking for?
At a high level, I am hoping to hire students with the potential to do creative, independent research and to contribute positively to lab culture. I will attempt to hire students who are kind and curious. Kindness is crucial for a healthy lab, because toxicity and unkindness keep students from happiness and productivity. Curiosity is crucial for all research, and in the kinds of research I want to do, it’s often unclear what questions we should be asking to understand or improve a model; usually it’s unclear what understanding even means. Without a natural curiosity, we’d be lost. In all of this, I aim to hire a diverse group of students, valuing a richness of perspectives. Some level of technical competence is necessary as well, though gaps can be filled in and much will be learned during the PhD.
How can we work together?
If you want to do a PhD, you can apply through Columbia Computer Science and mention me in your application. Feel free to reach out via email, but I am often unable to respond to such emails. In the applications due December 2024, for PhD students starting in Fall 2025, I hope to hire (i.e., admit and then matriculate) roughly two students.
If you’re an existing PhD student at Columbia, email me and I’ll do my absolute best to get you on my calendar. I’m potentially happy to collaborate with you and your advisor, but if I don’t think I can be of much use for a specific project or direction, I’ll try to say so early on.
If you are an undergraduate or master’s student at Columbia, you can email me. I hope to work with 2-4 undergrads/master’s students at any one time. As my lab grows, you may work more directly with a PhD student of mine. We’ll see. It helps, but is not required, for you to have taken NLP classes. If you’ve never programmed in Python before, it’s probably a good idea to learn before you reach out.
If you are a student at another university, it’s unlikely that I’ll be able to mentor you through a research project, but it’s not impossible. This is most likely to work if you have specific interests highly related to some work of mine that you can concisely describe in an email. Don’t spend too much time on it, as again it’s unlikely we’ll be able to work together, but feel free to send something if you think there’s a particular connection.
More about me
Before Google and Columbia, I got my PhD at Stanford. I’m grateful to have been co-advised by Chris Manning and Percy Liang, and to have been supported by an NSF Graduate Research Fellowship. Before that, I did my undergrad at the University of Pennsylvania.
Publications
2024
Instruction Following without Instruction Tuning.
John Hewitt, Nelson F. Liu, Christopher D. Manning, Percy Liang.
arXiv.
(pdf) (code) (blog)
Model Editing with Canonical Examples.
John Hewitt, Sarah Chen, Lanruo Lora Xi, Edward Adams, Percy Liang, Christopher D. Manning.
arXiv.
(pdf) (code)
A non-archival version won Honorable Mention for Best Paper at the R0-FoMo Workshop at NeurIPS 2023.
Closing the Curious Case of Neural Text Degeneration.
Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal.
ICLR 2024.
(pdf) (code)
2023
Backpack Language Models.
John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang.
ACL 2023 (long papers). (Outstanding Paper Award).
(pdf) (blog) (code)
(backpackmodels.science)
Character-level Chinese Backpack Language Models.
Hao Sun, John Hewitt.
BlackBoxNLP 2023.
(pdf) (code)
Lost in the Middle: How Language Models Use Long Contexts.
Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang.
TACL 2023.
(pdf) (code)
2022
Truncation Sampling as Language Model Desmoothing.
John Hewitt, Christopher D. Manning, Percy Liang.
Findings of EMNLP 2022 (long papers).
(pdf) (blog) (code)
JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset.
Ruth-Ann Armstrong, John Hewitt, Christopher D. Manning.
Findings of EMNLP 2022 (long papers).
(pdf) (blog) (talk) (dataset) (code) (Vox video)
2021
Conditional probing: measuring usable information beyond a baseline.
John Hewitt, Kawin Ethayarajh, Percy Liang, Christopher D. Manning.
EMNLP 2021 (short papers).
(pdf) (blog) (code) (codalab)
On the Opportunities and Risks of Foundation Models.
Bommasani et al. (100+ authors). John Hewitt, co-lead of the Interpretability section.
Whitepaper.
(pdf)
Probing artificial neural networks: Insights from neuroscience.
Anna Ivanova, John Hewitt, Noga Zaslavsky.
Brain2AI 2021.
(pdf)
Refining Targeted Syntactic Evaluation of Language Models.
Benjamin Newman, Kai-Siang Ang, Julia Gong, John Hewitt.
NAACL 2021 (short papers).
(pdf) (code)
2020
RNNs can generate bounded hierarchical languages with optimal memory.
John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D. Manning.
EMNLP 2020 (long papers).
(pdf) (blog) (code:analytic) (code:learning) (codalab)
The EOS Decision and Length Extrapolation.
Benjamin Newman, John Hewitt, Percy Liang, Christopher D. Manning
BlackBoxNLP 2020. (Outstanding Paper Award).
(pdf) (code)
Emergent Linguistic Structure in Artificial Neural Networks Trained by Self-Supervision.
Christopher D. Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, Omer Levy
Proceedings of the National Academy of Sciences. 2020.
(pdf)
Finding Universal Grammatical Relations in Multilingual BERT.
Ethan A. Chi, John Hewitt and Christopher D. Manning.
ACL 2020 (long papers).
(pdf) (bib) (code) (viz)
2019
Designing and Interpreting Probes with Control Tasks.
John Hewitt and Percy Liang.
EMNLP 2019 (long papers). (Runner Up Best Paper Award).
(pdf) (bib) (blog) (code) (codalab) (slides) (talk).
A Structural Probe for Finding Syntax in Word Representations.
John Hewitt and Christopher D. Manning.
NAACL 2019 (short papers).
(pdf) (bib) (blog) (code) (nlp highlights podcast) (slides) (talk).
Simple, Fast, Accurate Intent Classification and Slot Labeling for Goal-Oriented Dialogue Systems.
Arshit Gupta*, John Hewitt* and Katrin Kirchhoff.
SIGDIAL 2019.
(pdf)
*: Equal contribution; authors listed alphabetically.
2018
A Distributional and Orthographic Aggregation Model for English Derivational Morphology.
Daniel Deutsch*, John Hewitt* and Dan Roth.
ACL 2018 (long papers).
(pdf)
*: Equal contribution; authors listed alphabetically.
Learning Translations via Images with a Massively Multilingual Image Dataset.
John Hewitt*, Daphne Ippolito*, Brendan Callahan, Reno Kriz, Derry Tanti Wijaya and Chris Callison-Burch.
ACL 2018 (long papers).
(pdf)
*: Equal contribution; authors listed alphabetically.
XNMT: The eXtensible Neural Machine Translation Toolkit.
Graham Neubig, Matthias Sperber, Xinyi Wang, Matthieu Felix, Austin Matthews, Sarguna Padmanabhan, Ye Qi, Devendra Singh Sachan, Philip Arthur, Pierre Godard, John Hewitt, Rachid Riad, and Liming Wang.
AMTA 2018.
(pdf)
2017
Learning Translations via Matrix Completion.
Derry Tanti Wijaya, Brendan Callahan, John Hewitt, Xiao Ling, Marianna Apidianaki, and Chris Callison-Burch.
EMNLP 2017 (long papers).
(pdf)
2016
Automatic Construction of Morphologically-Motivated Translation Models for Highly Inflected Low-Resource Languages.
John Hewitt, Matt Post, David Yarowsky.
AMTA 2016.
(pdf)
Invited Talks
Instruction Following without Instruction Tuning.
Deep Learning: Classics and Trends (ML Collective). November, 2024.
Instruction Following without Instruction Tuning.
Bay Area Language Interest Group (Bayli). November, 2024.
Instruction Following without Instruction Tuning.
University of Washington. November, 2024.
Instruction Following without Instruction Tuning.
University of Pennsylvania. November, 2024.
Understanding Language Models through Discovery and by Design.
UMichigan. March, 2024.
Understanding Language Models through Discovery and by Design.
Northwestern. March, 2024.
Understanding Language Models through Discovery and by Design.
Harvard. February, 2024.
Understanding Language Models through Discovery and by Design.
NYU. February, 2024.
Understanding Language Models through Discovery and by Design.
Columbia. February, 2024.
Backpack Language Models.
Apple. August 7, 2023.
Backpack Language Models.
Princeton NLP. August 4, 2023.
Backpack Language Models.
Columbia NLP. July 19, 2023.
Backpack Language Models.
Cornell Tech NLP. July 18, 2023.
Backpack Language Models.
NYU. July 17, 2023.
Backpack Language Models.
Anthropic. May 10, 2023.
Backpack Language Models.
Schütze Lab, LMU Munich. May 1, 2023.
Backpack Language Models.
Rycolab, ETH Zurich. April 27, 2023.
Surviving Grad School.
ACL Year-Round Mentorship Panel. July 11, 2022.
A Natural Language Processing perspective on supervised analysis of neural representations.
EvLab, MIT. December 2, 2020.
The Unreasonable Syntactic Expressivity of RNNs.
USC ISI NLP Seminar. (video) November 5, 2020.
Language Probes as V-information Estimators.
NLP with Friends. September 9, 2020.
Probing Neural NLP: Ideas and Problems.
Berkeley NLP Seminar. November 18, 2019.
Emergent Linguistic Structure in Neural NLP.
Amazon AI. July 25, 2019.
A Structural Probe for Finding Syntax in Word Representations.
NLP Highlights Podcast. May, 2019.
Abstracts
RNNs can generate bounded hierarchical languages with optimal memory.
John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D. Manning
2020 Conference on the Mathematical Theory of Deep Learning (abstracts).
Semantic Bootstrapping in Frames: A Computational Model of Syntactic Category Acquisition.
John Hewitt, Jordan Kodner, Mitch Marcus, and Charles Yang.
Conference of the Cognitive Science Society (CogSci), (member posters) 2017.
(pdf) (abstract)
Patents
Capturing Rich Response Relationships with Small-Data Neural Networks.
John Hewitt.
US Patent App 15/841,963. December 2017. (granted). (application)
Blog
Projects
Self-Attention and Transformers lecture notes
- I wrote a lecture on Transformers in my role as Head TA for Stanford’s CS 224N: Natural Language Processing with Deep Learning in 2021; a recording is available on YouTube. Anna Goldie updated the lecture in 2022, and I updated it again in 2023; the updated slides are available. Along with the 2023 lecture, I wrote brand new lecture notes.
Pretraining lecture
- I wrote a lecture on Pretraining for the same course! The 2021 version is available on YouTube.
Model analysis and explanation lecture
- I wrote a lecture on analysis and explanation of NLP models for the same course! The 2021 version is available on YouTube.
About
Tidbits
This talk by Rajiv Gandhi, to whom I am grateful. For you if you think, like I used to, that research–or any success in STEM–is out of your reach.
Scott Aaronson’s old note on frameworks for reasoning about large numbers, for enjoyment
Kevin Knight’s note on unix commands, to help you with your bash skills
The Fundamental Whiteboard Difficulty (Scott Aaronson):
I figured that chalk has its problems—it breaks, the dust gets all over—but I could live with them, much more than I could live with the Fundamental Whiteboard Difficulty, of all the available markers always being dry whenever you want to explain anything.
I highly suggest Arch Linux for its configurability and the educational experience it provides…
Contact
Take my school email johnhew@stanford, and predict the TLD using your internal knowledge base.
A bit of history
I was absolutely destroyed by my first year of computer science undergraduate studies: bad grades, late nights, a good amount of crying. My academic advisor, Professor Max Mintz, asked me what I was doing at Penn, as I was certainly no good at getting good grades. He cared deeply about his students. Of the possible futures, I didn’t love software engineering, I wasn’t good enough at math to be a quant, and though I was interested in law, my GPA was already too low to get into a good law school (even if I got straight As for the rest of college).
I tried for a while to explore research. Professor Ani Nenkova was generous enough to spend time mentoring me briefly as a freshman on a project that I, to my discredit, brought nowhere, and eventually dropped. Still, her generosity, and that of Professor Rajiv Gandhi, who believed in me even as I was barely passing his first-year algorithms and discrete mathematics courses, led them to write me letters of recommendation for the Johns Hopkins University Summer Research Experiences program. I am grateful to both of them.
Not expecting to get into the Hopkins program, I applied to the startup of a Penn alum associated with STWing, an amazing nerdy community; I was surprised to get an offer, and a chance to spend the summer in Palo Alto, which I’d recently heard was a hotspot of the technology world. Well, as luck would have it, I got into the Hopkins program, but had already accepted the startup role, so I politely declined Hopkins and prepped for the summer. Just a bit before the summer was to start, I got a call from the startup founder saying they’d run out of runway! No startup, no internship.
At Max Mintz’ recommendation, I called the cell phone of the person I’d have worked with at JHU—Professor David Yarowsky—who for whatever reason had his cell number on his website! To my continual amazement, David picked up, and I sheepishly asked if he still had room for me. He did, and I spent the summer studying inflectional morphology for low-resource machine translation. I loved it, and unlike my first year of undergrad, I wasn’t bad at it.
Later, Dr. Matt Post agreed to mentor me on running machine translation experiments, and spent an enormous amount of time teaching me as I tried to write my first research paper. The connection with David and Matt led me to Professor Chris Callison-Burch, who kindly allowed me a space in his lab. He mentored me for the rest of my undergraduate research, where I got to work with Professor Charles Yang as well.
By the end of my undergraduate career, I’d mostly caught up with my peers in terms of performance in coursework, but I’d never been the “exceptional” undergrad in courses whom some would expect to be good at research. It was only through the generosity and time of a large number of people that I’d had the chance to do research and thrive.