
John Hewitt
johnhew [at] stanford.edu
Hi! I’m a third-year PhD student in computer science, conducting research in natural language processing at Stanford University. I am grateful to be co-advised by Chris Manning and Percy Liang, and to be supported by an NSF Graduate Research Fellowship.
For Winter 2021, I’m Head TA of CS224N!
I aim to design systems that robustly and efficiently learn to understand human languages, in service of advancing human communication and education, and to teach others along the way.
Feel free to look me up on Google Scholar or Twitter, or take my CV.
Current Research Interests
- Understanding unsupervised learning of language
- NLP theory
- Out-of-domain extrapolation in NLP
- Multilinguality and low-resource language processing
Publications
2020
RNNs can generate bounded hierarchical languages with optimal memory.
John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D. Manning.
EMNLP 2020 (long papers).
(pdf) (blog) (code:analytic) (code:learning) (codalab)
The EOS Decision and Length Extrapolation.
Benjamin Newman, John Hewitt, Percy Liang, Christopher D. Manning.
BlackBoxNLP 2020. (Outstanding Paper).
(pdf) (code)
Emergent Linguistic Structure in Artificial Neural Networks Trained by Self-Supervision.
Christopher D. Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, Omer Levy.
Proceedings of the National Academy of Sciences. 2020.
(pdf)
Finding Universal Grammatical Relations in Multilingual BERT.
Ethan A. Chi, John Hewitt and Christopher D. Manning.
ACL 2020 (long papers).
(pdf) (bib) (code) (viz)
2019
Designing and Interpreting Probes with Control Tasks.
John Hewitt and Percy Liang.
EMNLP 2019 (long papers). (Runner Up Best Paper).
(pdf) (bib) (blog) (code) (codalab) (slides) (talk).
A Structural Probe for Finding Syntax in Word Representations.
John Hewitt and Christopher D. Manning.
NAACL 2019 (short papers).
(pdf) (bib) (blog) (code) (nlp highlights podcast) (slides) (talk).
Simple, Fast, Accurate Intent Classification and Slot Labeling for Goal-Oriented Dialogue Systems.
Arshit Gupta*, John Hewitt* and Katrin Kirchhoff.
SIGDIAL 2019.
(pdf)
*: Equal contribution; authors listed alphabetically
2018
A Distributional and Orthographic Aggregation Model for English Derivational Morphology.
Daniel Deutsch*, John Hewitt* and Dan Roth.
ACL 2018 (long papers).
(pdf)
*: Equal contribution; authors listed alphabetically
Learning Translations via Images with a Massively Multilingual Image Dataset.
John Hewitt*, Daphne Ippolito*, Brendan Callahan, Reno Kriz, Derry Tanti Wijaya and Chris Callison-Burch.
ACL 2018 (long papers).
(pdf)
*: Equal contribution; authors listed alphabetically
XNMT: The eXtensible Neural Machine Translation Toolkit.
Graham Neubig, Matthias Sperber, Xinyi Wang, Matthieu Felix, Austin Matthews, Sarguna Padmanabhan, Ye Qi, Devendra Singh Sachan, Philip Arthur, Pierre Godard, John Hewitt, Rachid Riad, and Liming Wang.
AMTA 2018.
(pdf)
2017
Learning Translations via Matrix Completion.
Derry Tanti Wijaya, Brendan Callahan, John Hewitt, Xiao Ling, Marianna Apidianaki, and Chris Callison-Burch.
EMNLP 2017 (long papers).
(pdf)
2016
Automatic Construction of Morphologically-Motivated Translation Models for Highly Inflected Low-Resource Languages.
John Hewitt, Matt Post, David Yarowsky.
AMTA 2016.
(pdf)
Invited Talks
A Natural Language Processing perspective on supervised analysis of neural representations.
EvLab, MIT. December 2, 2020.
The Unreasonable Syntactic Expressivity of RNNs.
USC ISI NLP Seminar. (video) November 5, 2020.
Ongoing work on the probing methodology in NLP.
NLP with Friends. September 9, 2020.
Probing Neural NLP: Ideas and Problems.
Berkeley NLP Seminar. November 18, 2019.
Emergent Linguistic Structure in Neural NLP.
Amazon AI. July 25, 2019.
A Structural Probe for Finding Syntax in Word Representations.
NLP Highlights Podcast. May 2019.
Abstracts
RNNs can generate bounded hierarchical languages with optimal memory.
John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D. Manning.
2020 Conference on the Mathematical Theory of Deep Learning (abstracts).
Semantic Bootstrapping in Frames: A Computational Model of Syntactic Category Acquisition.
John Hewitt, Jordan Kodner, Mitch Marcus, and Charles Yang.
Conference of the Cognitive Science Society (CogSci), (member posters) 2017. (pdf) (abstract)
Patents
Capturing Rich Response Relationships with Small-Data Neural Networks.
John Hewitt.
US Patent App 15/841,963. December 2017. (granted). (application)
Blog
Projects
Transformers lecture
I wrote a lecture on Transformers in my role as Head TA for Stanford’s CS 224N: Natural Language Processing with Deep Learning.
About
Tidbits
This talk by Rajiv Gandhi, to whom I am grateful. It’s for you if you think, like I used to, that research, or any success in STEM, is out of your reach.
Scott Aaronson’s old note on frameworks for reasoning about large numbers, for enjoyment
Kevin Knight’s note on unix commands, to help you with your bash skills
The Fundamental Whiteboard Difficulty (Scott Aaronson):
I figured that chalk has its problems—it breaks, the dust gets all over—but I could live with them, much more than I could live with the Fundamental Whiteboard Difficulty, of all the available markers always being dry whenever you want to explain anything.
I highly recommend Arch Linux for its configurability and the educational experience it provides…
Contact
Take my school email johnhew@stanford, and predict the TLD using your internal knowledge base.