How much information do the outputs of NLP models contain about their inputs? We investigate this question in two scenarios: recovering text inputs from the embeddings produced by sentence embedders, and from the next-token probability outputs of language models. In both cases, our methods are able to fully recover some inputs given just the model output.
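For concreteness, a minimal sketch of the inversion setup (not the method presented in the talk): the adversary observes only a model output, here a sentence embedding, and scores candidate texts by how closely their embeddings match it. The embedding model name and the toy candidate list are illustrative assumptions; real attacks train a generative inverter rather than enumerating candidates.

```python
import torch
from sentence_transformers import SentenceTransformer

# Illustrative choice of embedder (assumption, not the model from the talk).
embedder = SentenceTransformer("all-MiniLM-L6-v2")

secret_input = "the launch is scheduled for March 14"
# All the adversary observes: the embedding vector, not the text itself.
observed = embedder.encode(secret_input, convert_to_tensor=True)

# Inversion framed as search: rank hypothetical candidate texts by cosine
# similarity between their embeddings and the observed vector.
candidates = [
    "the launch is scheduled for March 14",
    "the meeting was moved to Friday",
    "launch scheduled in March",
]
cand_emb = embedder.encode(candidates, convert_to_tensor=True)
scores = torch.nn.functional.cosine_similarity(observed.unsqueeze(0), cand_emb)
best = candidates[int(scores.argmax())]
print(f"best guess: {best!r} (cosine similarity {scores.max().item():.3f})")
```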
Jack Morris is a fourth-year CS PhD candidate at Cornell University and a visiting researcher with FAIR at Meta. His research lies at the intersection of natural language processing, machine learning, and security, with a recent focus on applications to information retrieval systems. His research is supported by an NSF Graduate Research Fellowship.