In this talk, I will present our work on Visual Question Answering (VQA). I will provide a brief overview of the VQA task, dataset, and baseline models, highlight some of the problems with existing VQA models, and describe our work on addressing some of these problems by proposing: 1) a new evaluation protocol, 2) a new model architecture, and 3) a novel objective function. Towards the end of the talk, I will also present some very recent work on building agents that generate diverse programs for scenes when conditioned on instructions and trained using reinforced adversarial learning.
Aishwarya Agrawal is a fifth-year Ph.D. student in the School of Interactive Computing at Georgia Tech, working with Dhruv Batra and Devi Parikh. Her research interests lie at the intersection of computer vision, machine learning, and natural language processing. The Visual Question Answering (VQA) work by Aishwarya and her colleagues has attracted tremendous interest in a short period of time (3 years). Aishwarya is a recipient of the NVIDIA Graduate Fellowship 2018-2019, one of the Rising Stars in EECS 2018, and a finalist for the Foley Scholars Award 2018 and the Microsoft and Adobe Research Fellowships 2017-2018. As a research intern, Aishwarya has spent time at Google DeepMind, Microsoft Research, and the Allen Institute for Artificial Intelligence. Aishwarya received her bachelor's degree in Electrical Engineering with a minor in Computer Science and Engineering from the Indian Institute of Technology (IIT) Gandhinagar in 2014.