Research Interests
Natural Language Processing, Conversational Human-Machine Interaction, Artificial Intelligence
Academics
Stanford University, California, USA |
2008-2010 |
Master of Science (MS) in Computer Science (Specialization: Artificial Intelligence) with Distinction in Research* in Natural Language Processing |
GPA: 3.91/4.00 |
*granted to 8 students in a graduating batch of 147 students
Master's Research Report: Simple Coreference Resolution with Rich Syntactic and Semantic Features: Is it Enough? [pdf]
Selected Coursework: Machine Learning, Natural Language Processing, Information Retrieval and Web Search, Speech Recognition and Synthesis, Natural Language Understanding, Foundations of Cognition, Seminar in Lexical Semantics: Space and Motion, Seminar in Psycholinguistics: Information-Theoretic Models of Language and Cognition, Research Project in Artificial Intelligence
National Institute of Technology (NIT), Calicut, India |
2004-2008 |
Bachelor of Technology (B.Tech) in Computer Science and Engineering | GPA: 9.14/10.00 |
Selected Coursework: Computational Intelligence, Network Security, E-Commerce, Web Programming, Advanced Data Structures, Design and Analysis of Algorithms, Logic for Computer Science, Number Theory and Cryptography, Computer Architecture, Compiler Construction, Computer Networks, Database Management Systems, Operating Systems, Principles of Programming Languages, Software Engineering, Communication and Information Theory, Computational Combinatorics and Graph Theory, Computer Organization, Computer Hardware Design, Computer System Software, Discrete Computational Structures, Theory of Computation, Data Structures and Algorithms, Logic Design, Program Design, Introduction to Computing, Probability and Statistics
Publications
Milad Shokouhi, Rosie Jones, Umut Ozertem, Karthik Raghunathan and Fernando Diaz. 2014. Mobile Query Reformulations. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. [Abstract, Full Text, BibTeX]
Karthik Raghunathan, Heeyoung Lee, Sudarshan Rangarajan, Nate Chambers, Mihai Surdeanu, Dan Jurafsky and Christopher Manning. 2010. A Multi-Pass Sieve for Coreference Resolution. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. [Full Text, BibTeX]
Adam Vogel, Karthik Raghunathan and Dan Jurafsky. 2010. Eye Spy: Improving Vision through Dialog. In Proceedings of the 2010 AAAI Fall Symposium Series. [Abstract, Full Text, BibTeX]
Work Experience
Microsoft Corporation, Sunnyvale, CA |
August 2012 onwards |
Senior Applied Scientist |
- Working as a Scientist in the Bing Speech, Language and Intent group
- Improving spoken language interaction in Windows, Windows Phone and Xbox (Kinect)
Microsoft Corporation, Mountain View, CA |
March 2012 - August 2012 |
Research Software Development Engineer II |
- Worked as an RSDE in Ron Kaplan's Natural Language Platform team in Bing search
- Leveraged the Powerset/PARC finite state technology and language tools for linguistic analysis and shallow semantic understanding of web queries
Microsoft Corporation, Redmond, WA |
July 2010 - March 2012 |
Software Development Engineer |
- Worked as an SDE in the Natural Language Group (NLG) of Microsoft Office
- Developed writing assistance (proofing) tools for various Microsoft Office products
Stanford Natural Language Processing Group, Stanford, CA |
Sept 2009 - June 2010 |
Graduate Research Assistant |
- Advisors: Dan Jurafsky, Christopher Manning
- Conducted research on coreference resolution, culminating in a Master's research report and a publication at EMNLP 2010
- Continued work on the Stanford SMS Translator, turned it into a user-facing website, and presented the work at the PhD Poster Session of the Stanford Computer Forum's Annual Affiliates Meeting in 2010
- Was the customer support admin for JavaNLP software
Microsoft Corporation, Redmond, WA |
June 2009 - Sept 2009 |
Software Development Engineer (SDE) Intern |
- Interned as an SDE with the Revenue & Relevance Team of Microsoft adCenter after the first year of my Master's program
- Worked on the adCenter Marketplace Scorecard project, which aimed to develop a standard, reliable set of metrics for measuring the company's performance in the online advertising marketplace and for making informed decisions that maximize marketplace value
- Initiated work on a statistical learning model that predicts how advertisers' bidding behavior changes over time
- Gained exposure to Microsoft's Cosmos distributed storage system and the SCOPE scripting language
Stanford Natural Language Processing Group, Stanford, CA |
Sept 2008 - June 2009 |
Graduate Research Assistant |
- Advisors: Christopher Manning, Dan Jurafsky
- Worked on Phrasal, Stanford's phrase-based statistical machine translation (SMT) system
- Led Stanford's efforts for the DARPA GALE Phase 3 Chinese-English MT evaluation as part of the IBM-Rosetta team
Microsoft Research (MSR), Bangalore, India |
April 2007 - July 2007 |
Research Intern |
- Interned with the Multilingual Systems group of MSR India after the third year of my Bachelor's program
- Investigated the tolerance of statistical machine translation systems to noise in the training corpus, particularly the kind of noise that accompanies automatic extraction of parallel corpora from comparable corpora
- Worked on the design of an online game for NLP data acquisition
Skills
Natural Languages
English, Hindi, Gujarati, Tamil
Programming Languages
Java, C#, Perl, C++, C, SQL, MATLAB
Speech / NLP / AI Tools
JavaNLP, Berkeley Aligner, GIZA++ Word Alignment Tool, Moses Statistical Machine Translation Toolkit, Robot Operating System (ROS), HMM Toolkit (HTK), CMU Sphinx Automatic Speech Recognition System, Festival Speech Synthesis System, VoiceXML
Other Tools
Eclipse, Microsoft Visual Studio, Microsoft SQL Server Management Studio, Vim, Git, SVN, LaTeX, Lex, Yacc
Musical Instruments
Keyboard
Projects
Coreference Resolution |
Sept 2009 - June 2010 |
During my second year at Stanford, I conducted research on the problem of coreference resolution, which became the main topic of my Master's research. We started by attempting to replicate Haghighi and Klein's work from EMNLP 2009 and to generalize their algorithm so that it no longer required knowing the test set in advance to direct the semantic bootstrapping algorithm (see the research report for more details).
However, our experiments showed the bootstrapping approach of Haghighi and Klein (EMNLP 2009) to be too unpredictable in a generic setting, so we explored other ways to improve the system without a semantic module while keeping the overall system deterministic. Using insights gained from error analysis, we implemented a cautious multi-sieve system that globally shared information across entities. Despite being a simple deterministic system, it outperformed many state-of-the-art supervised and unsupervised models on several standard corpora. We presented our work at the 2010 Conference on Empirical Methods in Natural Language Processing at MIT in Cambridge, Massachusetts.
Our coreference resolution system was implemented using the JavaNLP framework and is available for download as part of the Stanford CoreNLP suite of tools.
[Report, Paper, Talk Slides, Software]
Improving Vision through Dialog |
Apr 2010 - June 2010 |
The research goal of this project was to enable robots to identify novel objects in new environments, i.e., objects that their vision systems had not been trained to recognize. Instead of the conventional computer vision approach of manually collecting new training data for the object of interest, we envisioned a robotic dialog system that learns the names and attributes of new objects on the fly through spoken interaction with a human tutor, modeled on the children's games "I Spy" and "20 Questions". We presented our work at the 2010 AAAI Fall Symposium Series in Arlington, Virginia.
[Paper, Demo Video]
Bits & Bots |
Jan 2010 - Mar 2010 |
As part of the graduate course "Designing casual learning games for the iPhone (EDUC 396X)", we developed an iPhone shooter game that taught school children about logic gates and Boolean logic. Players loaded bits (0s and 1s) as ammunition into a gun that functioned as a logic gate, and took down rogue robots by producing the appropriate output from the gun. Difficulty was varied across levels through factors such as the number of different guns available and the speed of the marching robots.
[Slides]
SMS Text Normalization |
Apr 2009 - June 2009 |
As part of the graduate course on Natural Language Processing, we developed a system that converts textspeak (the language used in SMS communication) into proper English using the Moses statistical machine translation toolkit. We later presented this work at the PhD Poster Session of the Stanford Computer Forum's Annual Affiliates Meeting in April 2010.
[Report, Poster]
A Situated, Embodied Spoken Language System for Household Robotics |
Jan 2009 - Apr 2009 |
As a combined class project for the graduate courses on Speech Recognition and Synthesis and Research Project in AI, we developed a spoken dialog interface to the Stanford AI Robot (STAIR) for giving it instructions for simple fetching tasks. Our code is now part of the SAIL ROS package.
[Report]
TagEz: Flickr Tag Recommendation |
Sep 2008 - Dec 2008 |
We built an automatic tag prediction system for images on Flickr.com using machine learning on both linguistic and vision features, as part of the graduate course on Machine Learning.
[Report]
Automated Receptionist for the Computer Science Department at NIT Calicut |
Jul 2007 - Mar 2008 |
Adapted prior work done at IIIT Hyderabad to build an automated receptionist for the NITC CSED Faculty Directory. The system was implemented for Tamil and English.
Rapid Prototyping of Spoken Dialog Systems for Indian Languages |
Apr 2006 - Jun 2006 |
As part of my summer internship at the Speech Lab (part of the Language Technologies Research Center) at IIIT Hyderabad, I contributed to research on methods for rapidly building restricted-domain spoken dialog systems for various Indian languages. Using open source speech tools (Sphinx II Decoder, CMU Statistical LM Toolkit, Festival), we developed a spoken dialog system that functioned as an automated receptionist and handled queries related to the IIIT Faculty Directory. The system was implemented for Telugu, Tamil and English.
[Report]
Other Activities
Masters Admissions Committee Member, Stanford Computer Science |
Feb 2010 - Mar 2010 |
Was part of the Master's admissions committee that screened applications to the MS in Computer Science program (Fall 2011) at Stanford University.
Masters Student Advisor, Stanford Computer Science |
Fall 2009 |
Was one of the student advisors selected to help and advise newly admitted students in the Stanford Computer Science Department's Master's program.
Customer Support Admin, Stanford NLP Group |
Sept 2009 - Jun 2010 |
In charge of providing customer support for the Stanford Natural Language Processing group’s software distributions (nlp.stanford.edu/software/).
Secretary, Computer Science and Engineering Association (CSEA) at NITC |
Jul 2007 - Apr 2008 |
Headed the CSEA (the official association of the Department of Computer Science and Engineering), which plans and organizes all computer science related activities at NIT Calicut.
Participant, MSR-IISc Summer School on NLP |
May 2007 |
Attended the summer school on Natural Language Processing at the Indian Institute of Science (IISc), Bangalore, conducted by Microsoft Research (MSR) India in collaboration with IISc and Department of Science and Technology, Government of India.
Speaker, Seminar-cum-workshop on Spoken Dialog Systems at NITC |
Oct 2006 |
Conducted a seminar-cum-hands-on workshop during Tathva '06 (the annual all-India technical festival of NIT Calicut), guiding the participants to implement restricted domain spoken dialog systems using open source speech tools.
Member of Organizing Committee: FOSS, Tathva and Ragam at NITC |
2005 - 2008 |
Served on the organizing committees of various editions of NIT Calicut’s Free Open Source Software (FOSS) Meet, Tathva (all-India technical festival) and Ragam (all-India cultural festival).
Awards and Achievements
- Was granted "Distinction in Research" by the Department of Computer Science at Stanford University on completion of my MS program. The award is granted to students who conduct significant research under an advisor as part of their Master's program, culminating in a research report. It was given to eight of the 147 degree recipients for 2009-2010.
- Scored 1560/1600 (Verbal: 760/800, Quantitative: 800/800, Analytical Writing: 5.0/6.0) in the GRE General Test (2007).
- Won academic proficiency awards at NIT Calicut for ranking first in the department in the 5th and 6th semesters of the B.Tech program, in 2006 and 2007 respectively.
- Won a certificate of merit from the Council of Scientific and Industrial Research (CSIR), India for ranking among the top 20 in the state in the AISSE exam, and participated in the CSIR Program on Youth for Leadership in Science (2002).
- Won the National Science Day Quiz Contest conducted by the Physical Research Laboratory in Ahmedabad, India in 2002.
- Was a national finalist at the Indian National Cartographic Association’s annual Geography and Map quiz in 2002.
- Completed a year-long course in playing the keyboard and performed at a concert in 2001.