I’m broadly interested in Computer Vision, along with its intersection with NLP. Currently, I’m working on Video Action Recognition with Prof. Gaurav Sharma. Recently, I visited Prof. Raquel Urtasun and Prof. Sanja Fidler at the University of Toronto, where I worked on visual-semantic embeddings. Previously, I worked with Prof. Amitabha Mukerjee on generative models for robot motion planning.
Apart from my academic pursuits, I like to spend my time looking at the world through a camera, playing the guitar, or building applications that help the community or simply automate my own work! Thanks to my parents, I also get to spend time doing philanthropic work.
You can find my resume here.
B.Tech in Computer Science, 2017
For the full list, please take a look at my resume
Implemented an attention-based seq2seq model and baselines for the recently released Visual Storytelling task. Plan to release the code (based on Google's Show and Tell model code) in the near future
Implemented a VQA model that applies attention over a semantic representation of the image before decoding the answer
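As a rough illustration of the attention step this project used (a minimal NumPy sketch with made-up dimensions, not the project code): each semantic concept vector is scored against an encoded question, and the softmax-weighted sum of concepts becomes the context passed to the answer decoder.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical sizes: 5 semantic concepts, 8-dim embeddings.
rng = np.random.default_rng(0)
concepts = rng.standard_normal((5, 8))   # semantic representation of the image
question = rng.standard_normal(8)        # encoded question vector

scores = concepts @ question             # dot-product relevance of each concept
weights = softmax(scores)                # attention distribution over concepts
context = weights @ concepts             # weighted summary fed to the decoder
```

The attention weights are a proper distribution over the concepts, so the context vector stays in the same embedding space as the concepts themselves.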
Modified Yoon Kim's CNN model for sentiment classification. Also tested it on a Hindi sentiment dataset
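The core of Kim's architecture is convolution filters of several widths slid over the word-embedding matrix, each followed by max-over-time pooling. A minimal NumPy sketch of that forward pass (toy dimensions and random weights, not the modified model):

```python
import numpy as np

rng = np.random.default_rng(0)
sent = rng.standard_normal((10, 8))   # 10 words, 8-dim embeddings (toy numbers)

def conv_maxpool(sent, width, n_filters, rng):
    # One filter bank of a given width: convolve over time,
    # apply ReLU, then take the max over time per filter.
    T, d = sent.shape
    filters = rng.standard_normal((n_filters, width * d))
    windows = np.stack([sent[t:t + width].ravel() for t in range(T - width + 1)])
    feats = np.maximum(windows @ filters.T, 0)   # ReLU feature maps
    return feats.max(axis=0)                     # max-over-time pooling

# Concatenate features from several filter widths; a linear
# classifier over this vector would produce the sentiment label.
features = np.concatenate([conv_maxpool(sent, w, 4, rng) for w in (3, 4, 5)])
```

Max-over-time pooling is what makes the feature vector length independent of sentence length, which is also why the same network can be retrained on a differently tokenized corpus such as a Hindi dataset.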
Explored algorithms for Vehicle Detection and Automatic Number Plate Recognition, including Object Proposal, Background Subtraction, Tracking and Feature Matching methods
Extended the Convolutional Chair Generation model by Dosovitskiy et al. to a 3-DOF CRS robot. The deep model of motion is learned purely from images of the robot, with minimal priors on its internal parameters. Reconstruction is conditioned on joint angles and viewpoint only, similar to the few parameters used in the original paper