I am an ELLIS PhD student advised by Prof. Bernt Schiele (Max Planck Institute for Informatics, Saarbrücken) and Prof. Francesco Locatello (ISTA). During my PhD I will be affiliated with both the Max Planck Institute for Informatics and the Institute of Science and Technology Austria (ISTA).

Previously, I was a Pre-doctoral Research Fellow at Microsoft Research (MSR), India in the Technology for Emerging Markets group, where I worked on applications of Computer Vision, Image Processing and Machine Learning to developing low-cost diagnostic solutions for healthcare. Before that, I worked as a Research Intern at Adobe Inc. in the Media and Data Science Research Group.

I completed my Master's and B.Tech with Honours in May 2020 at IIIT Hyderabad, where I was advised by Prof. P.J. Narayanan. My work was mainly on learning robust unsupervised style representations for image recognition and retrieval tasks. I train Deep Neural Networks to solve complex tasks in Computer Vision, Image Processing and Natural Language Processing. The goal of my work is to build powerful intelligent systems that help us understand human intelligence better and aid us in our day-to-day lives.


Current Research: I am broadly interested in Artificial Intelligence (Machine Learning), Computer Vision, Image Processing, Natural Language Processing and their applications to real-world problems. I am particularly interested in building reliable (robust) systems that model visual perception with limited supervision. To this end, during my PhD I will explore two major directions:

  1. Interpretability and Robustness of Deep Neural Networks
  2. Learning Powerful (unsupervised) Object-Centric Representations

Published Research: My prior research has involved using machine learning, image processing, and optical physics to tackle key challenges in computer vision and healthcare, including (1) image compositing, (2) self-supervised representation learning, (3) few-shot image segmentation, (4) anomaly detection, and (5) AI-based medical diagnosis. These have resulted in publications at top conferences like IJCAI, WACV, SIGIR, EMBC, IMWUT/Ubicomp and MMM.

When not working on my research, I like to play the piano and guitar, listen to music, read non-fiction, ride motorcycles, and go for a run or hike. I am also fascinated by paradoxes; you can find some here. (I wish I had Hermione's Time-Turner so I could fit more into a day.) (see more at Personal)

Relevant publications are mentioned below:

Computer Vision, Image Processing and Machine Learning
SimPropNet: Improved Similarity Propagation for Few Shot Segmentation
Siddhartha Gairola, Ayush Chopra, Mayur Hemani and Balaji K.
International Joint Conference on Artificial Intelligence (IJCAI), 2020
IJCAI_Proceedings / pdf / bibtex

Improved similarity propagation for one-shot and few-shot image segmentation.

Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks
Siddhartha Gairola, Rajvi Shah, P.J. Narayanan
IEEE Winter Conference on Applications of Computer Vision (WACV '20), 2020
code / project_page / paper / supp_material / bibtex

An unsupervised protocol for learning a neural embedding of visual style of images.

The proposed protocol does not rely on categorical labels; instead, it uses a proxy measure to find stylistically similar and dissimilar images.

Find Me a Sky: A Data-driven Method for Color-Consistent Sky Search and Replacement
Saumya Rawat*, Siddhartha Gairola*, Rajvi Shah, P.J. Narayanan
International Conference on Multimedia Modelling (MMM '18), 2018
project_page / pdf / bibtex

A data-driven method for color-consistent sky search and replacement.

The method does not require complex color transfer techniques.

*Both authors contributed equally towards this work.

Applied ML, Vision, HCI for Healthcare
Towards Automating Retinoscopy for Refractive Error Diagnosis
Aditya Aggarwal, Siddhartha Gairola, Nipun Kwatra, Mohit Jain, et al.
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 6, Issue 3, 2022.
code / project_page / pdf

In this work, we automate retinoscopy by attaching a smartphone to a retinoscope and recording retinoscopic videos with the patient wearing a custom pair of paper frames. We develop a video processing pipeline that takes retinoscopic videos as input and estimates the net refractive error based on our proposed extension of the retinoscopy mathematical model. Our system alleviates the need for a lens kit and can be performed by an untrained examiner. In a clinical trial with 185 eyes, we achieved a sensitivity of 91.0% and specificity of 74.0% on refractive error diagnosis. Moreover, the mean absolute error of our approach was 0.75 ± 0.67 D on net refractive error estimation compared to subjective refraction measurements. Our results indicate that our approach has the potential to be used as a retinoscopy-based refractive error screening tool in real-world medical settings.

Keratoconus Classifier for Smartphone-based Corneal Topographer
Siddhartha Gairola, Nipun Kwatra, Mohit Jain, et al.
IEEE Engineering in Medicine & Biology Society (EMBC), 2022.
arXiv / pdf

In this work, we propose a dual-head convolutional neural network (CNN) for classifying keratoconus on the heatmaps generated by SmartKC. Since SmartKC is a new device and only had a small dataset (114 samples), we developed a 2-stage transfer learning strategy -- using historical data collected from a medical-grade topographer and a subset of SmartKC data -- to satisfactorily train our network. This, combined with our domain-specific data augmentations, achieved a sensitivity of 91.3% and a specificity of 94.2%.

Smartphone-based Corneal Topographer
Siddhartha Gairola, Nipun Kwatra and Mohit Jain
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 5, Issue 4, 2021.
code / project_page / pdf / demo_video

SmartKC is a low-cost smartphone-based corneal topographer designed for mass screening of keratoconus.

RespireNet: A Deep Neural Network for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting
Siddhartha Gairola, Francis Tom, Nipun Kwatra and Mohit Jain
IEEE Engineering in Medicine & Biology Society (EMBC), 2021.
code / arXiv / pdf / bibtex

RespireNet is a simple CNN-based model paired with a suite of novel techniques (device-specific fine-tuning, concatenation-based augmentation, blank region clipping, and smart padding) that make efficient use of small datasets. We perform an extensive evaluation on the ICBHI dataset and improve upon the state-of-the-art results for 4-class classification by 2.2%.

Multimodal Machine Learning: Vision and NLP
Identifying Clickbait: A Multi-Strategy Approach Using Neural Networks
Dhruv Khattar, Siddhartha Gairola, Vaibhav Kumar, Yash Kumar Lal, Vasudeva Varma
code / arXiv / bibtex

Detecting clickbait using deep learning.

Master's Thesis

For my Master's research, I was supervised by Prof. P.J. Narayanan at CVIT, IIIT Hyderabad. My research focused on two tasks: (1) representation learning for image style search and retrieval, and (2) color-consistent background replacement.

Image Representations for Style Retrieval, Recognition and Background Replacement Tasks
Siddhartha Gairola, Master's Thesis, IIIT Hyderabad 2020.
abstract / pdf

Work Experience
Microsoft Research, Research Fellow (Aug 2020 - Aug 2022)
Microsoft Research, Research Intern (Jan 2020 - Jul 2020)
Adobe Inc., Research Intern (Jun 2019 - Jan 2020)

Talks and Presentations

RespireNet: A DNN for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting, EMBC 2021

SimPropNet: Improved Similarity Propagation for Few-shot Image Segmentation, IJCAI 2020

Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks, IEEE WACV 2020

Useful Resources and Writings
  1. I maintain writings, resources and FAQs on the graduate school (PhD) application process, which I update sporadically (click here).
  2. I occasionally blog on Medium (click here) about research, general thoughts and personal topics.

Open-Source Contribution

I contribute actively to the open-source organizations Scilab and LibreOffice, and have worked with Scilab for the past 3 years.
My proposals were selected twice (2017 and 2018) as Google Summer of Code (GSoC) projects.

Google Summer of Code 2017
Project Details: Implemented a C/C++ wrapper for the MATLAB MEX API on top of the current Scilab API.

Google Summer of Code 2018
Project Details: Implemented a demo in C/C++ and Scilab as a working example of the MEX library in Scilab.

Teaching Experience

Worked as a Teaching Assistant at IIIT Hyderabad for the courses listed below.

The duties involved conducting regular tutorials, grading papers, setting assignment questions and conducting evaluations.

1. Digital Logic and Processors (Monsoon 2016)
2. Artificial Intelligence (Spring 2017)
3. Digital Image Processing (Monsoon 2017)
4. Computer Vision (Spring 2018)
5. Digital Image Processing (Monsoon 2018)
6. Computer Graphics (Spring 2019)


International Institute of Information Technology - Hyderabad
Master of Science (MS) by Research, Computer Science (2018-2020)

International Institute of Information Technology - Hyderabad
Bachelor of Technology (BTech) with Honours, Computer Science (2014-2018)

St. Joseph's Academy, Dehradun
Senior Secondary, ISC (2012-2013)

St. Joseph's Academy, Dehradun
Secondary, ICSE (2010-2011)

Website credits: source code by Mr. Jon Barron.