Siddhartha Gairola

I am an ELLIS PhD Student advised by Prof. Bernt Schiele (Max Planck Institute for Informatics, Saarbrücken) and Prof. Francesco Locatello (ISTA). During the course of my PhD I will be associated with both the Max Planck Institute for Informatics and the Institute of Science and Technology Austria.

Previously, I was a Research Fellow at Microsoft Research (MSR), India in the Technology for Emerging Markets group, where I worked on applications of Computer Vision, Image Processing and Machine Learning to developing low-cost diagnostic solutions for healthcare. Before that, I worked as a Research Intern at Adobe Inc. in the Media and Data Science Research Group on image understanding tasks.

I completed my Master's and B.Tech with Honours in May 2020 at IIIT Hyderabad, where I was advised by Prof. PJ Narayanan. My work was mainly on learning robust unsupervised style representations for image recognition and retrieval tasks. I train Deep Neural Networks to solve complex tasks in the areas of Computer Vision, Image Processing and Natural Language Processing. The goal of my work is to build powerful intelligent systems that can help us understand human intelligence better and aid us in our day-to-day lives.

Research

Current Research: I am broadly interested in Artificial Intelligence (Machine Learning), Computer Vision, Image Processing, Natural Language Processing and their applications to real-world problems. I am particularly interested in building reliable (robust) systems that model visual perception with limited supervision. To this end, for my PhD I will be exploring two major directions:

Interpretability and Robustness of Deep Neural Networks
Learning Powerful (unsupervised) Object-Centric Representations

Published Research: My prior research has involved using machine learning, image processing, and optical physics to tackle key challenges in computer vision and healthcare, including (1) image compositing, (2) self-supervised representation learning, (3) few-shot image segmentation, (4) explainable AI (xAI), (5) anomaly detection, and (6) AI-based medical diagnosis. These have resulted in publications at top conferences like ICLR, IJCAI, WACV, SIGIR, EMBC, IMWUT/Ubicomp and MMM.

When not working on my research, I like to play the piano and guitar, listen to music, read non-fiction, ride motorcycles, and go for a run or hike. I am also fascinated by paradoxes; you can find some here. (I wish I had Hermione's Time-Turner so I could fit as much into a day as I'd like.) (see more at Personal)

Relevant publications are listed below:

	Computer Vision, Image Processing and Machine Learning
	How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations Siddhartha Gairola, Moritz Boehle, Francesco Locatello and Bernt Schiele International Conference on Learning Representations (ICLR), 2025 OpenReview / arxiv / code / bibtex Post-hoc importance attribution methods are a popular tool for “explaining” Deep Neural Networks (DNNs) and are inherently based on the assumption that the explanations can be applied independently of how the models were trained. Contrarily, in this work we bring forward empirical evidence that challenges this very notion. Surprisingly, we discover a strong dependency on and demonstrate that the training details of a pre-trained model’s classification layer (<10% of model parameters) play a crucial role, much more than the pre-training scheme itself. With this finding we also present simple yet effective adjustments to the classification layers, that can significantly enhance the quality of model explanations. We validate our findings across several visual pre-training frameworks (fully-supervised, self-supervised, contrastive vision-language training) and analyse how they impact explanations for a wide range of attribution methods on a diverse set of evaluation metrics.
	SimPropNet: Improved Similarity Propagation for Few Shot Segmentation Siddhartha Gairola, Ayush Chopra, Mayur Hemani and Balaji K. International Joint Conference on Artificial Intelligence (IJCAI), 2020 IJCAI_Proceedings / pdf / bibtex Improving similarity propagation to improve one-shot and few-shot image segmentation.
	Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks Siddhartha Gairola, Rajvi Shah, P.J. Narayanan IEEE Winter Conference on Applications of Computer Vision (WACV '20), 2020 code / project_page / paper / supp_material / bibtex An unsupervised protocol for learning a neural embedding of the visual style of images. The proposed protocol does not rely on categorical labels but instead uses a proxy measure for finding stylistically similar and dissimilar images.
	Find Me a Sky : A Data-driven Method for Color-Consistent Sky Search and Replacement Saumya Rawat, Siddhartha Gairola, Rajvi Shah, P.J. Narayanan International Conference on Multimedia Modelling (MMM '18), 2018 project_page / pdf / bibtex A data-driven method for color-consistent sky search and replacement. This method does not require the use of complex color transfer techniques. *Both authors contributed equally towards this work.
	Applied ML, Vision, HCI for Healthcare
	SmartKC++: Improving Performance of Smartphone-Based Corneal Topographers Vaibhav Ganatra, Siddhartha Gairola, Nipun Kwatra and Mohit Jain et al. IEEE Winter Conference on Applications of Computer Vision (WACV '25), 2025 Open Access / code / bibtex Improving the SmartKC image processing pipeline, making it more robust and accurate.
	Towards Automating Retinoscopy for Refractive Error Diagnosis Aditya Aggarwal, Siddhartha Gairola, Nipun Kwatra and Mohit Jain et al. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 6, Issue 3, 2022. code / project_page / pdf In this work, we automate retinoscopy by attaching a smartphone to a retinoscope and recording retinoscopic videos with the patient wearing a custom pair of paper frames. We develop a video processing pipeline that takes retinoscopic videos as input and estimates the net refractive error based on our proposed extension of the retinoscopy mathematical model. Our system alleviates the need for a lens kit and can be performed by an untrained examiner. In a clinical trial with 185 eyes, we achieved a sensitivity of 91.0% and specificity of 74.0% on refractive error diagnosis. Moreover, the mean absolute error of our approach was 0.75+/-0.67D on net refractive error estimation compared to subjective refraction measurements. Our results indicate that our approach has the potential to be used as a retinoscopy-based refractive error screening tool in real-world medical settings.
	Keratoconus Classifier for Smartphone-based Corneal Topographer Siddhartha Gairola, Nipun Kwatra and Mohit Jain et al. IEEE Engineering in Medicine & Biology Society (EMBC), 2022. arXiv / pdf In this work, we propose a dual-head convolutional neural network (CNN) for classifying keratoconus on the heatmaps generated by SmartKC. Since SmartKC is a new device and only had a small dataset (114 samples), we developed a 2-stage transfer learning strategy -- using historical data collected from a medical-grade topographer and a subset of SmartKC data -- to satisfactorily train our network. This, combined with our domain-specific data augmentations, achieved a sensitivity of 91.3% and a specificity of 94.2%.
	Smartphone based Corneal Topographer Siddhartha Gairola, Nipun Kwatra and Mohit Jain Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 5, Issue 4, 2021. code / project_page / pdf / demo_video SmartKC is a low-cost smartphone-based corneal topographer. It provides a low-cost solution for the mass screening of keratoconus at scale.
	RespireNet: A Deep Neural Network for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting Siddhartha Gairola, Francis Tom, Nipun Kwatra and Mohit Jain IEEE Engineering in Medicine & Biology Society (EMBC), 2021. code / arXiv / pdf / bibtex RespireNet is a simple CNN-based model, along with a suite of novel techniques (device-specific fine-tuning, concatenation-based augmentation, blank region clipping, and smart padding) that enable efficient use of small-sized datasets. We perform extensive evaluation on the ICBHI dataset, and improve upon the state-of-the-art results for 4-class classification by 2.2%.
	Multimodal Machine Learning: Vision and NLP
	Identifying Clickbait: A Multi-Strategy Approach Using Neural Networks Dhruv Khattar, Siddhartha Gairola, Vaibhav Kumar, Yash Kumar Lal, Vasudeva Varma ACM SIGIR, 2018 code / arxiv / bibtex Detecting clickbait using deep learning.

Master's Thesis

For my Master's research I was supervised by Prof. P. J. Narayanan at CVIT at IIIT Hyderabad. My research was on the following two tasks: (1) representation learning for image style search and retrieval, and (2) color-consistent background replacement.

Image Representations for Style Retrieval, Recognition and Background Replacement Tasks
Siddhartha Gairola, Master's Thesis, IIIT Hyderabad 2020.
abstract / pdf

Work Experience

Microsoft Research	Research Fellow	(Aug, 2020 - Aug, 2022)
Microsoft Research	Research Intern	(Jan, 2020 - July, 2020)
Adobe Inc.	Research Intern	(Jun, 2019 - Jan, 2020)

Talks and Presentations

Intriguing Applications and Overlooked Pitfalls of XAI in Visual Models, GMUM Workshop, Jagiellonian University, October 2024.
slides

RespireNet: A DNN for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting, EMBC 2021
video

SimPropNet: Improved Similarity Propagation for Few-shot Image Segmentation, IJCAI 2020
video

Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks, IEEE WACV 2020
video

Useful Resources and Writings

I maintain some writings, resources and FAQs on the graduate school (PhD) application process, which I update sporadically (click here).
I sometimes blog on Medium (click here) about research, general thoughts and some personal things.

Open-Source Contribution

I contribute actively to the open-source organizations Scilab and LibreOffice. I have been working with Scilab for the past 3 years.
My proposals were selected twice (2017, 2018) as projects for the Google Summer of Code (GSoC) program.

Google Summer of Code 2017
Project Details : Implemented a C/C++ wrapper for the MATLAB MEX API on top of the current Scilab API.

Google Summer of Code 2018
Project Details : Implemented a demo in C/C++ and Scilab as a working example for the MEX Library in Scilab.

Teaching Experience

Worked as a Teaching Assistant at Saarland University for the courses listed below.

The duties involved setting questions for assignments and examinations, and correcting papers.

1. Elements of Data Science and Artificial Intelligence (Winter 2023, 2024)

Worked as a Teaching Assistant at IIIT Hyderabad for the courses listed below.

The duties involved conducting regular tutorials, correcting papers, setting questions for assignments and conducting evaluations.

1. Digital Logic and Processors (Monsoon 2016)
2. Artificial Intelligence (Spring 2017)
3. Digital Image Processing (Monsoon 2017)
4. Computer Vision (Spring 2018)
5. Digital Image Processing (Monsoon 2018)
6. Computer Graphics (Spring 2019)

Education

Max Planck Institute for Informatics & Saarland University
Ph.D. Student, Computer Science (Sept. 2022 - present)

International Institute of Information Technology - Hyderabad
Master of Science (MS) by Research, Computer Science (2018-2020)

International Institute of Information Technology - Hyderabad
Bachelor of Technology (BTech) with Honours, Computer Science (2014-2018)

St. Joseph's Academy, Dehradun
Senior Secondary, ISC (2012-2013)

St. Joseph's Academy, Dehradun
Secondary, ICSE (2010-2011)

Website credits: Mr. Jon Barron (source code / website)