Responsible Data Science Lab at Purdue

We study problems at the intersection of data management and machine learning to build trustworthy and responsible decision-making systems. Our aim is to enable explainability, fairness, and accountability in data-driven decision making. We are particularly interested in:

  • Explaining and debugging fairness violations in machine learning models and data science pipelines:
    • How can we determine sources of unexpected errors and bias in machine learning model outcomes?
    • How can we decompose unexpected or discriminatory behavior of data science pipelines in terms of the different pipeline stages?
    • Can we effectively generate post hoc explanations for the outcomes of machine learning models?
  • Data integration and data quality:
    • How can we leverage expert feedback to improve data cleaning techniques for machine learning?
    • Can we use the final outcomes in data science pipelines to inform intermediate pipeline choices?
    • How can we intertwine pipeline stages with downstream analytics to better achieve the end goals?

We are always looking for motivated Ph.D. students to collaborate with. If you are interested in data management and/or responsible data analytics, feel free to contact us with your CV/resume and a couple of sentences describing your research interests, and consider applying to Purdue CIT!

news

Nov 20, 2023 Tanmay defends his M.S. thesis. Congrats, Tanmay!
Nov 16, 2023 Dr. Pradhan gave an invited talk at the Brandeis University Computer Science seminar.
Aug 11, 2023 The group welcomes Ambarish and Jahid as our newest Ph.D. students.
Jun 29, 2023 Excited to receive an NSF CAREER Award.
Mar 23, 2023 Dr. Pradhan gave a talk on fairness debugging using Gopher at MIT CSAIL’s Causality reading group.

selected publications

  1. Explainable AI: Foundations, Applications, Opportunities for Data Management Research
    Romila Pradhan, Aditya Lahiri, Sainyam Galhotra, and Babak Salimi
    In Proceedings of the 2022 International Conference on Management of Data, 2022
  2. Interpretable Data-Based Explanations for Fairness Debugging
    Romila Pradhan, Jiongli Zhu, Boris Glavic, and Babak Salimi
    In Proceedings of the 2022 International Conference on Management of Data, 2022
  3. Explaining Black-Box Algorithms using Probabilistic Contrastive Counterfactuals
    Sainyam Galhotra, Romila Pradhan, and Babak Salimi
    In Proceedings of the 2021 International Conference on Management of Data, 2021
  4. Staging User Feedback toward Rapid Conflict Resolution in Data Fusion
    Romila Pradhan, Siarhei Bykau, and Sunil Prabhakar
    In Proceedings of the 2017 ACM International Conference on Management of Data, 2017