Responsible Data Science Lab at Purdue

We study problems at the intersection of data management and machine learning to build trustworthy and responsible decision-making systems. Our aim is to develop systems that make data-driven decision-making explainable, fair, and accountable. We are particularly interested in:

  • Explaining and debugging fairness violations in machine learning models and data science pipelines:
    • How can we determine sources of unexpected errors and bias in machine learning model outcomes?
    • How can we decompose unexpected or discriminatory behavior of data science pipelines in terms of the different pipeline stages?
    • Can we effectively generate post hoc explanations for the outcomes of machine learning models?
  • Data integration and data quality:
    • How can we leverage expert feedback to improve data cleaning techniques for machine learning?
    • Can we use the final outcomes in data science pipelines to inform intermediate pipeline choices?
    • How can we intertwine pipeline stages with downstream analytics to improve upon the end goals?

We are always looking for motivated Ph.D. students to collaborate with. If you are interested in data management and/or responsible data analytics, feel free to contact us with your CV/resume and a couple of sentences describing your research interests, and consider applying to Purdue CIT!

sponsors

We are thankful for the generous funding award and gift from our sponsors: NSF, Google, and CASMI.

news

Aug 12, 2024 Welcoming Omkar and Ananya to the group!
Jul 22, 2024 Ekta’s paper on Valuation-based Data Acquisition for Machine Learning Fairness accepted to the 13th International Workshop on Quality in Databases (QDB) at the 50th VLDB conference. Congrats, Ekta!
Apr 15, 2024 Ekta defends her M.S. thesis. Congrats, Ekta!
Apr 2, 2024 Dr. Pradhan gave a talk on debugging and explaining unfairness in machine learning models at the CERIAS 2024 25th Annual Cybersecurity Symposium.
Nov 20, 2023 Tanmay defends his M.S. thesis. Congrats, Tanmay!

selected publications

  1. Explainable AI: Foundations, Applications, Opportunities for Data Management Research
    Romila Pradhan, Aditya Lahiri, Sainyam Galhotra, and Babak Salimi
    In Proceedings of the 2022 International Conference on Management of Data, 2022
  2. Interpretable Data-Based Explanations for Fairness Debugging
    Romila Pradhan, Jiongli Zhu, Boris Glavic, and Babak Salimi
    In Proceedings of the 2022 International Conference on Management of Data, 2022
  3. Explaining Black-Box Algorithms using Probabilistic Contrastive Counterfactuals
    Sainyam Galhotra, Romila Pradhan, and Babak Salimi
    In Proceedings of the 2021 International Conference on Management of Data, 2021
  4. Staging User Feedback toward Rapid Conflict Resolution in Data Fusion
    Romila Pradhan, Siarhei Bykau, and Sunil Prabhakar
    In Proceedings of the 2017 ACM International Conference on Management of Data, 2017