CAREER: Data Preparation for Trusted and Fair Data Science

Sponsor: National Science Foundation

This material is based upon work supported by the National Science Foundation under Grant No. IIS-2237149.

Principal Investigator: Romila Pradhan

Project Summary: Machine learning systems are increasingly being used in a variety of data science applications that involve automated decision-making. While learning-enabled systems have the potential to eliminate some undesirable aspects of human decision-making, they are known to reinforce systemic biases and discrimination reflected in their training data. This project will develop novel technologies to realize the potential of robust, fair, and explainable data-driven decision-making systems. Toward this goal, the project centers on data preparation and debugging techniques to ensure that the underlying training data and data handling processes are devoid of unexpected errors. The project will develop comprehensive and efficient solutions that demonstrate the importance of data quality as a tool for understanding and debugging undesired behavior in data science applications.

Project Publications:

  • Explaining Fairness Violations using Machine Unlearning
    Tanmay Surve and Romila Pradhan. In Proceedings of the 28th International Conference on Extending Database Technology (EDBT), Barcelona, Spain, 2025.
  • Valuation-based Data Acquisition for Machine Learning Fairness
    Ekta and Romila Pradhan. In Proceedings of the 13th International Workshop on Quality in Databases (QDB) at the 50th VLDB Conference, China, 2024.

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).