publications
2022
- Explainable AI: Foundations, Applications, Opportunities for Data Management Research
  Romila Pradhan, Aditya Lahiri, Sainyam Galhotra, and Babak Salimi
  In 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022
Algorithmic decision-making systems are successfully being adopted in a wide range of domains for diverse tasks. While the potential benefits of algorithmic decision-making are many, the importance of trusting these systems has only recently attracted attention. There is growing concern that these systems are complex, opaque and non-intuitive, and hence are difficult to trust. There has been a recent resurgence of interest in explainable artificial intelligence (XAI) that aims to reduce the opacity of a model by explaining its behavior, its predictions or both, thus allowing humans to scrutinize and trust the model. A host of technical advances have been made and several explanation methods have been proposed in recent years that address the problem of model explainability and transparency. In this tutorial, we will present these novel explanation approaches, characterize their strengths and limitations, position existing work with respect to the database (DB) community, and enumerate opportunities for data management research in the context of XAI.
- Explainable AI: Foundations, Applications, Opportunities for Data Management Research
  Romila Pradhan, Aditya Lahiri, Sainyam Galhotra, and Babak Salimi
  In Proceedings of the 2022 International Conference on Management of Data, 2022
Algorithmic decision-making systems are successfully being adopted in a wide range of domains for diverse tasks. While the potential benefits of algorithmic decision-making are many, the importance of trusting these systems has only recently attracted attention. There is growing concern that these systems are complex, opaque and non-intuitive, and hence are difficult to trust. There has been a recent resurgence of interest in explainable artificial intelligence (XAI) that aims to reduce the opacity of a model by explaining its behavior, its predictions or both, thus allowing humans to scrutinize and trust the model. A host of technical advances have been made and several explanation methods have been proposed in recent years that address the problem of model explainability and transparency. In this tutorial, we will present these novel explanation approaches, characterize their strengths and limitations, position existing work with respect to the database (DB) community, and enumerate opportunities for data management research in the context of XAI.
- Generating Interpretable Data-Based Explanations for Fairness Debugging Using Gopher
  Jiongli Zhu, Romila Pradhan, Boris Glavic, and Babak Salimi
  In Proceedings of the 2022 International Conference on Management of Data, 2022
Machine learning (ML) models, while increasingly being used to make life-altering decisions, are known to reinforce systemic bias and discrimination. Consequently, practitioners and model developers need tools to facilitate debugging for bias in ML models. We introduce Gopher, a system that generates compact, interpretable and causal explanations for ML model bias. Gopher identifies the top-k coherent subsets of the training data that are root causes for model bias by quantifying the extent to which removing or updating a subset can resolve the bias. We describe the architecture of Gopher and will walk the audience through real-world use cases to highlight how Gopher generates explanations that enable data scientists to understand how subsets of the training data contribute to the bias of an ML model. Gopher is available as open-source software; the code and the demonstration video are available at https://gopher-sys.github.io/.
- Interpretable Data-Based Explanations for Fairness Debugging
  Romila Pradhan, Jiongli Zhu, Boris Glavic, and Babak Salimi
  In Proceedings of the 2022 International Conference on Management of Data, 2022
A wide variety of fairness metrics and eXplainable Artificial Intelligence (XAI) approaches have been proposed in the literature to identify bias in machine learning models that are used in critical real-life contexts. However, merely reporting on a model’s bias or generating explanations using existing XAI techniques is insufficient to locate and eventually mitigate sources of bias. We introduce Gopher, a system that produces compact, interpretable, and causal explanations for bias or unexpected model behavior by identifying coherent subsets of the training data that are root causes for this behavior. Specifically, we introduce the concept of causal responsibility that quantifies the extent to which intervening on training data by removing or updating subsets of it can resolve the bias. Building on this concept, we develop an efficient approach for generating the top-k patterns that explain model bias by utilizing techniques from the machine learning (ML) community to approximate causal responsibility, and using pruning rules to manage the large search space for patterns. Our experimental evaluation demonstrates the effectiveness of Gopher in generating interpretable explanations for identifying and debugging sources of bias.
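The causal responsibility described in this abstract (how much removing or updating a subset of the training data resolves the bias) can be pictured with the brute-force sketch below. This is only an illustration of the idea, not Gopher's algorithm: the logistic-regression model, the statistical-parity metric, the synthetic data, and the full retraining step are assumptions, whereas Gopher approximates the effect of such interventions and uses pruning rules rather than retraining per candidate subset.

```python
# Minimal sketch of "causal responsibility": how much does dropping a
# candidate subset of the training data reduce model bias?
# (Illustration only; not Gopher's implementation.)
import numpy as np
from sklearn.linear_model import LogisticRegression

def statistical_parity_gap(model, X, protected):
    """|P(yhat=1 | protected group) - P(yhat=1 | other group)| on X."""
    pred = model.predict(X)
    return abs(pred[protected == 1].mean() - pred[protected == 0].mean())

def responsibility_of_subset(X, y, protected, subset_mask):
    """Fraction of the bias that goes away when the subset is removed."""
    base = LogisticRegression(max_iter=1000).fit(X, y)
    base_bias = statistical_parity_gap(base, X, protected)

    keep = ~subset_mask
    updated = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    new_bias = statistical_parity_gap(updated, X, protected)

    # Larger values mean the subset is more "responsible" for the bias.
    return (base_bias - new_bias) / max(base_bias, 1e-12)

# Toy usage on synthetic (hypothetical) data.
rng = np.random.default_rng(0)
n = 500
protected = (rng.random(n) < 0.4).astype(int)
X = rng.normal(size=(n, 4))
X[:, 1] += 1.2 * protected                     # feature that proxies group membership
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 1.0).astype(int)
subset = protected == 1                        # a candidate subset described by a pattern
print(responsibility_of_subset(X, y, protected, subset))
```

In the paper's setting, candidate subsets are described by predicates (patterns) over the training data, and the top-k patterns under such a responsibility score form the explanation.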
- Human-in-the-Loop Bias Mitigation in Data Science
  Romila Pradhan, and Tianyi Li
  Human-Centered AI Workshop (HCAI) @ NeurIPS 22 (Vision), 2022
2021
- Demonstration of Generating Explanations for Black-Box Algorithms Using Lewis
  Paul Y. Wang, Sainyam Galhotra, Romila Pradhan, and Babak Salimi
  Proceedings of the VLDB Endowment, Jul 2021
Explainable artificial intelligence (XAI) aims to reduce the opacity of AI-based decision-making systems, allowing humans to scrutinize and trust them. Unlike prior work that attributes the responsibility for an algorithm’s decisions to its inputs as a purely associational concept, we propose a principled causality-based approach for explaining black-box decision-making systems. We present the demonstration of Lewis, a system that generates explanations for black-box algorithms at the global, contextual, and local levels, and provides actionable recourse for individuals negatively affected by an algorithm’s decision. Lewis makes no assumptions about the internals of the algorithm except for the availability of its input-output data. The explanations generated by Lewis are based on probabilistic contrastive counterfactuals, a concept that can be traced back to philosophical, cognitive, and social foundations of theories on how humans generate and select explanations. We describe the system layout of Lewis wherein an end-user specifies the underlying causal model and Lewis generates explanations for particular use-cases, compares them with explanations generated by state-of-the-art approaches in XAI, and provides actionable recourse when applicable. Lewis has been developed as open-source software; the code and the demonstration video are available at lewis-system.github.io.
- Feature Attribution and Recourse via Probabilistic Contrastive Counterfactuals
  Sainyam Galhotra, Romila Pradhan, and Babak Salimi
  ICML Workshop on Algorithmic Recourse, Jul 2021
There has been a recent resurgence of interest in explainable artificial intelligence (XAI) that aims to reduce the opaqueness of AI-based decision-making systems, allowing humans to scrutinize and trust them. Prior work has focused on two main approaches: (1) Attribution of responsibility for an algorithm’s decisions to its inputs, wherein responsibility is typically approached as a purely associational concept that can lead to misleading conclusions. (2) Generating counterfactual explanations and recourse, where these explanations are typically obtained by considering the smallest perturbation in an algorithm’s input that can lead to the algorithm’s desired outcome. However, these perturbations may not translate to real-world interventions. In this paper, we propose a principled and novel causality-based approach for explaining black-box decision-making systems that exploits probabilistic contrastive counterfactuals to provide a unifying framework to generate a wide range of global, local and contextual explanations that provide insights into what causes an algorithm’s decisions, and to generate actionable recourse translatable into real-world interventions.
- Explaining Black-Box Algorithms using Probabilistic Contrastive Counterfactuals
  Sainyam Galhotra, Romila Pradhan, and Babak Salimi
  In Proceedings of the 2021 International Conference on Management of Data, Jul 2021
There has been a recent resurgence of interest in explainable artificial intelligence (XAI) that aims to reduce the opaqueness of AI-based decision-making systems, allowing humans to scrutinize and trust them. Prior work in this context has focused on the attribution of responsibility for an algorithm’s decisions to its inputs wherein responsibility is typically approached as a purely associational concept. In this paper, we propose a principled causality-based approach for explaining black-box decision-making systems that addresses limitations of existing methods in XAI. At the core of our framework lies probabilistic contrastive counterfactuals, a concept that can be traced back to philosophical, cognitive, and social foundations of theories on how humans generate and select explanations. We show how such counterfactuals can quantify the direct and indirect influences of a variable on decisions made by an algorithm, and provide actionable recourse for individuals negatively affected by the algorithm’s decision. Unlike prior work, our system, Lewis: (1) can compute provably effective explanations and recourse at local, global and contextual levels; (2) is designed to work with users with varying levels of background knowledge of the underlying causal model; and (3) makes no assumptions about the internals of an algorithmic system except for the availability of its input-output data. We empirically evaluate Lewis on four real-world datasets and show that it generates human-understandable explanations that improve upon state-of-the-art approaches in XAI, including the popular LIME and SHAP. Experiments on synthetic data further demonstrate the correctness of Lewis’s explanations and the scalability of its recourse algorithm.
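As a rough illustration of the probabilistic contrastive counterfactuals the abstract refers to, the sketch below estimates, via the abduction/action/prediction recipe under a fully specified toy causal model, the probability that a decision would have been favorable had an input attribute been different. The decision rule, the noise distribution, and the observed evidence are assumptions invented for this example; they are not Lewis's causal model, its counterfactual quantities, or its estimation procedure.

```python
# Monte Carlo sketch of a contrastive counterfactual query:
# "given that the algorithm output Y=0 for an individual with X=0, how
# likely is it that Y would have been 1 had X been 1?"
# (Toy structural model for illustration only.)
import numpy as np

rng = np.random.default_rng(1)

def decide(x, u):
    """Toy decision rule Y = f(X, U), with U the exogenous noise."""
    return (0.7 * x + u > 0.5).astype(int)

# Abduction: sample exogenous noise consistent with the observed evidence.
n = 200_000
u = rng.normal(loc=0.3, scale=0.4, size=n)
x_obs = np.zeros(n, dtype=int)
consistent = decide(x_obs, u) == 0                  # observed: X=0 and Y=0
u_post = u[consistent]

# Action and prediction: intervene X := 1 and recompute the outcome.
y_cf = decide(np.ones(len(u_post), dtype=int), u_post)

print("P(Y would be 1 had X been 1 | X=0, Y=0) ~", y_cf.mean())
```

Scores of this kind, aggregated over individuals or sub-populations, are what allow local, contextual, and global explanations to be phrased in the same counterfactual language.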
2019
- AuthIntegrate: Toward Combating False Data on the Internet
  Romila Pradhan, and Sunil Prabhakar
  SIGKDD Workshop on Truth Discovery and Fact Checking: Theory and Practice, Jul 2019
The advent of the collaborative Web and the abundance of user-generated data have resulted in the problem of information overload; it is becoming increasingly difficult to discern relevant information and discard false data. Recently, a number of solutions for automated fact-checking have been proposed that view the problem from a largely linguistic perspective. We observe that the problem of false data detection has roots in several extensively studied research areas in data management and data mining such as data integration, data cleaning, crowdsourcing and machine learning. Specifically, detection of false data has significant overlap with data fusion, an active area of research in data integration that focuses on distinguishing correct from incorrect information in a structured data setting. In this vision paper, we propose the architecture of AuthIntegrate, an end-to-end system that ingests conflicting data from disparate information providers, and curates and presents highly accurate data to end-users. We discuss the technical challenges in building this system and outline an agenda for future research.
2018
- A Framework to Integrate User Feedback for Rapid Conflict Resolution
  Romila Pradhan, Siarhei Bykau, and Sunil Prabhakar
  In 2018 IEEE 34th International Conference on Data Engineering (ICDE), Jul 2018
Data fusion addresses the problem of consolidating data from disparate information providers into a single unified interface. The different data sources often provide conflicting information for the same data item. Recently, several automated data fusion models have been proposed to resolve conflicts and identify correct data. Although quite effective, these data fusion models do not achieve a close-to-perfect accuracy. We present the demonstration of a system that leverages users as first-class citizens to confirm data conflicts and rapidly improve the effectiveness of fusion. This demonstration is built on solutions proposed in our previous work [1]. To utilize the user judiciously, our system presents claims in an order that is the most beneficial to the effectiveness of fusion across data items. We describe ranking algorithms that are built on concepts from information theory and decision theory, and do not need access to ground truth. We describe the user input framework and demonstrate how conflict resolution can be expedited with minimal feedback from the user. We show that: (a) the framework can be easily adopted by existing data fusion models without any internal changes to the models, and (b) the framework can integrate both perfect and imperfect feedback from users.
- Leveraging Data Relationships to Resolve Conflicts from Disparate Data Sources
  Romila Pradhan, Walid G. Aref, and Sunil Prabhakar
  In Database and Expert Systems Applications, Jul 2018
Recently, a number of data fusion systems have been proposed that offer conflict resolution as a mechanism to integrate conflicting data from multiple information providers. State-of-the-art data fusion systems largely consider claims for a data item to be unrelated to each other. In many domains, however, the observed claims are often related to each other through various entity-relationships. We propose a formalism to express entity-relationships among claims of data items and design a framework to integrate the data relationships with existing data fusion models to improve the effectiveness of fusion. We conduct an experimental evaluation on real-world data and show that the performance of fusion is significantly improved with the integration of data relationships by (a) generating meaningful correctness probabilities for claims of data items, and (b) ensuring that the multiple correct claims output by the fusion models are consistent with each other. Our approach outperforms state-of-the-art algorithms that consider the presence of relationships over claims of data items.
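To make the idea of relationship-aware fusion concrete, the toy sketch below reconciles fusion-assigned correctness probabilities with one simple kind of relationship between claims, entailment: a claim implied by another should never be judged less likely than the claim implying it. The claim names, the probabilities, and the fixed-point loop are hypothetical; the paper's formalism covers richer entity-relationships and integrates them directly with existing fusion models rather than as a post-processing step.

```python
# Hypothetical post-hoc consistency step (not the paper's algorithm):
# if claim A ("born in Paris") entails claim B ("born in France"),
# raise P(B) to at least P(A).
from typing import Dict, List, Tuple

def enforce_entailments(prob: Dict[str, float],
                        entails: List[Tuple[str, str]],
                        max_rounds: int = 10) -> Dict[str, float]:
    """Propagate P(antecedent) <= P(consequent) until a fixed point."""
    prob = dict(prob)
    for _ in range(max_rounds):
        changed = False
        for a, b in entails:
            if prob[b] < prob[a]:
                prob[b] = prob[a]
                changed = True
        if not changed:
            break
    return prob

# Fusion output (made up) before accounting for the relationship:
p = {"born_in:Paris": 0.9, "born_in:France": 0.6, "born_in:Germany": 0.2}
print(enforce_entailments(p, [("born_in:Paris", "born_in:France")]))
```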
2017
- Staging User Feedback toward Rapid Conflict Resolution in Data Fusion
  Romila Pradhan, Siarhei Bykau, and Sunil Prabhakar
  In Proceedings of the 2017 ACM International Conference on Management of Data, Jul 2017
In domains such as the Web, sensor networks and social media, sources often provide conflicting information for the same data item. Several data fusion techniques have been proposed recently to resolve conflicts and identify correct data. While these fusion systems are quite accurate, their performance is far from perfect. In this paper, we propose to leverage user feedback for validating data conflicts and rapidly improving the performance of fusion. To present the most beneficial data items for the user to validate, we take advantage of the level of consensus among sources and the output of fusion to generate an effective ordering of items. We first evaluate data items individually, and then define a novel decision-theoretic framework based on the concept of value of perfect information (VPI) to order items by their ability to boost the performance of fusion. We further derive approximate formulae to scale up the decision-theoretic framework to large-scale data. We empirically evaluate our algorithms on three real-world datasets with different characteristics, and show that the accuracy of fusion can be significantly improved even while requesting feedback on only a few data items. We also show that the performance of the proposed methods depends on the characteristics of data, and assess the trade-off between the amount of feedback acquired, and the effectiveness and efficiency of the methods.
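The decision-theoretic ordering mentioned in the abstract can be sketched as follows: score each data item by the expected improvement in a fusion-quality proxy if a user were to validate that item, taking the expectation over the fusion model's current beliefs. The `fusion` callable, the quality proxy, and the toy stand-in fusion are all assumptions for illustration; the paper derives VPI with respect to the accuracy of fusion and introduces approximations to make the computation scale.

```python
# Hedged sketch of a value-of-perfect-information (VPI) style ordering
# of data items for user feedback. (Illustration only.)
from typing import Callable, Dict, List

Beliefs = Dict[str, Dict[str, float]]          # item -> claim -> probability

def expected_quality(beliefs: Beliefs) -> float:
    """Proxy quality: mean probability mass on each item's top claim."""
    return sum(max(dist.values()) for dist in beliefs.values()) / len(beliefs)

def vpi(item: str, beliefs: Beliefs,
        fusion: Callable[[Dict[str, str]], Beliefs]) -> float:
    """Expected quality gain from learning the item's true claim."""
    gain = 0.0
    for claim, p in beliefs[item].items():
        posterior = fusion({item: claim})       # re-fuse with this feedback
        gain += p * expected_quality(posterior)
    return gain - expected_quality(beliefs)

def rank_items_for_feedback(beliefs: Beliefs,
                            fusion: Callable[[Dict[str, str]], Beliefs]) -> List[str]:
    """Order items so the most informative ones are validated first."""
    return sorted(beliefs, key=lambda it: vpi(it, beliefs, fusion), reverse=True)

# Toy usage with a stand-in "fusion" that only pins the validated item.
def toy_fusion(feedback: Dict[str, str], base: Beliefs) -> Beliefs:
    out = {it: dict(d) for it, d in base.items()}
    for it, claim in feedback.items():
        out[it] = {c: (1.0 if c == claim else 0.0) for c in out[it]}
    return out

base = {"capital:AU": {"Canberra": 0.6, "Sydney": 0.4},
        "capital:US": {"Washington": 0.9, "NYC": 0.1}}
print(rank_items_for_feedback(base, lambda fb: toy_fusion(fb, base)))
```

With a real fusion model, validating one item can shift the estimated trustworthiness of its sources and hence the beliefs about many other items, which is what makes a VPI-style ordering more informative than ranking by per-item uncertainty alone.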