Employing Hybrid AI Systems to Trace and Document Bias in ML Pipelines

dc.bibliographicCitation.firstPage: 96821
dc.bibliographicCitation.journalTitle: IEEE Access
dc.bibliographicCitation.lastPage: 96847
dc.bibliographicCitation.volume: 12
dc.contributor.author: Russo, Mayra
dc.contributor.author: Chudasama, Yasharajsinh
dc.contributor.author: Purohit, Disha
dc.contributor.author: Sawischa, Sammy
dc.contributor.author: Vidal, Maria-Esther
dc.date.accessioned: 2025-02-26T09:28:31Z
dc.date.available: 2025-02-26T09:28:31Z
dc.date.issued: 2024
dc.description.abstract: Artificial Intelligence (AI) systems can introduce biases that lead to unreliable outcomes and, in the worst-case scenarios, perpetuate systemic and discriminatory results when deployed in the real world. While significant efforts have been made to create bias detection methods, developing reliable and comprehensive documentation artifacts also makes for valuable resources that address bias and aid in minimizing the harms associated with AI systems. Based on compositional design patterns, this paper introduces a documentation approach using a hybrid AI system to prompt the identification and traceability of bias in datasets and predictive AI models. To demonstrate the effectiveness of our approach, we instantiate our pattern in two implementations of a hybrid AI system. One follows an integrated approach and performs fine-grained tracing and documentation of the AI model. In contrast, the other hybrid system follows a principled approach and enables the documentation and comparison of bias in the input data and the predictions generated by the model. Through a use case based on Fake News detection and an empirical evaluation, we show how biases detected during data ingestion steps (e.g., label, over-representation, activity bias) affect the training and predictions of the classification models. Concretely, we report a stark skewness in the distribution of input variables towards the Fake News label, we uncover how a predictive variable leads to more constraints in the learning process, and we highlight open challenges of training models with unbalanced datasets. A video summarizing this work is available online (https://youtu.be/v2GfIQPAy_4?si=BXtWOf97cLiZavyu), and the implementation is publicly available on GitHub (https://github.com/SDM-TIB/DocBiasKG).
dc.description.fonds: TIB_Fonds
dc.description.version: publishedVersion
dc.identifier.uri: https://oa.tib.eu/renate/handle/123456789/18570
dc.identifier.uri: https://doi.org/10.34657/17589
dc.language.iso: eng
dc.publisher: New York, NY : IEEE
dc.relation.doi: https://doi.org/10.1109/access.2024.3427388
dc.relation.essn: 2169-3536
dc.rights.license: CC BY 4.0 Unported
dc.rights.uri: https://creativecommons.org/licenses/by/4.0
dc.subject.ddc: 004
dc.subject.ddc: 621,3
dc.subject.other: Bias
dc.subject.other: hybrid AI systems
dc.subject.other: knowledge graphs
dc.subject.other: tracing
dc.title: Employing Hybrid AI Systems to Trace and Document Bias in ML Pipelines
dc.type: Article
dc.type: Text
tib.accessRights: openAccess
wgl.contributor: TIB
Files
Original bundle
Name: Employing_Hybrid_AI_Systems_to_Trace_and_Document_Bias_in_ML_Pipelines.pdf
Size: 7.78 MB
Format: Adobe Portable Document Format