Employing Hybrid AI Systems to Trace and Document Bias in ML Pipelines

dc.bibliographicCitation.firstPage: 96821
dc.bibliographicCitation.journalTitle: IEEE Access
dc.bibliographicCitation.lastPage: 96847
dc.bibliographicCitation.volume: 12
dc.contributor.author: Russo, Mayra
dc.contributor.author: Chudasama, Yasharajsinh
dc.contributor.author: Purohit, Disha
dc.contributor.author: Sawischa, Sammy
dc.contributor.author: Vidal, Maria-Esther
dc.date.accessioned: 2025-02-26T09:28:31Z
dc.date.available: 2025-02-26T09:28:31Z
dc.date.issued: 2024
dc.description.abstract: Artificial Intelligence (AI) systems can introduce biases that lead to unreliable outcomes and, in the worst-case scenarios, perpetuate systemic and discriminatory results when deployed in the real world. While significant efforts have been made to create bias detection methods, developing reliable and comprehensive documentation artifacts also makes for valuable resources that address bias and aid in minimizing the harms associated with AI systems. Based on compositional design patterns, this paper introduces a documentation approach using a hybrid AI system to prompt the identification and traceability of bias in datasets and predictive AI models. To demonstrate the effectiveness of our approach, we instantiate our pattern in two implementations of a hybrid AI system. One follows an integrated approach and performs fine-grained tracing and documentation of the AI model. In contrast, the other hybrid system follows a principled approach and enables the documentation and comparison of bias in the input data and the predictions generated by the model. Through a use case based on Fake News detection and an empirical evaluation, we show how biases detected during data ingestion steps (e.g., label, over-representation, activity bias) affect the training and predictions of the classification models. Concretely, we report a stark skewness in the distribution of input variables towards the Fake News label, we uncover how a predictive variable leads to more constraints in the learning process, and we highlight open challenges of training models with unbalanced datasets. A video summarizing this work is available online (https://youtu.be/v2GfIQPAy_4?si=BXtWOf97cLiZavyu), and the implementation is publicly available on GitHub (https://github.com/SDM-TIB/DocBiasKG).
dc.description.fonds: TIB_Fonds
dc.description.version: publishedVersion
dc.identifier.uri: https://oa.tib.eu/renate/handle/123456789/18570
dc.identifier.uri: https://doi.org/10.34657/17589
dc.language.iso: eng
dc.publisher: New York, NY : IEEE
dc.relation.doi: https://doi.org/10.1109/access.2024.3427388
dc.relation.essn: 2169-3536
dc.rights.license: CC BY 4.0 Unported
dc.rights.uri: https://creativecommons.org/licenses/by/4.0
dc.subject.ddc: 004
dc.subject.ddc: 621,3
dc.subject.other: Bias
dc.subject.other: hybrid AI systems
dc.subject.other: knowledge graphs
dc.subject.other: tracing
dc.title: Employing Hybrid AI Systems to Trace and Document Bias in ML Pipelines
dc.type: Article
dc.type: Text
tib.accessRights: openAccess
wgl.contributor: TIB
Files
Original bundle
Name: Employing_Hybrid_AI_Systems_to_Trace_and_Document_Bias_in_ML_Pipelines.pdf
Size: 7.78 MB
Format: Adobe Portable Document Format