Collaborative Data Cleaning Framework: a Pilot Case Study for Machine Learning Development
DOI: https://doi.org/10.2218/ijdc.v18i1.924
Abstract
This study experiments with collaborative data cleaning, a pivotal phase of data preparation for both analysis and machine learning. We used a provenance Data Cleaning Model (DCM) for multi-user scenarios to track changes to a dataset, and we conducted comprehensive experiments that simulate multiple data curators working collaboratively on that dataset. Furthermore, we analyzed how different data-cleaning scenarios, each aimed at improving the completeness and correctness of a dataset, affect the performance of downstream machine learning models.
License
Copyright (c) 2024 Nikolaus Parulian, Bertram Ludäscher
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright for papers and articles published in this journal is retained by the authors, with first publication rights granted to the University of Edinburgh. It is a condition of publication that authors license their paper or article under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence.