Eliminating duplicate data is typically a part of which data preprocessing step?
A) Data Consolidation
B) Data Cleaning
C) Data Transformation

1 Answer

The correct answer is B) Data Cleaning.

Data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a dataset. It is important to clean the data before using it for further analysis: duplicate records are a common form of dirty data, and they can lead to inaccurate analysis, incorrect conclusions, and poor decision making. Eliminating duplicate data is therefore a crucial part of the data cleaning step.

Beyond deduplication, data cleaning may also involve fixing errors, filling in missing values, removing inconsistencies, and identifying outliers.
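As a concrete illustration, here is a minimal sketch of deduplication during cleaning, assuming the data is tabular and loaded into a pandas DataFrame (the column names and values are made up for the example):

```python
import pandas as pd

# Hypothetical dataset containing one exact duplicate row
df = pd.DataFrame({
    "id":   [1, 2, 2, 3],
    "name": ["Ana", "Ben", "Ben", "Cruz"],
})

# Remove exact duplicate rows, keeping the first occurrence
deduped = df.drop_duplicates()
print(deduped)
```

In practice you may also deduplicate on a subset of columns (e.g., `df.drop_duplicates(subset=["id"])`) when two records represent the same entity but differ in other fields.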

answered by User Riz (8.4k points)