In a dataset, you notice that 2 out of 50 feature columns each have 60% missing data. Data collection can take 48 hours, but you have to provide a ready-to-analyze training set by tomorrow. Which action would be the most practical?

- Remove the rows that have missing values
- Remove the two feature columns
- Fill in the missing values with the average of the existing values
- Fill in the missing values with the most frequent value

asked by User Kufi (7.6k points)

1 Answer

Final answer:

The most practical action would be to remove the two feature columns that have 60% missing data each.

Step-by-step explanation:

Dropping the two columns leaves 48 of the 50 features and every row intact. It also takes almost no time, so the training set can be ready by tomorrow without waiting 48 hours for more data collection.

The other options are weaker. Filling in the missing values with the average or the most frequent value would mean that 60% of each column consists of a single imputed value, which hides the real distribution of the feature and can bias whatever is trained on it. Removing the rows with missing values would discard at least 60% of the dataset (more if the gaps in the two columns fall on different rows), which might leave too little data for analysis.
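To make the comparison concrete, here is a minimal sketch in Python with pandas. It is only an illustration: the helper name `drop_sparse_columns`, the 0.5 threshold, and the toy 10-row table with columns `f1` to `f4` are assumptions for the example, not part of the original question.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_missing=0.5):
    """Drop columns whose fraction of missing values exceeds max_missing.

    Hypothetical helper for illustration; the 0.5 default is an assumed cutoff.
    """
    missing_fraction = df.isna().mean()  # per-column share of NaN values
    sparse = missing_fraction[missing_fraction > max_missing].index.tolist()
    return df.drop(columns=sparse), sparse


# Toy stand-in for the real dataset: 10 rows, 4 features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(10, 4)), columns=["f1", "f2", "f3", "f4"])

# Make two columns 60% missing, with gaps on partly different rows.
df.loc[0:5, "f3"] = np.nan  # rows 0..5 (label slicing is inclusive)
df.loc[4:9, "f4"] = np.nan  # rows 4..9

cleaned, dropped = drop_sparse_columns(df)
rows_left_if_dropping_rows = len(df.dropna())

print("dropped columns:", dropped)
print("shape after dropping columns:", cleaned.shape)
print("rows left if rows were dropped instead:", rows_left_if_dropping_rows)
```

In this toy case the gaps in the two columns happen to cover every row between them, so removing rows would leave nothing, while removing the two columns keeps all 10 rows and the remaining features complete.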

answered by User Erikdstock (8.1k points)