asked 44.2k views
5 votes
Assume two data sets sampled from the same distribution, with 5,000 and 100,000 observations respectively. Randomly construct the train and test sets by dividing the data 80:20.
A) Draw two curves, one for training error and one for test error, for each data set, with the y-axis denoting the error and the x-axis denoting the model complexity.
B) You should have a total of 4 curves: one training error curve and one test error curve for each dataset.
C) Draw all 4 of them in the same diagram.
D) Clearly mark all curves and justify them.

asked
User Mikec
by
7.4k points

1 Answer

3 votes

Final answer:

The question asks for four error curves (training and test for each of the two data sets) plotted against model complexity. Training error falls as complexity increases, while test error typically follows a U shape because of the bias-variance trade-off. Plot all four curves on the same diagram and justify them from this behavior and from the difference in sample size.

Step-by-step explanation:

Given two data sets (5,000 and 100,000 observations respectively) sampled from the same distribution, you are asked to construct random training and test sets using an 80:20 split (80% training data, 20% test data). The question also asks you to plot training and test error curves against model complexity, yielding a total of four curves (two for each data set).
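As a minimal sketch of the 80:20 split described above (the helper name `split_80_20` and the fixed seed are assumptions for illustration, not part of the question), the row indices can be shuffled and cut at the 80% mark:

```python
import numpy as np

def split_80_20(n_obs, seed=0):
    """Return (train_idx, test_idx) for a random 80:20 split of n_obs rows."""
    rng = np.random.default_rng(seed)   # seed is an arbitrary choice for reproducibility
    perm = rng.permutation(n_obs)       # random ordering of the row indices
    n_train = (n_obs * 4) // 5          # 80% of the rows, using integer arithmetic
    return perm[:n_train], perm[n_train:]

for n_obs in (5_000, 100_000):
    train_idx, test_idx = split_80_20(n_obs)
    print(n_obs, len(train_idx), len(test_idx))
# prints:
# 5000 4000 1000
# 100000 80000 20000
```

The smaller data set therefore trains on 4,000 observations and the larger one on 80,000, which is what drives the difference between the two pairs of curves.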

Model complexity controls the balance between bias and variance. As complexity increases, training error declines because the model fits the training data ever more closely, while test error eventually rises due to overfitting. The result is a decreasing training error curve and a U-shaped test error curve.
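The bias-variance balance mentioned above can be written out for squared error. This is the standard decomposition, stated under the assumption (not given in the question) that the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and that $\hat{f}$ is the model fitted on a random training set:

```latex
\mathbb{E}\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\left(\hat{f}(x)\right)}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Increasing complexity tends to shrink the bias term and grow the variance term, while a larger training set tends to shrink the variance term at a fixed complexity.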

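Consequently, you would expect four curves. For each dataset, the training error curve starts high, decreases as model complexity increases, and flattens out near its minimum as the model begins to fit noise. The test error curve also starts high, decreases to a minimum at the optimal model complexity, and then rises due to overfitting. Sample size separates the two pairs of curves: with 100,000 observations the model is harder to fit perfectly and less prone to overfitting, so at a given complexity its training error is typically somewhat higher and its test error lower than with 5,000 observations. The gap between training and test error is narrower for the larger dataset, and its test error minimum typically sits at a higher complexity and a lower error.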

Plot all four curves on the same diagram for comparison, clearly marking each one. Justify the curves based on the reasoning stated above.
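One way to produce such a diagram is sketched below. Everything specific in it is an assumption made for illustration: the synthetic noisy sine data, polynomial degree as the measure of model complexity, mean squared error as the error, and the function names `error_curves` and `plot_four_curves`. The question itself does not fix any of these.

```python
import numpy as np
from numpy.polynomial import Polynomial

def error_curves(n_obs, degrees, seed=0, noise=0.3):
    """Train/test MSE per polynomial degree on a synthetic data set of n_obs rows."""
    rng = np.random.default_rng(seed)
    # Assumed data-generating process (not from the question): a noisy sine wave.
    x = rng.uniform(-1.0, 1.0, n_obs)
    y = np.sin(3.0 * x) + rng.normal(0.0, noise, n_obs)

    # Random 80:20 split of the row indices.
    perm = rng.permutation(n_obs)
    n_train = (n_obs * 4) // 5
    tr, te = perm[:n_train], perm[n_train:]

    train_mse, test_mse = [], []
    for d in degrees:
        model = Polynomial.fit(x[tr], y[tr], d)  # least-squares polynomial of degree d
        train_mse.append(float(np.mean((model(x[tr]) - y[tr]) ** 2)))
        test_mse.append(float(np.mean((model(x[te]) - y[te]) ** 2)))
    return train_mse, test_mse

def plot_four_curves(sizes=(5_000, 100_000), degrees=range(1, 16),
                     path="error_curves.png"):
    """Draw training and test error for each data set size in one diagram."""
    import matplotlib
    matplotlib.use("Agg")               # render to a file, no display needed
    import matplotlib.pyplot as plt

    degrees = list(degrees)
    fig, ax = plt.subplots()
    for n_obs in sizes:
        train_mse, test_mse = error_curves(n_obs, degrees)
        ax.plot(degrees, train_mse, "--", label=f"train, n={n_obs}")
        ax.plot(degrees, test_mse, "-", label=f"test, n={n_obs}")
    ax.set_xlabel("model complexity (polynomial degree)")
    ax.set_ylabel("error (MSE)")
    ax.legend()
    fig.savefig(path)
```

Calling `plot_four_curves()` writes the diagram to `error_curves.png` with all four curves labeled. With samples this large and degrees only up to 15, the rise in test error at high complexity may be slight; a smaller sample or a wider degree range should make the overfitting region easier to see.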


answered
User Icaksama
by
8.5k points