Final answer:
The question asks you to plot four error curves against model complexity: one training curve and one test curve for each of two data sets of different sizes. Training error falls steadily as complexity grows, while test error follows the typical U-shape produced by the trade-off between bias and variance. Plot all four curves on the same diagram and justify them based on the model's behavior, including how the larger data set changes the picture.
Step-by-step explanation:
Assume two data sets (5,000 and 100,000 observations, respectively) sampled from the same distribution. For each one, you construct random training and test sets using an 80:20 split (80% training data, 20% test data). You then plot training and test error over model complexity, which gives four curves in total (two for each data set).
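As a minimal sketch of the split described above, the following uses only the Python standard library. The function name train_test_split_80_20 and the fixed seed are assumptions made for illustration; they are not part of the question.

```python
import random


def train_test_split_80_20(n_observations, seed=0):
    """Return (train_idx, test_idx) for a random 80:20 split of row indices."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    indices = list(range(n_observations))
    rng.shuffle(indices)               # random order, so the split is random
    cut = (n_observations * 80) // 100 # integer arithmetic avoids float rounding
    return indices[:cut], indices[cut:]


for n in (5_000, 100_000):
    train_idx, test_idx = train_test_split_80_20(n)
    print(n, len(train_idx), len(test_idx))
# prints:
# 5000 4000 1000
# 100000 80000 20000
```

The smaller data set leaves only 1,000 observations for testing, so its test error estimate will be noisier than the one based on 20,000 observations.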
Model complexity controls the balance between bias and variance. As complexity increases, training error declines because the model fits the training data more and more closely. Test error first falls as bias shrinks, then rises again once the model starts to overfit and variance dominates. Only the test error curve is therefore U-shaped; the training error curve keeps decreasing.
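Consequently, you would expect four curves. For each data set, the training error curve starts high and keeps decreasing as model complexity increases, flattening out near a low value once the model is flexible enough to fit the noise. The test error curve also starts high, decreases to a minimum (the optimal model complexity), and then increases due to overfitting. Because both data sets come from the same distribution, the curves share this general shape, but the sample size matters: with 100,000 observations the gap between training and test error is typically narrower, overfitting sets in at a higher complexity, and the minimum test error tends to be lower than with 5,000 observations.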
Plot all four curves on the same diagram for comparison, clearly marking each one. Justify the curves based on the reasoning stated above.
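The four curves can be sketched with a small simulation. Everything specific here is an assumption chosen for illustration, not something given in the question: the noisy sine data, polynomial degree as the measure of model complexity, mean squared error as the error measure, and the helper names error_curves and plot_curves. The sketch needs NumPy, and Matplotlib only for the plot itself.

```python
import numpy as np
from numpy.polynomial import Polynomial


def error_curves(n_observations, degrees, seed=0):
    """Return (train_mse, test_mse) per polynomial degree on an 80:20 split."""
    rng = np.random.default_rng(seed)
    # Hypothetical data-generating process: a sine wave plus Gaussian noise.
    x = rng.uniform(-3.0, 3.0, n_observations)
    y = np.sin(2.0 * x) + rng.normal(0.0, 0.5, n_observations)

    # Random 80:20 split of the observations.
    order = rng.permutation(n_observations)
    cut = (n_observations * 80) // 100
    train, test = order[:cut], order[cut:]

    train_mse, test_mse = [], []
    for d in degrees:
        # Least-squares polynomial fit; a higher degree means a more complex model.
        model = Polynomial.fit(x[train], y[train], deg=d)
        train_mse.append(float(np.mean((model(x[train]) - y[train]) ** 2)))
        test_mse.append(float(np.mean((model(x[test]) - y[test]) ** 2)))
    return train_mse, test_mse


def plot_curves(degrees, results):
    """Draw all four curves on one diagram (requires matplotlib)."""
    import matplotlib.pyplot as plt

    for n, (train_mse, test_mse) in results.items():
        plt.plot(degrees, train_mse, "--", label=f"train, n={n:,}")
        plt.plot(degrees, test_mse, "-", label=f"test, n={n:,}")
    plt.xlabel("Model complexity (polynomial degree)")
    plt.ylabel("Mean squared error")
    plt.legend()
    plt.show()


degrees = list(range(1, 16))
results = {n: error_curves(n, degrees) for n in (5_000, 100_000)}
```

Calling plot_curves(degrees, results) draws the diagram. With training sets this large and a model this simple, the upturn in test error may be slight; shrinking the sample sizes or raising the maximum degree should make the overfitting region easier to see.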