5 votes
The dataset on American college and university rankings (available from www.dataminingbook.com) contains information on 1302 American colleges and universities offering an undergraduate program. For each university, there are 17 measurements that include continuous measurements (such as tuition and graduation rate) and categorical measurements (such as location by state and whether it is a private or a public school).

a. Remove all categorical variables. Then remove all records with missing numerical measurements from the dataset.

b. Conduct a principal components analysis on the cleaned data and comment on the results. Should the data be normalized? Discuss what characterizes the components you consider key.

1 Answer

2 votes

Final answer:

Normalize the data before conducting PCA so that each variable contributes equally regardless of its units. The key components are the ones that explain the most variance, and they can reveal underlying structure in the dataset. For a new community college, the mode could be more practical than the mean, as it reflects the most common enrollment size.

Step-by-step explanation:

Once the categorical variables and the records with missing numerical measurements have been removed, normalize the data before running principal components analysis (PCA), because the measurements are on different scales (for example, tuition in dollars and graduation rate as a percentage). Without normalization, the variables with the largest variance dominate the leading components; with it, each variable contributes equally to the analysis. PCA identifies the main dimensions of variation, and the components that explain the most variance are considered key. Their loadings show which variables move together, and the component scores can reveal underlying structure in the data, such as clusters of similar colleges.
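To make the effect of normalization concrete, here is a minimal sketch using NumPy. It does not load the real dataset: the three columns (tuition, graduation rate, student/faculty ratio) are synthetic stand-ins with made-up means and spreads, and the helper name `pca` is hypothetical. It only illustrates why the raw and normalized analyses differ.

```python
import numpy as np

def pca(X, normalize=True):
    """Return (explained_variance_ratio, components) for the rows of X.

    Components are returned one per row, sorted by decreasing variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center every column
    if normalize:
        Xc = Xc / X.std(axis=0, ddof=1)      # scale every column to unit variance
    cov = np.cov(Xc, rowvar=False)           # covariance (or correlation) matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs.T

# Synthetic stand-in data, NOT the real college dataset.
rng = np.random.default_rng(0)
n = 200
tuition = rng.normal(15000, 5000, n)   # dollars (assumed scale)
grad_rate = rng.normal(60, 15, n)      # percent (assumed scale)
sf_ratio = rng.normal(14, 4, n)        # students per faculty (assumed scale)
X = np.column_stack([tuition, grad_rate, sf_ratio])

raw_ratio, raw_components = pca(X, normalize=False)
std_ratio, std_components = pca(X, normalize=True)

print("raw explained variance:       ", np.round(raw_ratio, 4))
print("normalized explained variance:", np.round(std_ratio, 4))
print("first raw component loadings: ", np.round(raw_components[0], 4))
```

On unnormalized data the first component is essentially the tuition column alone, simply because dollars have the largest numeric variance. After normalization the variance is spread according to the correlations between variables rather than their units.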

If you are compiling a frequency table and constructing a histogram for the enrollment data of American colleges and universities, you will typically divide the data into intervals of equal width, count the number of instances in each interval, and then plot those counts as a histogram. The histogram shows how enrollment sizes are distributed across these institutions.
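The equal-width binning described above can be sketched in plain Python. The enrollment figures below are hypothetical, not taken from the dataset, and `frequency_table` is an illustrative helper name.

```python
def frequency_table(values, num_bins):
    """Split [min, max] into num_bins equal-width intervals and count values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single degenerate interval.
        return [(lo, hi, len(values))]
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land in bin num_bins, so clamp it into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(num_bins)]

# Hypothetical enrollment sizes, for illustration only.
enrollments = [500, 1200, 1500, 1800, 2500, 3000, 4200, 8000, 15000, 30000]

table = frequency_table(enrollments, 5)
for left, right, count in table:
    # A text histogram: one '#' per institution in the interval.
    print(f"{left:>8.0f} - {right:>8.0f} | {'#' * count} ({count})")
```

With data this right-skewed, most institutions fall in the first interval, which is the kind of shape a histogram makes obvious at a glance.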

Regarding a new community college, the mode might be more practical because it indicates the most common enrollment size among existing colleges, which suggests what is typical in the community. The mean gives the average size, but it can be skewed by extreme values such as a few very large institutions.
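A small worked example shows how far the two measures can diverge. The numbers are hypothetical and chosen only to include one very large institution.

```python
from statistics import mean, mode

# Hypothetical enrollment sizes; the last value is a deliberate outlier.
sizes = [1200, 1500, 1500, 1500, 1800, 2000, 2500, 40000]

avg_size = mean(sizes)     # pulled upward by the 40000 outlier
common_size = mode(sizes)  # the most frequently occurring size

print("mean:", avg_size)
print("mode:", common_size)
```

Here the mean is 6500 while the mode is 1500, so the mean describes a size that none of the smaller colleges actually has. Note that with real enrollment counts, exact repeats are rare, so in practice the mode is usually read from a binned frequency table rather than from the raw values.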

answered by User Samthebest (7.4k points)