The goal for this week is to practice resampling methods in order to tune models, assess model variance, and determine the importance of variables.
🔧 Preparation
Complete the quiz
Do the reading related to week 3
Exercises:
Open your project for this unit called iml.Rproj.
1. Assess the significance of PC coefficients using bootstrap
In the lecture, we used the bootstrap to examine the significance of the coefficients for the second principal component from the women’s track PCA. Do this computation for PC1. The question for you to answer is: Can we consider all of the coefficients to be equal?
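A minimal sketch of this kind of bootstrap, using base R only. The track data is not loaded here, so the built-in USArrests data stands in for it, and boot_pc1 is a hypothetical helper name, not a function from the lecture. If all p coefficients were equal, each would have magnitude 1/sqrt(p), so that value is the reference to compare against the bootstrap intervals.

```r
# Sketch: bootstrap the PC1 coefficients of a numeric data set.
# boot_pc1 is a hypothetical helper; USArrests is a stand-in for the track data.
boot_pc1 <- function(x, n_boot = 1000) {
  x <- as.matrix(x)
  p <- ncol(x)
  # PC1 from the full data, oriented so its coefficients sum to a positive value
  ref <- prcomp(x, scale. = TRUE)$rotation[, 1]
  if (sum(ref) < 0) ref <- -ref
  coefs <- matrix(NA_real_, nrow = n_boot, ncol = p,
                  dimnames = list(NULL, colnames(x)))
  for (b in seq_len(n_boot)) {
    idx <- sample(nrow(x), replace = TRUE)   # resample rows with replacement
    v <- prcomp(x[idx, ], scale. = TRUE)$rotation[, 1]
    if (sum(v * ref) < 0) v <- -v            # PC signs are arbitrary, so align them
    coefs[b, ] <- v
  }
  data.frame(
    variable = colnames(x),
    estimate = unname(ref),
    lower = apply(coefs, 2, quantile, probs = 0.025),
    upper = apply(coefs, 2, quantile, probs = 0.975),
    row.names = NULL
  )
}

set.seed(1234)
pc1_ci <- boot_pc1(USArrests, n_boot = 500)

# Value each coefficient would take if all were equal
equal_value <- 1 / sqrt(ncol(USArrests))
pc1_ci$covers_equal <- pc1_ci$lower <= equal_value & equal_value <= pc1_ci$upper
print(pc1_ci)
```

To use this on the exercise, pass the numeric columns of the track data in place of USArrests. If every interval covers the reference value, treating the coefficients as equal is plausible.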
2. Using simulation to assess results when there is no structure
The ggscree function in the mulgar package computes PCA on multivariate standard normal samples, to learn what the largest eigenvalue might be when the covariance between variables is 0.
What are the mean and covariance matrix of a multivariate standard normal distribution?
Simulate a sample of 55 observations from a 7D standard multivariate normal distribution. Compute the sample mean and covariance. (Question: Why 55 observations? Why 7D?)
Compute PCA on your sample, and note the variance of the first PC. How does this compare with the variance of the first PC of the women’s track data?
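The simulation steps above can be sketched in base R. This sketch draws each entry independently from N(0, 1), which is one way to sample from a multivariate standard normal because its components are independent. The seed and the choice to scale inside prcomp are assumptions made for illustration.

```r
# Sketch: simulate 55 observations from a 7D standard multivariate normal
set.seed(2024)                    # arbitrary seed, for reproducibility only
n <- 55
p <- 7
x <- matrix(rnorm(n * p), nrow = n, ncol = p)   # independent N(0, 1) entries

# Sample mean vector and sample covariance matrix
x_mean <- colMeans(x)
x_cov <- cov(x)
print(round(x_mean, 2))
print(round(x_cov, 2))

# PCA on the simulated sample; sdev^2 gives the variance of each PC
pca <- prcomp(x, scale. = TRUE)
pc_var <- pca$sdev^2
pc1_var <- pc_var[1]
print(pc1_var)
```

Because the variables are scaled, the PC variances sum to 7, and the first one will be somewhat above 1 purely from sampling variation. Re-running with different seeds gives a sense of how large it can be when there is no structure.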
3. Making a lineup plot to assess the dependence between variables
Permutation samples are used to assess the significance of relationships and the importance of variables. Here we will use them to assess the strength of a non-linear relationship.
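As a small illustration of the permutation idea, the sketch below shuffles one variable to break any association with the other, and compares the observed statistic with its values under shuffling. The toy data (a linear relationship) and the use of correlation as the statistic are assumptions for illustration only; they are not the data or the plot-based comparison used in this exercise.

```r
# Sketch: permutation null distribution for the association between x and y
set.seed(99)                      # arbitrary seed
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)           # toy data with a linear relationship

obs <- cor(x, y)                  # observed statistic

# Shuffling y breaks any dependence on x while keeping both marginal distributions
n_perm <- 1000
perm <- replicate(n_perm, cor(x, sample(y)))

# Proportion of permuted statistics at least as extreme as the observed one
p_value <- (1 + sum(abs(perm) >= abs(obs))) / (1 + n_perm)
print(obs)
print(p_value)
```

A lineup plot applies the same logic visually: the plot of the observed data is placed among plots of permuted data, and the relationship is judged real if the observed plot can be picked out.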
Generate a sample of data that has a strong non-linear relationship but no correlation, as follows: