The purpose of this lab is to

- learn to use the tour to develop intuition about multiple dimensions
- understand homogeneous vs heterogeneous variance-covariance
- recognise features in high dimensions including multivariate outliers, clustering and linear and nonlinear dependence
- practice simulating data from standard multivariate distributions

This part replicates the plots made in the class notes. For each example, run the code from the class notes, and discuss with your group members what you learn about the data that differs from what the LDA and PCA in earlier labs and lecture notes showed.

- Compute the means, standard deviations and correlation of the two variables for each data set in the datasaurus dozen, and check that they are indeed all the same.
- Run a 2D projection grand tour for
    - the 6D flea data
    - the 7D women’s track data

- Run a 2D guided tour
    - using the holes index for the 6D flea data
    - using the lda_pp index for the 6D flea data, using species as the class
    - using the cmass index for the 7D women’s track data
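
For the first task in the list above, the per-data-set summaries can be computed along the following lines. This is a minimal sketch in Python with NumPy, independent of the class-notes code. The function name `summarise_by_group` and the tiny example data are made up for illustration; they are not the datasaurus values, and only stand in for a long-format table with a data set label and two numeric columns.

```python
import numpy as np

def summarise_by_group(labels, x, y):
    """Return {label: (mean_x, mean_y, sd_x, sd_y, corr_xy)} for each group."""
    labels = np.asarray(labels)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for g in np.unique(labels):
        m = labels == g
        out[str(g)] = (
            x[m].mean(), y[m].mean(),
            x[m].std(ddof=1), y[m].std(ddof=1),  # sample standard deviations
            np.corrcoef(x[m], y[m])[0, 1],       # Pearson correlation
        )
    return out

# Made-up stand-in data (NOT the datasaurus values): two groups whose points
# differ but whose summary statistics coincide.
labels = ["a"] * 4 + ["b"] * 4
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [1, 3, 2, 4, 2, 1, 3, 4]

summaries = summarise_by_group(labels, x, y)
for g, stats in summaries.items():
    print(g, np.round(stats, 2))

# Compare the summaries after rounding to two decimal places.
rounded = {tuple(np.round(s, 2)) for s in summaries.values()}
all_same = len(rounded) == 1
print("all groups agree to 2 d.p.:", all_same)
```

Matching summaries say nothing about whether the scatterplots look alike, which is the point of the exercise.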

- Simulate two data sets from a 4D multivariate normal distribution with three groups, with group means \(\mu_1 = (0,0,3,0)', \mu_2 = (0,3,-3,0)', \mu_3 = (-3,0,3,3)'\) and group sizes \(n_1 = 85, n_2 = 104, n_3 = 48\),

where **set A** has the same variance-covariance matrix, \(\Sigma\), for every group:

\[\Sigma = \begin{bmatrix} 3.0&0.2&-1.2&0.9\\ 0.2&2.5&-1.4&0.3\\ -1.2&-1.4&2.0&1.0\\ 0.9&0.3&1.0&3.0\\ \end{bmatrix}\]

and **set B** has a different variance-covariance matrix for each group, \(\Sigma_1, \Sigma_2, \Sigma_3\):

\(\Sigma_1 = \Sigma\)

\[\Sigma_2 = \begin{bmatrix}3.0&-0.8&1.2&0.3\\ -0.8&2.5&1.4&0.3\\ 1.2&1.4&2.0&1.0\\ 0.3&0.3&1.0&3.0\\ \end{bmatrix}\]

\[\Sigma_3 = \begin{bmatrix}2.0&-1.0&1.2&0.3\\ -1.0&2.5&1.4&0.3\\ 1.2&1.4&4.0&-1.2\\ 0.3&0.3&-1.2&3.0\\ \end{bmatrix}\]
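
The specification above can be turned into a simulation roughly as follows. This is a hedged sketch in Python with NumPy rather than the class-notes code. The seed, the helper name `simulate`, and the variable names are arbitrary choices made for illustration; the means, sizes and matrices are copied from the task.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed, only for reproducibility

# Group means and sizes as given in the task.
means = [
    np.array([0.0, 0.0, 3.0, 0.0]),
    np.array([0.0, 3.0, -3.0, 0.0]),
    np.array([-3.0, 0.0, 3.0, 3.0]),
]
sizes = [85, 104, 48]

sigma = np.array([
    [3.0, 0.2, -1.2, 0.9],
    [0.2, 2.5, -1.4, 0.3],
    [-1.2, -1.4, 2.0, 1.0],
    [0.9, 0.3, 1.0, 3.0],
])
sigma2 = np.array([
    [3.0, -0.8, 1.2, 0.3],
    [-0.8, 2.5, 1.4, 0.3],
    [1.2, 1.4, 2.0, 1.0],
    [0.3, 0.3, 1.0, 3.0],
])
sigma3 = np.array([
    [2.0, -1.0, 1.2, 0.3],
    [-1.0, 2.5, 1.4, 0.3],
    [1.2, 1.4, 4.0, -1.2],
    [0.3, 0.3, -1.2, 3.0],
])

def simulate(means, covs, sizes, rng):
    """Stack one multivariate normal sample per group; return (X, labels)."""
    X = np.vstack([
        rng.multivariate_normal(m, S, size=n)
        for m, S, n in zip(means, covs, sizes)
    ])
    labels = np.repeat(np.arange(1, len(sizes) + 1), sizes)  # 1, 2, 3
    return X, labels

# Set A: the same matrix for every group. Set B: one matrix per group.
XA, gA = simulate(means, [sigma, sigma, sigma], sizes, rng)
XB, gB = simulate(means, [sigma, sigma2, sigma3], sizes, rng)

print(XA.shape, XB.shape)
```

Each call returns a 237 x 4 data matrix together with a label vector, so the class variable is available for colouring points later.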

Conduct LDA on the two data sets, and plot the data in the 2D linear discriminant space.
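
To make the idea of the 2D linear discriminant space concrete, here is a hedged sketch in Python with NumPy that computes the discriminant directions from the within-group and between-group scatter matrices and projects the data onto them. The function name `lda_projection` and the seed are illustrative assumptions, and the example only simulates set A. The scaling of the discriminant axes is a convention, so coordinates may differ from other software by a constant factor.

```python
import numpy as np

def lda_projection(X, labels, n_dims=2):
    """Project X onto its first n_dims linear discriminants."""
    classes = np.unique(labels)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))  # pooled within-group scatter
    B = np.zeros((p, p))  # between-group scatter
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        W += (Xc - mc).T @ (Xc - mc)
        d = (mc - grand_mean)[:, None]
        B += len(Xc) * (d @ d.T)
    # Solve B v = lambda W v by whitening with the Cholesky factor of W,
    # which turns it into an ordinary symmetric eigenproblem.
    L = np.linalg.cholesky(W)
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ B @ Linv.T)
    order = np.argsort(evals)[::-1][:n_dims]  # largest eigenvalues first
    directions = Linv.T @ evecs[:, order]
    return (X - grand_mean) @ directions, evals[order]

# Example on set A (equal variance-covariance); seed chosen arbitrarily.
rng = np.random.default_rng(7)
means = [np.array([0.0, 0.0, 3.0, 0.0]),
         np.array([0.0, 3.0, -3.0, 0.0]),
         np.array([-3.0, 0.0, 3.0, 3.0])]
sizes = [85, 104, 48]
sigma = np.array([
    [3.0, 0.2, -1.2, 0.9],
    [0.2, 2.5, -1.4, 0.3],
    [-1.2, -1.4, 2.0, 1.0],
    [0.9, 0.3, 1.0, 3.0],
])
X = np.vstack([rng.multivariate_normal(m, sigma, size=n)
               for m, n in zip(means, sizes)])
g = np.repeat([1, 2, 3], sizes)

Z, evals = lda_projection(X, g)  # Z has one row per observation, two columns
print(Z.shape)
```

The two columns of `Z` can be passed to any scatterplot function, with `g` mapped to colour. With three groups there are at most two discriminant directions, which is why a 2D plot shows the whole discriminant space.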

View both data sets in a grand tour, where the points are coloured by the class variable. Write a paragraph, in your own (or your group’s) words, explaining the difference between homogeneous and heterogeneous variance-covariance.
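
As a numerical companion to the visual comparison, note that each frame of a tour is a projection of the data onto a 2D orthonormal basis. If a group has variance-covariance \(\Sigma_k\) and the basis is \(A\), the projected group has variance-covariance \(A'\Sigma_k A\). The sketch below, in Python with NumPy, draws one random basis and prints this projected matrix for each group of set B. The helper name `random_basis` and the seed are illustrative assumptions, and a real grand tour interpolates smoothly between many such bases rather than drawing a single one.

```python
import numpy as np

def random_basis(p, d, rng):
    """Random p x d matrix with orthonormal columns (a single tour frame)."""
    Q, _ = np.linalg.qr(rng.standard_normal((p, d)))
    return Q

sigma = np.array([
    [3.0, 0.2, -1.2, 0.9],
    [0.2, 2.5, -1.4, 0.3],
    [-1.2, -1.4, 2.0, 1.0],
    [0.9, 0.3, 1.0, 3.0],
])
sigma2 = np.array([
    [3.0, -0.8, 1.2, 0.3],
    [-0.8, 2.5, 1.4, 0.3],
    [1.2, 1.4, 2.0, 1.0],
    [0.3, 0.3, 1.0, 3.0],
])
sigma3 = np.array([
    [2.0, -1.0, 1.2, 0.3],
    [-1.0, 2.5, 1.4, 0.3],
    [1.2, 1.4, 4.0, -1.2],
    [0.3, 0.3, -1.2, 3.0],
])

rng = np.random.default_rng(1)  # arbitrary seed
A = random_basis(4, 2, rng)

# Set B: each group has its own matrix, so the projected 2 x 2 matrices differ.
projected = {}
for name, S in [("Sigma_1", sigma), ("Sigma_2", sigma2), ("Sigma_3", sigma3)]:
    projected[name] = A.T @ S @ A
    print(name)
    print(np.round(projected[name], 2))
```

For set A every group shares \(\Sigma\), so the three projected matrices would be identical in every frame; for set B they generally differ from frame to frame and from group to group.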