???) this time in the
Rmd file. You will need to write more of your own code from scratch this time. The labs and lecture notes have examples for most of the code you need.
About the data: The chocolates data was compiled by students in a previous class of Prof Cook, by collecting nutrition information on the chocolates as listed on the manufacturers' websites. All numbers were normalised to be equivalent to a 100g serving. Units of measurement are listed in the variable names.
Use the tour, with type of chocolate mapped to colour, and write a paragraph on whether the two types of chocolate differ on the nutritional variables.
Make a parallel coordinate plot of the chocolates, coloured by type, with the variables sorted by how well they separate the groups. The “uniminmax” scaling might work best for this data. Write a paragraph explaining how the types of chocolates differ in nutritional characteristics.
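As a reminder of what this scaling does, assuming “uniminmax” refers to the option of that name in GGally's `ggparcoord` (the source does not name the package): each variable is rescaled on its own so that its minimum is 0 and its maximum is 1. Writing $x_{ij}$ for the value of variable $j$ on chocolate $i$ (notation introduced here for illustration), the scaled value is

$$
x^{*}_{ij} = \frac{x_{ij} - \min_{i} x_{ij}}{\max_{i} x_{ij} - \min_{i} x_{ij}}
$$

Because every axis then spans the same range, variables measured in different units (g, mg, calories) can be compared on one plot, although a single extreme value will compress the rest of that axis.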
Identify one chocolate that is masquerading as dark, that is, one labelled as dark that nutritionally looks more like a milk chocolate. Explain your answer.
Fit a linear discriminant analysis model, using equal prior probability for each group.
Write down the LDA rule. Make it clear which type of chocolate is class 1 and class 2 relative to the formula in the notes.
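For orientation, one common textbook form of the two-class rule with equal priors is sketched below. The notation is an assumption and may differ from the notes: $\bar{x}_1$ and $\bar{x}_2$ are the sample mean vectors of the two classes, $S$ is the pooled variance-covariance matrix, and $x_0$ is the new observation. Check the form and the class labelling against the formula in the notes before using it.

$$
\text{Allocate } x_0 \text{ to class 1 if } \quad (\bar{x}_1 - \bar{x}_2)^{\top} S^{-1} \left( x_0 - \frac{\bar{x}_1 + \bar{x}_2}{2} \right) \geq 0, \quad \text{otherwise allocate to class 2,}
$$

$$
\text{where } \quad S = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}.
$$

Swapping which type of chocolate is called class 1 flips the sign of the left-hand side, which is why the answer needs to state the labelling explicitly.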
This question is about decision trees. Here is a sample data set to work with:
For each of the simulated data sets provided, using the tour, parallel coordinate plot, scatterplot matrix or any other technique you like, determine the main structure in the data: how many groups there are, whether there are any outliers, overall shape. Write a paragraph on what you find in the data and your approach.