```r
# Load libraries used everywhere
library(tidyverse)
library(tidymodels)
library(patchwork)
library(mulgar)
library(GGally)
library(tourr)
library(geozoo)
library(keras)
library(uwot)
library(colorspace)
library(ggthemes)
library(conflicted)
conflicts_prefer(dplyr::filter)
conflicts_prefer(dplyr::select)
conflicts_prefer(dplyr::slice)
```
🎯 Objectives
The goal for this week is to learn to fit, diagnose, and predict from a neural network model.
🔧 Preparation
Make sure you have all the necessary libraries installed. There are a few new ones this week!
Exercises:
Open your project for this unit, called iml.Rproj. We will be working through the tutorial at TensorFlow for R to fit a model to, and predict from, the fashion MNIST image data.
1. Get the data
We use the Fashion MNIST dataset which contains 70,000 grayscale images in 10 categories of articles sold on Zalando’s multi-brand, digital platform for fashion, beauty, and lifestyle.
```r
# download the data
fashion_mnist <- dataset_fashion_mnist()

# split into input variables and response
c(train_images, train_labels) %<-% fashion_mnist$train
c(test_images, test_labels) %<-% fashion_mnist$test

# for interpretation we also define the category names
class_names = c('T-shirt/top',
                'Trouser',
                'Pullover',
                'Dress',
                'Coat',
                'Sandal',
                'Shirt',
                'Sneaker',
                'Bag',
                'Ankle boot')
```
2. What’s in the data?
Check how many observations are in the training and test sets, and plot some of the images.
```r
dim(train_images)
dim(train_labels)
dim(test_images)
dim(test_labels)

# Choose an image randomly
img <- as.data.frame(train_images[sample(1:60000, 1), , ])
colnames(img) <- seq_len(ncol(img))
img$y <- seq_len(nrow(img))
img <- img |>
  pivot_longer(cols = -y, names_to = "x", values_to = "value") |>
  mutate(x = as.integer(x))
ggplot(img, aes(x = x, y = y, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "black", na.value = NA) +
  scale_y_reverse() +
  theme_map() +
  theme(legend.position = "none")
```
3. Pre-process the data
It may not be necessary, says Patrick, but we'll scale the pixel values to the range 0-1 before modeling.
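A minimal sketch of the rescaling: the pixel values are integers from 0 to 255, so dividing by 255 puts them on 0-1.

```r
# rescale pixel intensities from 0-255 to 0-1
train_images <- train_images / 255
test_images  <- test_images / 255
```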
Next, specify the model architecture:

- one hidden layer with 128 nodes and (rectified) linear (relu) activation
- a final layer with 10 nodes and softmax (multinomial logistic) activation
Why 10 nodes in the last layer? Why 128 nodes in the hidden layer?
```r
model_fashion_mnist <- keras_model_sequential()
model_fashion_mnist |>
  # flatten the image data into a long vector
  layer_flatten(input_shape = c(28, 28)) |>
  # hidden layer with 128 units
  layer_dense(units = 128, activation = 'relu') |>
  # output layer for 10 categories
  layer_dense(units = 10, activation = 'softmax')
```
Set the optimizer to be adam, the loss function to be sparse_categorical_crossentropy, and accuracy as the metric. What other optimizers could be used? What is sparse_categorical_crossentropy?
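A sketch of the compile and fit steps with these settings; the number of epochs below is an arbitrary illustrative choice, not a recommendation.

```r
# compile: adam optimizer, sparse categorical cross-entropy loss,
# with accuracy reported during training
model_fashion_mnist |> compile(
  optimizer = 'adam',
  loss = 'sparse_categorical_crossentropy',
  metrics = c('accuracy')
)

# fit on the training images; epochs = 5 is a placeholder choice
model_fashion_mnist |> fit(
  train_images, train_labels,
  epochs = 5, verbose = 2
)
```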
This section is motivated by the examples in Cook and Laa (2024). Focus on the test data to investigate the fit and lack of fit.
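For example, one could start with the overall test accuracy, the predictive probabilities, and a confusion matrix. This is only a sketch; the object names test_probs and test_pred are illustrative and are reused in the sketches further down.

```r
# overall loss and accuracy on the test set
model_fashion_mnist |> evaluate(test_images, test_labels, verbose = 0)

# predictive probabilities and hard class predictions for the test images
test_probs <- predict(model_fashion_mnist, test_images)
test_pred  <- apply(test_probs, 1, which.max) - 1   # labels are coded 0-9

# confusion matrix of observed vs predicted classes
table(observed = test_labels, predicted = test_pred)
```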
PCA can be used to reduce the dimension down from 784 to a small number of PCs, to examine the nature of the differences between the classes. Compute the scree plot to decide on a reasonable number of PCs that could be examined in a tour. Plot the first two PCs statically. Explain how the class structure matches any clustering.
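A sketch of one way to do this, assuming the images have been rescaled as above and using ggscree() from mulgar for the scree plot; the object names and the number of PCs shown are illustrative.

```r
# flatten each 28x28 test image into a row of 784 pixel values
test_flat <- matrix(test_images, nrow = dim(test_images)[1], ncol = 28 * 28)

# PCA on the pixels (already on 0-1, so no extra scaling here)
test_pca <- prcomp(test_flat)

# scree plot to choose how many PCs to examine
ggscree(test_pca, q = 20)

# first two PCs, coloured by the true class
test_pc <- as_tibble(test_pca$x[, 1:2]) |>
  mutate(label = factor(class_names[test_labels + 1]))
ggplot(test_pc, aes(x = PC1, y = PC2, colour = label)) +
  geom_point(alpha = 0.3) +
  scale_colour_discrete_qualitative()
```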
Interestingly, the nodes in the hidden layer can be thought of as 128 new variables, each a (relu-transformed) linear combination of the original 784 variables. This is too many to visualise directly, but we can again use PCA to reduce their dimension and make plots.
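One way to extract the activations, sketched here, is to define a second keras model that maps the input to the output of the hidden layer; the layer index below is an assumption about the architecture defined above (flatten, then the 128-node dense layer).

```r
# sub-model that returns the 128 hidden-node activations
# layers[[2]] is assumed to be the dense relu layer (layers[[1]] is flatten)
activation_model <- keras_model(
  inputs = model_fashion_mnist$input,
  outputs = model_fashion_mnist$layers[[2]]$output
)
test_activations <- predict(activation_model, test_images)

# PCA on the 128 activations, then plot the first two PCs by class
act_pca <- prcomp(test_activations)
ggscree(act_pca, q = 20)
act_pc <- as_tibble(act_pca$x[, 1:2]) |>
  mutate(label = factor(class_names[test_labels + 1]))
ggplot(act_pc, aes(x = PC1, y = PC2, colour = label)) +
  geom_point(alpha = 0.3) +
  scale_colour_discrete_qualitative()
```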
The last task is to build on what was learned from the confusion matrix by examining the uncertainty in the predictions, using the predictive probabilities. Because there are 10 classes, these probabilities fall in a 9D simplex. Each vertex is the point where the model is completely certain about the prediction. Points along an edge indicate confusion between only two classes, and points on a triangular face indicate confusion between three classes. The code below will create the visualisation of the predictive probabilities, focusing on four of the 10 classes to make it a little simpler to digest.
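Here is a sketch of one way such a plot can be constructed, using the test_probs computed earlier, simplex() from geozoo for the tetrahedron, and a tour from tourr; the four classes chosen and the object names are illustrative assumptions.

```r
# choose four classes to focus on (an illustrative choice)
keep <- c(0, 2, 4, 6)   # T-shirt/top, Pullover, Coat, Shirt
in_keep <- test_labels %in% keep

# renormalise the predictive probabilities over just these four classes
p4 <- test_probs[in_keep, keep + 1]
p4 <- p4 / rowSums(p4)

# vertices and edges of a 3D simplex (a tetrahedron)
simp <- simplex(p = 3)

# map each probability vector onto the simplex, and stack the vertices
# on top so the tetrahedron edges can be drawn in the tour display
proj <- rbind(simp$points, p4 %*% simp$points)
colnames(proj) <- c("x1", "x2", "x3")
pt_col <- factor(c(rep("vertex", 4),
                   class_names[test_labels[in_keep] + 1]))

# grand tour of the predictive probabilities inside the tetrahedron
animate_xy(proj, col = pt_col, axes = "off",
           edges = as.matrix(simp$edges))
```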