Week 6: Neural networks and deep learning
We will cover:
- The logistic regression model viewed as a network
- Neural networks with a hidden layer
- Fitting, evaluating and interpreting a deep learning model with keras
Remember the logistic function:
\[\begin{align} f(x) &= \frac{e^{\beta_0+\sum_{j=1}^p\beta_jx_j}}{1+e^{\beta_0+\sum_{j=1}^p\beta_jx_j}}\\ &= \frac{1}{1+e^{-(\beta_0+\sum_{j=1}^p\beta_jx_j)}} \end{align}\]
Equivalently, on the log-odds scale:
\[\log_e\frac{f(x)}{1 - f(x)} = \beta_0+\sum_{j=1}^p\beta_jx_j\]
When \(f(x)\) is above a chosen threshold (e.g. 0.5), predict 1; otherwise predict 0.
\[\widehat{y} =b_0+\sum_{j=1}^pb_jx_j\]
Drawing as a network model:
\(p\) inputs (predictors) are each multiplied by a weight (coefficient) and summed, a constant is added, and the result is the predicted output (response).
\[\begin{align} \widehat{y} =a_{0}+\sum_{k=1}^s a_{k}\,\phi\left(b_{0k}+\sum_{j=1}^pb_{jk}x_j\right) \end{align}\]
where \(\phi\) is a nonlinear activation function (for example, the logistic function or ReLU). This architecture combines multiple linear models to generate non-linear classifications; without the nonlinear \(\phi\), the sum would collapse back to a single linear model.
The best fit uses \(s=4\), four nodes in the hidden layer. Can you sketch four lines that would split this data well?
The models at each of the nodes of the hidden layer.
These are all the models fitted, using \(s=2, 3, 4\) with the fit statistics.
Fitted using the R package nnet. The fit is very unstable, and this remains a problem with current procedures.
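A minimal sketch of such a fit with nnet; the data frame d, response y and predictor names x1, x2 are placeholders for illustration, not the course objects:

library(nnet)
set.seed(211)
# Single hidden layer with s = 4 nodes; d, y, x1, x2 are hypothetical
fit <- nnet(y ~ x1 + x2, data = d, size = 4, decay = 5e-4, maxit = 500)
summary(fit)  # weights into and out of each hidden node

Refitting with different random seeds shows the instability mentioned above.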
Fitting a deep learning model involves four steps, illustrated in the keras code below:
- Define the architecture: number of layers, nodes per layer, and activation functions.
- Choose the loss function: here, cross-entropy for multiclass classification.
- Training process: iteratively optimise the weights (back-propagation), using an optimiser such as adam, over a number of epochs.
- Evaluation: compute accuracy and the confusion matrix on training and test sets.
Choose two nodes in the hidden layer: reducing to 2D, analogous to the LDA discriminant space, makes the class structure easy to examine and classify.
library(keras)
tensorflow::set_random_seed(211)

# Define model: 4 inputs -> 2 hidden nodes (ReLU) -> 3 output classes (softmax)
p_nn_model <- keras_model_sequential()
p_nn_model %>%
  layer_dense(units = 2, activation = 'relu',
              input_shape = 4) %>%
  layer_dense(units = 3, activation = 'softmax')
p_nn_model %>% summary

# Note: from_logits = TRUE treats the outputs as raw scores, but the output
# layer above already applies softmax; FALSE would be the conventional choice
loss_fn <- loss_sparse_categorical_crossentropy(
  from_logits = TRUE)

p_nn_model %>% compile(
  optimizer = "adam",
  loss = loss_fn,
  metrics = c('accuracy')
)
Note that the tidymodels code style does not allow easy extraction of model coefficients.
Split the data into training and test, and check it.
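A sketch of one way to do this, assuming the standardised penguins data is in p_std; the object names, split proportion and stratification are assumptions:

library(rsample)
set.seed(211)
p_split <- initial_split(p_std, prop = 2/3, strata = species)
p_train <- training(p_split)
p_test <- testing(p_split)
# Check the split is balanced across species
p_train %>% count(species)
p_test %>% count(species)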
Fit the model
# Data needs to be a matrix, and the response needs to be numeric
p_train_x <- p_train %>%
  select(bl:bm) %>%
  as.matrix()
p_train_y <- p_train %>% pull(species) %>% as.numeric()
p_train_y <- p_train_y - 1 # Classes need to be 0, 1, 2
p_test_x <- p_test %>%
  select(bl:bm) %>%
  as.matrix()
p_test_y <- p_test %>% pull(species) %>% as.numeric()
p_test_y <- p_test_y - 1 # Classes need to be 0, 1, 2
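With the matrices prepared, training is a single call to fit(). A minimal sketch; the epochs and batch_size values here are assumptions, not the course settings:

p_nn_fit <- p_nn_model %>%
  fit(x = p_train_x,
      y = p_train_y,
      epochs = 200,
      batch_size = 16,
      verbose = 0)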
How many parameters need to be estimated?
Four input variables, two nodes in the hidden layer, and three nodes in the output layer (one per class). Each hidden node has 4 weights + 1 bias, and each output node has 2 weights + 1 bias, giving \(5+5+3+3+3=19\) parameters.
Model: "sequential"
____________________________________________________________
Layer (type) Output Shape Param #
============================================================
dense_1 (Dense) (None, 2) 10
dense (Dense) (None, 3) 9
============================================================
Total params: 19 (76.00 Byte)
Trainable params: 19 (76.00 Byte)
Non-trainable params: 0 (0.00 Byte)
____________________________________________________________
Evaluate the fit
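A sketch of computing the overall loss and accuracy on the test set, assuming the fitted model and matrices from above:

p_nn_model %>% evaluate(p_test_x, p_test_y, verbose = 0)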
Confusion matrices for training and test
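A sketch of one way to compute these tables; using predict() with which.max is an assumption, not necessarily the original code:

p_train_pred <- p_nn_model %>% predict(p_train_x)  # class probabilities
p_train_pred_cat <- factor(
  levels(p_train$species)[apply(p_train_pred, 1, which.max)],
  levels = levels(p_train$species))
table(p_train$species, p_train_pred_cat)
# Repeat with p_test_x and p_test$species for the test table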
p_train_pred_cat
            Adelie Chinstrap Gentoo
  Adelie        95         5      0
  Chinstrap      0        45      0
  Gentoo         1         0     81

p_test_pred_cat
            Adelie Chinstrap Gentoo
  Adelie        46         3      2
  Chinstrap      0        23      0
  Gentoo         2         0     39
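From these tables, the training accuracy is \((95+45+81)/227 \approx 0.97\) and the test accuracy is \((46+23+39)/115 \approx 0.94\).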
Note: the settings have been deliberately chosen so that the fit is not perfect.
Estimated parameters
# Extract the estimated weights and biases for all layers
p_nn_wgts <- keras::get_weights(p_nn_model, trainable = TRUE)
p_nn_wgts
[[1]]
      [,1]   [,2]
[1,]  0.62  1.333
[2,]  0.19 -0.016
[3,] -0.17 -0.304
[4,] -0.89 -0.366

[[2]]
[1]  0.127 -0.095

[[3]]
      [,1] [,2]  [,3]
[1,] -0.16  1.5 -1.92
[2,] -0.75  1.6  0.32

[[4]]
[1]  0.46 -0.94  0.36
Which variables are contributing most to each hidden layer node?
Can you write out the model?
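As a sketch of the answer, using the estimated weights above and assuming the predictors enter in the order bl, bd, fl, bm (standardised, as \(x_1,\dots,x_4\)), the hidden nodes with ReLU activation are:

\[\begin{align} h_1 &= \max(0,\ 0.127 + 0.62x_1 + 0.19x_2 - 0.17x_3 - 0.89x_4)\\ h_2 &= \max(0,\ -0.095 + 1.333x_1 - 0.016x_2 - 0.304x_3 - 0.366x_4) \end{align}\]

and the predicted class probabilities are the softmax of

\[(0.46 - 0.16h_1 - 0.75h_2,\ \ -0.94 + 1.5h_1 + 1.6h_2,\ \ 0.36 - 1.92h_1 + 0.32h_2).\]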
Check the fit at the hidden layer nodes
This is the dimension reduction induced by the model.
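A sketch of extracting the fitted values at the hidden layer with keras_model(), assuming the objects defined above:

# Build a model that stops at the hidden layer
p_hidden <- keras_model(inputs = p_nn_model$input,
                        outputs = p_nn_model$layers[[1]]$output)
p_train_h <- p_hidden %>% predict(p_train_x)  # n x 2 matrix of node values
# Plot the 2D hidden space, coloured by species
plot(p_train_h, col = p_train_y + 1, pch = 16,
     xlab = "Hidden node 1", ylab = "Hidden node 2")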
Realistically, with a complex neural network, it is too much work to check these nodes.
Examine the predictive probabilities
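A sketch of extracting them for the test set; the column order is assumed to match the 0/1/2 response coding:

p_test_pred <- p_nn_model %>% predict(p_test_x)
colnames(p_test_pred) <- c("Adelie", "Chinstrap", "Gentoo")  # assumed order
round(head(p_test_pred), 2)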
The problem with this model is that Gentoo are too often confused with Adelie. This is a structural problem: the visualisation of the 4D data shows a big gap between Gentoo and both other species, so a good model should not confuse them.
Work your way through the example of fitting the fashion MNIST data using tensorflow.
Hands-on machine learning has a lovely step-by-step guide to constructing and fitting.
This is a very nice slide set: A gentle introduction to deep learning in R using Keras
And the tutorials at TensorFlow for R have lots of examples.