ETC3250/5250 Tutorial 9

Support vector machines and regularisation

Author

Prof. Di Cook

Published

29 April 2024

Load the libraries, avoid conflicts, and prepare the data
# Load libraries used everywhere
library(tidyverse)
library(tidymodels)
library(patchwork)
library(mulgar)
library(tourr)
library(geozoo)
library(colorspace)
library(ggthemes)
library(conflicted)
conflicts_prefer(dplyr::filter)
conflicts_prefer(dplyr::select)
conflicts_prefer(dplyr::slice)

🎯 Objectives

The goal for this week is to learn about fitting support vector machine models.

🔧 Preparation

  • Make sure you have all the necessary libraries installed.

Exercises:

1. A little algebra

Let \(\mathbf{x_1}\) and \(\mathbf{x_2}\) be vectors in \(\mathbb{R}^2\), that is, two observations where \(p=2\). By expanding \(\mathcal{K}(\mathbf{x_1}, \mathbf{x_2}) = (1 + \langle \mathbf{x_1}, \mathbf{x_2}\rangle) ^2\) show that this is equivalent to an inner product of transformations of the original variables defined as \(\mathbf{y} \in \mathbb{R}^6\).


Remember: \(\langle \mathbf{x_1}, \mathbf{x_2}\rangle =\sum_{j=1}^{p} x_{1j}x_{2j}\).
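Hint (one way to start): expanding the square directly gives six terms,

\[(1 + \langle \mathbf{x_1}, \mathbf{x_2}\rangle)^2 = 1 + x_{11}^2x_{21}^2 + x_{12}^2x_{22}^2 + 2x_{11}x_{21} + 2x_{12}x_{22} + 2x_{11}x_{21}x_{12}x_{22},\]

so the task reduces to collecting these terms as \(\langle \mathbf{y_1}, \mathbf{y_2}\rangle\) for a suitable transformation \(\mathbf{y} = (y_1, \ldots, y_6)\) of \(\mathbf{x}\).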

2. Fitting and examining the support vector machine model

Simulate two toy data sets as follows.

# Toy examples
set.seed(1125)
n1 <- 162
vc1 <- matrix(c(1, -0.7, -0.7, 1), ncol=2, byrow=TRUE)
c1 <- rmvn(n=n1, p=2, mn=c(-2, -2), vc=vc1)
vc2 <- matrix(c(1, -0.4, -0.4, 1)*2, ncol=2, byrow=TRUE)
n2 <- 138
c2 <- rmvn(n=n2, p=2, mn=c(2, 2), vc=vc2)
df1 <- data.frame(x1=mulgar:::scale2(c(c1[,1], c2[,1])), 
                  x2=mulgar:::scale2(c(c1[,2], c2[,2])), 
                  cl = factor(c(rep("A", n1), 
                                rep("B", n2))))
c1 <- sphere.hollow(p=2, n=n1)$points*3 + 
  c(rnorm(n1, sd=0.3), rnorm(n1, sd=0.3))
c2 <- sphere.solid.random(p=2, n=n2)$points
df2 <- data.frame(x1=mulgar:::scale2(c(c1[,1], c2[,1])), 
                  x2=mulgar:::scale2(c(c1[,2], c2[,2])), 
                  cl = factor(c(rep("A", n1), 
                               rep("B", n2))))
  a. Make plots of each data set.
  b. What type of kernel would be appropriate for each? How many support vectors would you expect to be needed to define the boundary in each case?
  c. Break the data into training and test sets. (One way to do this with rsample is sketched after the model-fitting code below.)
  d. Fit the SVM model. Try changing the cost parameter to explore the number of support vectors used. Choose the value that gives you the smallest number. You can use code like this (scaled = FALSE indicates that the variables are already scaled, so no further standardising is needed):
svm_spec1 <- svm_linear(cost=1) |>
  set_mode("classification") |>
  set_engine("kernlab", scaled = FALSE)
svm_fit1 <- svm_spec1 |> 
  fit(cl ~ ., data = df1_tr)
svm_spec2 <- svm_rbf() |>
  set_mode("classification") |>
  set_engine("kernlab", scaled = FALSE)
svm_fit2 <- svm_spec2 |> 
  fit(cl ~ ., data = df2_tr)
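These fits assume training sets df1_tr and df2_tr (part c). A sketch of one way to create them with rsample's initial_split(); the 2/3 proportion, the stratification by cl and the seed are arbitrary choices, and this needs to run before the fits above:

# Stratified training/test splits; run before fitting the models above
set.seed(1130)
df1_split <- initial_split(df1, prop=2/3, strata=cl)
df1_tr <- training(df1_split)
df1_ts <- testing(df1_split)
df2_split <- initial_split(df2, prop=2/3, strata=cl)
df2_tr <- training(df2_split)
df2_ts <- testing(df2_split)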
  e. Can you use the parameter estimates to write out the equation of the separating hyperplane for the linear SVM model? You can use svm_fit1$fit@coef and svm_fit1$fit@SVindex to compute the coefficients, as given by the equation on slide 6 of the week 8 slides. Try sketching your line on your plot. (A sketch of the computation follows this list.)
  f. Compute the confusion table and the test error for each model. (A sketch follows this list.)
  g. Which observations would you expect to be the support vectors? Overlay indications of the support vectors from each model to check whether the model thinks the same as you. (A sketch follows this list.)
  h. Think about whether a neural network might be able to fit the second data set. Have a discussion with your neighbour and tutor about what would need to be done to design a suitable neural network architecture.
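For part e, one way to assemble the hyperplane coefficients, following \(\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i\) over the support vectors. This sketch assumes kernlab's ksvm slots (@coef, @SVindex, @b) and that, with scaled = FALSE, @SVindex indexes rows of the training data:

# Hyperplane coefficients: w = sum of (alpha_i * y_i) * x_i over support vectors
alpha <- svm_fit1$fit@coef[[1]]    # alpha_i * y_i for each support vector
sv <- as.matrix(df1_tr[svm_fit1$fit@SVindex, c("x1", "x2")])
w <- colSums(alpha * sv)           # normal vector to the hyperplane
b <- svm_fit1$fit@b                # kernlab stores the negative intercept
# Separating hyperplane: w[1]*x1 + w[2]*x2 - b = 0,
# i.e. x2 = (b - w[1]*x1)/w[2], which can be drawn with geom_abline()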
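For part f, a sketch using yardstick (loaded with tidymodels); df1_ts and df2_ts are the test sets created in the splitting sketch above:

# Confusion tables and test accuracy (test error = 1 - accuracy)
df1_pred <- df1_ts |>
  mutate(pcl = predict(svm_fit1, df1_ts)$.pred_class)
conf_mat(df1_pred, cl, pcl)
accuracy(df1_pred, cl, pcl)
df2_pred <- df2_ts |>
  mutate(pcl = predict(svm_fit2, df2_ts)$.pred_class)
conf_mat(df2_pred, cl, pcl)
accuracy(df2_pred, cl, pcl)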
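For part g, a sketch that circles the support vectors on the training data, again assuming @SVindex indexes the training rows:

# Overlay open circles on the support vectors of each fit
p1 <- ggplot(df1_tr, aes(x=x1, y=x2, colour=cl)) +
  geom_point() +
  geom_point(data=df1_tr[svm_fit1$fit@SVindex, ], shape=1, size=4) +
  theme(aspect.ratio=1)
p2 <- ggplot(df2_tr, aes(x=x1, y=x2, colour=cl)) +
  geom_point() +
  geom_point(data=df2_tr[svm_fit2$fit@SVindex, ], shape=1, size=4) +
  theme(aspect.ratio=1)
p1 + p2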

👋 Finishing up

Make sure you say thanks and good-bye to your tutor. This is a time to also report what you enjoyed and what you found difficult.