The observed data can then be written as
$$\mathcal{D} = \{(y_i, x_i)\}_{i = 1}^n = \{(y_1, x_1), (y_2, x_2), \dots, (y_n, x_n)\}$$
where each $x_i$ is a vector with $p$ elements.

---
# Notation 4/6

A transposed data matrix is denoted as
\begin{align*}
{\mathbf X}^T_{p\times n} = \left(\begin{array}{cccc}
x_{11} & x_{21} & \dots & x_{n1} \\
x_{12} & x_{22} & \dots & x_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1p} & x_{2p} & \dots & x_{np}
\end{array} \right)
\end{align*}
and
\begin{align*}
x^T_i = \left(\begin{array}{cccc}
x_{i1} & x_{i2} & \dots & x_{ip}
\end{array} \right)
\end{align*}

---
# Notation 5/6

If ${\mathbf y}$ is categorical, with $K$ levels, it can be useful to write it as a binary (indicator) matrix with entries $y_{ik} = 1$ if observation $i$ belongs to level $k$, and $0$ otherwise (here the observations are sorted by level):
\begin{align*}
{\mathbf Y}_{n\times K} = \left(\begin{array}{cccc}
1 & 0 & \dots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \dots & 0 \\
0 & 1 & \dots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 1 & \dots & 0 \\
0 & 0 & \dots & 1 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \dots & 1
\end{array} \right)
\end{align*}

---
# Matrix multiplication

.grid[
.column[
Suppose that
\begin{align*}
{\mathbf A}_{2\times 3} = \left(\begin{array}{ccc}
1 & 2 & 3 \\
4 & 5 & 6
\end{array} \right)
\end{align*}
\begin{align*}
{\mathbf B}_{3\times 4} = \left(\begin{array}{cccc}
-1 & -2 & -3 & -4\\
-5 & -6 & -7 & -8\\
-9 & -10 & -11 & -12
\end{array} \right)
\end{align*}
then
\begin{align*}
{\mathbf A}{\mathbf B}_{2\times 4} = \left(\begin{array}{cccc}
-38 & -44 & -50 & -56\\
-83 & -98 & -113 & -128
\end{array} \right)
\end{align*}

*Pour the rows into the columns:* entry $(i, j)$ of ${\mathbf A}{\mathbf B}$ is the sum of the products of row $i$ of ${\mathbf A}$ with column $j$ of ${\mathbf B}$, e.g. $1(-1) + 2(-5) + 3(-9) = -38$.

Note: You can't do ${\mathbf B}{\mathbf A}$! The number of columns of ${\mathbf B}$ (4) does not match the number of rows of ${\mathbf A}$ (2).
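The rows-into-columns rule can be checked entry by entry in R. This is a minimal sketch: `a` and `b` mirror ${\mathbf A}$ and ${\mathbf B}$, and `ab_manual` is a hypothetical name used only for illustration.

```{r echo=TRUE, eval=FALSE}
a <- matrix(1:6, ncol = 3, byrow = TRUE)    # A, 2 x 3
b <- -matrix(1:12, ncol = 4, byrow = TRUE)  # B, 3 x 4

# Entry (i, j) is row i of A "poured into" column j of B
ab_manual <- matrix(0, nrow = nrow(a), ncol = ncol(b))
for (i in seq_len(nrow(a))) {
  for (j in seq_len(ncol(b))) {
    ab_manual[i, j] <- sum(a[i, ] * b[, j])
  }
}
all.equal(ab_manual, a %*% b)  # TRUE

# BA is not defined: ncol(B) = 4 but nrow(A) = 2
ncol(b) == nrow(a)  # FALSE, so b %*% a gives a "non-conformable arguments" error
```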
]
.column[
Using R as a matrix calculator

```{r echo=T, eval=FALSE}
# A is 2 x 3, filled row by row
a <- matrix(c(1,2,3,4,5,6), ncol=3, byrow=TRUE)
# B is 3 x 4, with negated entries
b <- -1*matrix(c(1,2,3,4,5,6,
                 7,8,9,10,11,12), ncol=4, byrow=TRUE)
# Matrix product AB, a 2 x 4 matrix
a%*%b
```
]
]

---
# Inverting a matrix

Suppose that ${\mathbf A}$ is square, for example
\begin{align*}
{\mathbf A}_{2\times 2} = \left(\begin{array}{cc}
a & b \\
c & d
\end{array} \right)
\end{align*}
then the inverse is (if the determinant $ad-bc \neq 0$)
\begin{align*}
{\mathbf A}^{-1}_{2\times 2} = \frac{1}{ad-bc} \left(\begin{array}{cc}
d & -b \\
-c & a
\end{array} \right)
\end{align*}
and ${\mathbf A}{\mathbf A}^{-1} = {\mathbf A}^{-1}{\mathbf A} = {\mathbf I}$, where
\begin{align*}
{\mathbf I}_{2\times 2} = \left(\begin{array}{cc}
1 & 0 \\
0 & 1
\end{array} \right)
\end{align*}

---
# Notation 6/6

$d\ (\leq p)$ denotes the number of variables in a lower-dimensional space, usually obtained by taking a projection. $A$ is a $p\times d$ matrix whose columns form an orthonormal basis, so $A^TA=I_d$ (also written $A'A=I_d$). The projection of ${\mathbf x}_i$ onto $A$ is $A^T{\mathbf x}_i$, a vector with $d$ elements.

---
# Different types of learning

.grid[

1. Supervised learning: $y_i$ is .monash-orange2[available] for all $x_i$
    - Regression: quantitative $y_i$
    - Classification: categorical $y_i$

2. Unsupervised learning: $y_i$ is .monash-orange2[unavailable] for all $x_i$

3. Semi-supervised learning: $y_i$ is available for only some $x_i$ (not covered in this unit)

.info-box[Being able to recognise the type of problem is an important skill.]

```{r fig.width=6, fig.height=6, fig.align='center'}
library(tidyverse)
library(gapminder)
library(gridExtra)

# Regression: a quantitative response against a predictor
p1 <- gapminder %>%
  filter(country == "Australia") %>%
  ggplot(aes(x=year, y=lifeExp)) +
  geom_point() +
  geom_smooth() +
  xlab("predictor") + ylab("response") +
  ggtitle("Regression") +
  theme(aspect.ratio=1)

# Classification: the class labels (species) are known
flea <- read_csv("http://www.ggobi.org/book/data/flea.csv")
p2 <- ggplot(flea, aes(x=tars1, y=aede1, colour = species)) +
  geom_point() +
  scale_colour_brewer(palette = "Dark2") +
  xlab("Var 1") + ylab("Var 2") +
  ggtitle("Classification") +
  theme(aspect.ratio=1, legend.position="none")

# Clustering: the same data with the labels ignored
p3 <- ggplot(flea, aes(x=tars1, y=aede1)) +
  geom_point() +
  xlab("Var 1") + ylab("Var 2") +
  ggtitle("Clustering") +
  theme(aspect.ratio=1)

grid.arrange(p1, p2, p3, ncol=2)
```
]

---
# What type of problem is this? (1/3)

Food servers' tips in restaurants may be influenced by many factors, including the nature of the restaurant, size of the party, and table locations in the restaurant. Restaurant managers need to know which factors matter when they assign tables to food servers. For the sake of staff morale, they usually want to avoid either the substance or the appearance of unfair treatment of the servers, for whom tips (at least in restaurants in the United States) are a major component of pay.

In one restaurant, a food server recorded data on all customers they served during an interval of two and a half months in early 1990. The restaurant, located in a suburban shopping mall, was part of a national chain and served a varied menu. In observance of local law, the restaurant offered seating in a non-smoking section to patrons who requested it. Each record includes a day and time, and taken together, they show the server's work schedule.

---
# What type of problem is this? (2/3)

Measurements on rock crabs of the genus *Leptograpsus*. One species, *L. variegatus*, had been split into two new species, previously grouped by color (orange and blue). Preserved specimens lose their color, so it was hoped that morphological differences would enable museum specimens to be classified.

There are 50 specimens of each sex of each species, collected on site at Fremantle, Western Australia. For each specimen, five measurements were made using vernier calipers.

---
# What type of problem is this? (3/3)

This data contains observations taken from a high-energy particle physics scattering experiment that yielded four particles. The reaction $\pi_b^+p_t\rightarrow p\pi_1^+\pi_2^+\pi^-$ can be described completely by seven independent measurements. Here, $\mu^2(A,B,\pm C)=(E_A+E_B\pm E_C)^2-(P_A+P_B\pm P_C)^2$ and $\mu^2(A,\pm B)=(E_A\pm E_B)^2-(P_A\pm P_B)^2$, where $E$ and $P$ represent the particle's energy and momentum, respectively, as measured in billions of electron volts. The notation $(P)^2$ represents the inner product $P'P$. The ordinal assignment of the two $\pi^+$'s was done randomly.

What are the clusters in the data?

---

```{r endslide, child="assets/endslide.Rmd"}
```