.font_smaller2[(Chapter3/3.1.pdf)]
]
.w-45[
The model describes a .monash-orange2[line, **plane** or hyperplane] in the predictor space.

.font_smaller2[(Chapter3/3.5.pdf)]
]
]

---
# Categorical Variables

Qualitative variables need to be converted to numeric by making a set of dummy variables.

$$x_i = \left\{\begin{array} {ll} 1 & \mbox{if} ~~~ i^{\text{th}} \mbox{ obs is a koala} \\ 0 & \mbox{otherwise} \end{array}\right\}$$

which would result in the model

$$\hat{y}_i = \left\{\begin{array} {ll} \beta_0+\beta_1 & \mbox{if} ~~~ i^{\text{th}} \mbox{ obs is a koala} \\ \beta_0 & \mbox{otherwise} \end{array}\right\}$$

---
# Categorical Variables

More than two categories:

$$x_{i1} = \left\{\begin{array} {ll} 1 & \mbox{if} ~~~ i^{\text{th}} \mbox{ obs is a koala} \\ 0 & \mbox{otherwise} \end{array}\right\}$$

$$x_{i2} = \left\{\begin{array} {ll} 1 & \mbox{if} ~~~ i^{\text{th}} \mbox{ obs is a bilby} \\ 0 & \mbox{otherwise} \end{array}\right\}$$

which would result in the model using .monash-orange2[dummy variables].

$$\hat{y}_i = \left\{\begin{array} {ll} \beta_0+\beta_1 & \mbox{if} ~~~ i^{\text{th}} \mbox{ obs is a koala} \\ \beta_0+\beta_2 & \mbox{if} ~~~ i^{\text{th}} \mbox{ obs is a bilby} \\ \beta_0 & \mbox{otherwise} \end{array}\right\}$$

---
# Interactions are induced by categorical predictors

.info-box[When you have a categorical variable, it can be convenient to allow .monash-orange2[BOTH slope and intercept to vary] across category levels. This is called an .monash-orange2[interaction].]
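The dummy-variable coding for a three-level categorical predictor (koala / bilby / other) can be sketched in a few lines. This is an illustrative sketch only: the coefficient values $\beta_0 = 10$, $\beta_1 = 3$, $\beta_2 = -2$ are made up, not fitted from any data.

```python
import numpy as np

# Hypothetical observations of a three-level categorical predictor
species = np.array(["koala", "bilby", "other", "koala"])
x1 = (species == "koala").astype(float)  # 1 if ith obs is a koala, else 0
x2 = (species == "bilby").astype(float)  # 1 if ith obs is a bilby, else 0

# Assumed illustrative coefficients (not fitted from data)
b0, b1, b2 = 10.0, 3.0, -2.0
y_hat = b0 + b1 * x1 + b2 * x2  # koala: b0+b1, bilby: b0+b2, other: b0
```

Each fitted value is one of three level means, exactly as in the piecewise model above: koalas get $\beta_0+\beta_1 = 13$, bilbies $\beta_0+\beta_2 = 8$, and everything else the baseline $\beta_0 = 10$.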

- Is at least one of the predictors useful in predicting the response?
- Do all the predictors help to explain $Y$, or is only a subset of the predictors useful?
- How well does the model fit the data?
- Given a set of predictor values, what response value should we predict and how accurate is our prediction?

---
.flex[
.border-box[
# Model fitting

.monash-orange2[Least squares] is a common way to fit the model, where $\hat{\beta}_j$ are chosen to minimise

$$RSS=\sum_{i=1}^n (y_i-\hat{y}_i)^2$$

The .monash-orange2[smaller the sum] of squared differences, the .monash-orange2[better] the model fits the data.
]
.border-box[
# Goodness-of-fit

$R^2$ is the .monash-orange2[proportion of variation] explained by the model, and measures the goodness of fit: close to 1, the model explains most of the variability in $Y$; close to 0, it explains very little.

$$R^2 = 1 - \frac{RSS}{TSS}$$

where $TSS=\sum_{i=1}^n (y_i - \bar{y})^2$. RSS is the residual sum of squares, and TSS is the total sum of squares.
]
.border-box[
# Model Diagnostics

.monash-orange2[Residual Standard Error (RSE)] is an estimate of the standard deviation of $\varepsilon$. This is meaningful under the assumption that $\varepsilon \sim N(0, \sigma^2)$.

$$RSE = \sqrt{\frac{1}{n-p-1}RSS}$$

This is another way to examine the variation around the model. Unlike $R^2$, it is not on a standard scale.
]
]

---
# Maximum Likelihood Estimation and Least Squares

If the errors are iid and normally distributed, then

$${Y} \sim N({X}{\beta},\sigma^2{I})$$

So the likelihood is

$$L = \frac{1}{\sigma^n(2\pi)^{n/2}}\exp\left(-\frac1{2\sigma^2}\sum_{i=1}^n (y_i-\hat{y}_i)^2\right)$$

which is maximised when $\sum_{i=1}^n (y_i-\hat{y}_i)^2$ is minimised.
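The quantities above (RSS, TSS, $R^2$, RSE) can be computed directly from a least squares fit. A minimal sketch on simulated data, using only numpy; the data-generating values (intercept 2, slope 1.5, $\sigma = 1$) are illustrative assumptions, not taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x + rng.normal(0, 1, n)           # simulated linear data

X = np.column_stack([np.ones(n), x])              # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimates
y_hat = X @ beta_hat

rss = np.sum((y - y_hat) ** 2)       # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)    # total sum of squares
r2 = 1 - rss / tss                   # proportion of variation explained
p = 1                                # number of predictors
rse = np.sqrt(rss / (n - p - 1))     # residual standard error
```

With a strong linear signal and $\sigma = 1$, `r2` comes out close to 1 and `rse` close to the true error standard deviation of 1, matching the interpretations of the two quantities above.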

## Individual variables

The strength of the relationship between the response and an individual variable can be tested using a $t$-test of the hypotheses:

$$H_0: \beta_j=0 \mbox{ vs } H_a: \beta_j\neq 0$$

where the test statistic is $t=\frac{\hat{\beta}_j}{SE({\hat{\beta}_j})}$.

---
class: middle center

# Interpreting the effect of any predictor

> We interpret $\beta_j$ as the average effect on $Y$ of a one unit increase in $X_j$, holding all other predictors fixed.
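The $t$-statistic $t = \hat{\beta}_j / SE(\hat{\beta}_j)$ can be computed from the fitted model. A numpy sketch on simulated data, with an assumed true slope of 0.8 chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(0, 1, n)
y = 1.0 + 0.8 * x + rng.normal(0, 1, n)        # true slope 0.8 (illustrative)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimates
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - 2)           # estimate of sigma^2 (p = 1)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
t_stat = beta_hat / se                         # t_j = beta_hat_j / SE(beta_hat_j)
```

Comparing $|t|$ with a $t_{n-p-1}$ distribution gives the p-value for $H_0: \beta_j = 0$; here the slope's $|t|$ is large, so $H_0$ would be rejected.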

.info-box[`r emo::ji("warning")` This is association and .monash-orange2[not causation]. ]

---
# Assessing model fit using residuals

.flex[
.border-box[
- If a plot of the residuals vs any predictor in the model shows a pattern, then the .monash-orange2[relationship is nonlinear.]
- If a plot of the residuals vs any predictor **not** in the model shows a pattern, then .monash-orange2[the predictor should be added to the model.]
- If a plot of the residuals vs fitted values shows a pattern, then there is .monash-orange2[heteroscedasticity in the errors]. (A transformation may help, but will not always fix it.)
]
.border-box[
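The first point can be checked numerically as well as visually: fit a straight line to data that are truly quadratic, and the residuals still carry the missed structure. A sketch on simulated data, where a correlation check stands in for reading the residual plot (the quadratic model and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(-2, 2, n)
y = 1.0 + x**2 + rng.normal(0, 0.2, n)            # truly quadratic relationship

X = np.column_stack([np.ones(n), x])              # straight-line fit only
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# The residuals correlate strongly with x^2: the same pattern a
# residual-vs-predictor plot would show, signalling nonlinearity.
corr = np.corrcoef(resid, x**2)[0, 1]
```

A near-zero `corr` would be consistent with a well-specified linear model; a large value, as here, says the straight line missed the curvature.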