class: middle center hide-slide-number monash-bg-gray80

.info-box.w-50.bg-white[
These slides are best viewed in Chrome or Firefox, and occasionally need to be refreshed if elements do not load properly. See <a href=lecture-03a.pdf>here for the PDF <i class="fas fa-file-pdf"></i></a>.
]

<br>

.white[Press the **right arrow** to progress to the next slide!]

---
class: title-slide
count: false
background-image: url("images/bg-02.png")

# .monash-blue[ETC3250/5250: Introduction to Machine Learning]

<h1 class="monash-blue" style="font-size: 30pt!important;"></h1>

<br>

<h2 style="font-weight:900!important;">Categorical response regression</h2>

.bottom_abs.width100[
Lecturer: *Professor Di Cook*

Department of Econometrics and Business Statistics

<i class="fas fa-envelope"></i> ETC3250.Clayton-x@monash.edu

<i class="fas fa-calendar-alt"></i> Week 3a

<br>
]

---
# Categorical responses

In **classification**, the output `\(Y\)` is a .monash-orange2[categorical variable]. For example,

- Loan approval: `\(Y \in \{\mbox{successful}, \mbox{unsuccessful}\}\)`
- Type of business culture: `\(Y \in \{\mbox{clan}, \mbox{adhocracy}, \mbox{market}, \mbox{hierarchical}\}\)`
- Historical document author: `\(Y \in \{\mbox{Austen}, \mbox{Dickens}, \mbox{Imitator}\}\)`
- Email: `\(Y \in \{\mbox{spam}, \mbox{ham}\}\)`

Map the categories to a numeric variable, or possibly a binary matrix.

<!--
- A person arrives at the emergency room with a set of symptoms that could possibly be attributed to one of three medical conditions. Which of the three conditions does the individual have?
- An online banking service must be able to determine whether or not a transaction being performed on the site is fraudulent, on the basis of the user's IP address, past transaction history, and so forth.
- On the basis of DNA sequence data for a number of patients with and without a given disease, a biologist would like to figure out which DNA mutations are deleterious (disease-causing) and which are not.
- An email comes into the server. Should it be moved into the inbox or the junk mail box, based on header text, sender, origin, time of day, ...?
-->

---
# When linear regression is not appropriate

.flex[
.w-40[
<br>

Consider the following data, `simcredit`, in the ISLR R package (textbook), which records default status against credit balance.

<br><br>

.question-box[Why is a linear model less than ideal for this data?]
]
.w-10[
.white[this is just space]
]
.w-45[
<br>
<img src="images/lecture-03a/unnamed-chunk-3-1.png" width="100%" style="display: block; margin: auto;" />
]
]

---
# Modelling binary responses

.flex[
.w-45[
<img src="images/lecture-03a/unnamed-chunk-4-1.png" width="100%" style="display: block; margin: auto;" />

The .monash-orange2[orange] line is a loess smooth of the data. It's a much better fit than the linear one.
]
.w-45[
<br>
- To model **binary data**, we need to .monash-orange2[link] our **predictors** to our response using a *link function*. Another way to think about it is that we transform `\(Y\)` into a proportion, and then build the linear model on the transformed response.
- There are many different link functions we could use, but for a binary response we typically use the .monash-orange2[logistic] link function.
]
.w-10[
.white[this is just space]
]
]

---
# The logistic function

.flex[
.w-40[
Rather than predicting the outcome directly, we predict the probability of being in class 1, given the (linear combination of) predictors, using the .monash-orange2[logistic] link function.
`$$p(y=1|\beta_0 + \beta_1 x) = f(\beta_0 + \beta_1 x)$$`

where

`$$f(\beta_0 + \beta_1 x) = \frac{e^{\beta_0+\beta_1x}}{1+e^{\beta_0+\beta_1x}}$$`
]
.w-10[
.white[this is just space]
]
.w-45[
<img src="images/lecture-03a/unnamed-chunk-5-1.png" width="100%" style="display: block; margin: auto;" />
]
]

---

.flex[
.w-40[
# Logistic function

Transform the function:

`$$~~~~y = \frac{e^{\beta_0+\beta_1x}}{1+e^{\beta_0+\beta_1x}}$$`

`\(\longrightarrow y = \frac{1}{1/e^{\beta_0+\beta_1x}+1}\)`

`\(\longrightarrow 1/y = 1/e^{\beta_0+\beta_1x}+1\)`

`\(\longrightarrow 1/y - 1 = 1/e^{\beta_0+\beta_1x}\)`

`\(\longrightarrow \frac{1}{1/y - 1} = e^{\beta_0+\beta_1x}\)`

`\(\longrightarrow \frac{y}{1 - y} = e^{\beta_0+\beta_1x}\)`

`\(\longrightarrow \log_e\frac{y}{1 - y} = \beta_0+\beta_1x\)`
]
.w-10[
.white[this is just space]
]
]

---
count: false

.flex[
.w-40[
# Logistic function

Transform the function:

`$$~~~~y = \frac{e^{\beta_0+\beta_1x}}{1+e^{\beta_0+\beta_1x}}$$`

`\(\longrightarrow y = \frac{1}{1/e^{\beta_0+\beta_1x}+1}\)`

`\(\longrightarrow 1/y = 1/e^{\beta_0+\beta_1x}+1\)`

`\(\longrightarrow 1/y - 1 = 1/e^{\beta_0+\beta_1x}\)`

`\(\longrightarrow \frac{1}{1/y - 1} = e^{\beta_0+\beta_1x}\)`

`\(\longrightarrow \frac{y}{1 - y} = e^{\beta_0+\beta_1x}\)`

`\(\longrightarrow \log_e\frac{y}{1 - y} = \beta_0+\beta_1x\)`
]
.w-10[
.white[this is just space]
]
.w-40[
<br> <br> <br> <br>

Transforming the response `\(\log_e\frac{y}{1 - y}\)` makes it possible to use a linear model fit.

<br> <br> <br> <br>

.info-box[The left-hand side, `\(\log_e\frac{y}{1 - y}\)`, is known as the .monash-orange2[log-odds ratio] or logit.
<i class="fas fa-dice" style="color: #D93F00"></i>
]
]]

---
# The logistic regression model

The fitted model, where `\(P(Y=0|X) = 1 - P(Y=1|X)\)`, is then written as:

<center>
.info-box[
`\(\log_e\frac{P(Y=1|X)}{1 - P(Y=1|X)} = \beta_0+\beta_1X\)`
]
</center>

<br><br>

When there are .monash-blue2[more than two] categories:

- the formula can be extended, using dummy variables.
- the probabilities for each level/category follow from the above, with the last category's probability equal to 1 minus the sum of the probabilities of the other categories.
- the sum of all the probabilities has to be 1.

---
# Interpretation

- .monash-blue2[**Linear regression**]
    - `\(\beta_1\)` gives the average change in `\(Y\)` associated with a one-unit increase in `\(X\)`.

--

- .monash-blue2[**Logistic regression**]
    - Because the model is not linear in `\(X\)`, `\(\beta_1\)` does not correspond to the change in the response associated with a one-unit increase in `\(X\)`.
    - However, increasing `\(X\)` by one unit changes the log odds by `\(\beta_1\)`, or equivalently it multiplies the odds by `\(e^{\beta_1}\)`.

---
# Maximum Likelihood Estimation

Given the logistic function `\(p(x_i) = \frac{1}{e^{-(\beta_0+\beta_1x_i)}+1}\)`, choose the parameters `\(\beta_0, \beta_1\)` to maximize the likelihood:

`$$L_n(\beta_0, \beta_1) = \prod_{i=1}^n p(x_i)^{y_i}(1-p(x_i))^{1-y_i}.$$`

It is more convenient to maximize the *log-likelihood*:

`\begin{align*} \log L_n(\beta_0, \beta_1) &= \sum_{i = 1}^n \big( y_i\log p(x_i) + (1-y_i)\log(1-p(x_i))\big)\\ &= \sum_{i=1}^n\big(y_i(\beta_0+\beta_1x_i)-\log{(1+e^{\beta_0+\beta_1x_i})}\big) \end{align*}`

---
# Making predictions

.flex[
.w-45[
With the estimates from the model fit, `\(\hat{\beta}_0, \hat{\beta}_1\)`, we can predict the **probability of belonging to class 1** using:

`$$p(y=1|\hat{\beta}_0 + \hat{\beta}_1 x) = \frac{e^{\hat{\beta}_0+ \hat{\beta}_1x}}{1+e^{\hat{\beta}_0+ \hat{\beta}_1x}}$$`

<br>

- .monash-orange2[Round to 0 or 1] (equivalently, threshold the probability at 0.5) for the class prediction.
- Residuals could be calculated as the difference between the observed and predicted values, but mostly the misclassification (correct or incorrect prediction) is used to assess the model fit.
]
.w-45[
<img src="images/lecture-03a/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />

.monash-orange2[Orange points] are fitted values, `\(\hat{y}_i\)`. Black points are the observed response, `\(y_i\)` (either 0 or 1).
]
]

---

.flex[
.w-45[
## Fitting credit data in R <i class="fas fa-credit-card" style="color: #D93F00"></i>

<br>

We use the `glm` function in R, as the engine for `logistic_reg()`, to fit a logistic regression model. The `glm` function can support many response types; the `"classification"` mode corresponds to `family = "binomial"`, which lets R know that our response is *binary*.
]
.w-45[
<br><br><br>

```r
logistic_mod <- logistic_reg() %>%
*  set_engine("glm") %>%
*  set_mode("classification") %>%
  translate()
logistic_fit <- logistic_mod %>%
  fit(default ~ balance,
      data = simcredit)
```
]
]

---
# Examine the fit

<br>

```r
*tidy(logistic_fit)
```

```
## # A tibble: 2 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept) -10.7     0.361        -29.5 3.62e-191
## 2 balance       0.00550 0.000220      25.0 1.98e-137
```

```r
*glance(logistic_fit)
```

```
## # A tibble: 1 × 8
##   null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
##           <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
## 1         2921.    9999  -798. 1600. 1615.    1596.        9998 10000
```

---
# Write out the model

`\(\hat{\beta}_0 =\)` -10.6513306

`\(\hat{\beta}_1 =\)` 0.0054989

<br><br><br>

# Model fit summary

Null model deviance 2920.6 (think of this as TSS)

Model deviance 1596.5 (think of this as RSS)

---
# Check model fit

```r
*simcredit_fit <- augment(logistic_fit, simcredit)
simcredit_fit %>%
  count(default, .pred_class) %>%
  pivot_wider(names_from = "default",
              values_from = n)
```

```
## # A tibble: 2 × 3
##   .pred_class    No   Yes
##   <fct>       <int> <int>
## 1 No           9625   233
## 2 Yes            42   100
```

<br> <br>

<center>
.info-box[Note: Residuals are not typically used.]
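The confusion matrix above summarises the fit; as a quick sketch (using base R and the counts shown above, rather than the fitted model object), the overall accuracy can be computed as:

```r
# Counts from the confusion matrix above:
# rows = predicted class, columns = observed default status
conf <- matrix(c(9625, 42, 233, 100), nrow = 2,
               dimnames = list(pred = c("No", "Yes"),
                               obs = c("No", "Yes")))
accuracy <- sum(diag(conf)) / sum(conf)
accuracy  # (9625 + 100) / 10000 = 0.9725
```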
</center>

---

.flex[
.w-45[
# A warning for using GLMs!

<br>

Logistic regression model fitting fails when the data is *perfectly* separated. The MLE fit will try to fit a step function to this graph, pushing the coefficient sizes towards infinity and producing large standard errors.

.monash-orange2[Pay attention to warnings!]
]
.w-45[
<img src="images/lecture-03a/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" />
]]

```r
logistic_fit <- logistic_mod %>%
  fit(default_new ~ balance,
      data = simcredit)
```

```
## Warning: glm.fit: algorithm did not converge
```

```
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
```

---
class: informative middle center

# More on supervised classification to come

Logistic regression is a technique for supervised classification. We'll see many more techniques: linear discriminant analysis, trees, forests, support vector machines, neural networks.

---
background-size: cover
class: title-slide
background-image: url("images/bg-02.png")

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

.bottom_abs.width100[
Lecturer: *Professor Di Cook*

Department of Econometrics and Business Statistics

<i class="fas fa-envelope"></i> ETC3250.Clayton-x@monash.edu

<i class="fas fa-calendar-alt"></i> Week 3a

<br>
]
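---
# Appendix: a prediction by hand

A sketch (not part of the original slides) of computing `\(p(y=1)\)` directly from the coefficient estimates reported earlier, `\(\hat{\beta}_0 = -10.6513\)` and `\(\hat{\beta}_1 = 0.0055\)`, at two hypothetical balance values:

```r
# Coefficient estimates from the fitted model shown earlier
b0 <- -10.6513306
b1 <- 0.0054989

balance <- c(1000, 2000)  # hypothetical balances
p_hat <- exp(b0 + b1 * balance) / (1 + exp(b0 + b1 * balance))
pred_class <- ifelse(p_hat > 0.5, "Yes", "No")
p_hat       # roughly 0.006 and 0.59
pred_class  # "No" "Yes"
```

A balance of 1000 gives a predicted default probability well below 0.5, so the predicted class is "No"; a balance of 2000 pushes the probability above 0.5, giving "Yes".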