class: middle center hide-slide-number monash-bg-gray80 .info-box.w-50.bg-white[ These slides are viewed best by Chrome or Firefox and occasionally need to be refreshed if elements did not load properly. See <a href=lecture-02b.pdf>here for the PDF <i class="fas fa-file-pdf"></i></a>. ] <br> .white[Press the **right arrow** to progress to the next slide!] --- class: title-slide count: false background-image: url("images/bg-02.png") # .monash-blue[ETC3250/5250: Introduction to Machine Learning] <h1 class="monash-blue" style="font-size: 30pt!important;"></h1> <br> <h2 style="font-weight:900!important;">Flexible regression</h2> .bottom_abs.width100[ Lecturer: *Professor Di Cook* Department of Econometrics and Business Statistics <i class="fas fa-envelope"></i> ETC3250.Clayton-x@monash.edu <i class="fas fa-calendar-alt"></i> Week 2b <br> ] --- .flex[ .w-45[ # Moving beyond linearity Sometimes the relationships we discover are not linear... <br><br><br><br><br><br><br><br><br><br><br><br><br><br> .font-smaller2[Image source: [XKCD](https://xkcd.com/2048/)] ] .w-10[ ] <img src="images/lecture-02b/curve_fitting.png", width="70%"> ] --- # Moving beyond linearity .flex[ - Consider the following Major League Baseball data from the 1986 and 1987 seasons. - Would a linear model be appropriate for modelling the relationship between Salary and Career hits, captured in the variables `logSalary` and `logCHits`? <img src="images/lecture-02b/unnamed-chunk-3-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .flex[ .w-50[ # Moving beyond linearity - Perhaps a more flexible regression model is needed! - Which of these is a better fit for this data, do you think? 
<img src="images/lecture-02b/unnamed-chunk-4-1.png" width="100%" style="display: block; margin: auto;" /> ] <img src="images/lecture-02b/unnamed-chunk-5-1.png" width="50%" style="display: block; margin: auto;" /> ] --- # Flexible regression fits The truth is rarely linear, but often the linearity assumption is sufficient and simple. When it's not ... - local regression, sliding window with regression fitted to subsets; - polynomial regression, obtained by raising each of the original predictors to a power; - step functions, cut the range of a predictor into distinct regions; - regression splines, combine polynomials and step functions fit different functions to different subsets of a predictor; - smoothing splines, regression splines plus a smoothness penalty; - .monash-orange2[generalized additive models], extend these approaches to multiple predictors. offer a lot of flexibility, while maintaining the ease and interpretability of linear models. --- .flex[ <img src="images/lecture-02b/loess.png" width="100%"> .w-70[ # Local regression (smoothers) Overlapping subsets of data, (weighted) regression on each subset. Overlap helps to smooth the fitted model. A drawback of this approach is that it does not produce a functional form of the fitted model. ] ] --- # Polynomial regression Although it is simple to add an extra `\(x^2\)` or `\(x^3\)` to the model, it induces a problem of .monash-orange2[collinearity] among predictors. The solution is to use orthogonal polynomials. .flex[ <img src="images/lecture-02b/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" /> <img src="images/lecture-02b/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Spline regression .flex[ Fit a separate polynomial to different subsets. 
<center> <img src="images/lecture-02b/splines.png" width="100%"> </center> ] .font_smaller2[[Data Science Deciphered: What is a Spline?](https://towardsdatascience.com/data-science-deciphered-what-is-a-spline-18632bf96646) has a lovely explanation.] --- # Natural splines .flex[ Fit a separate polynomial to different subsets, and constrain the fit at the boundary to be linear. <br><br> Something like this illustration. <center> <img src="images/lecture-02b/splines_natural.png" width="100%"> </center> ] --- ## Natural cubic splines with differing number of knots <img src="images/lecture-02b/unnamed-chunk-9-1.png" width="85%" style="display: block; margin: auto;" /> --- # Comparison between splines and polynomials .flex[ .w-90[ We can fit a polynomial with `poly()`, cubic spline using `splines::bs()`, and fit a natural cubic spline using `splines::ns()`. Notice end of the curves, and the beginning. - Polynomial is fitting `\(x, x^2, \dots, x^{10}\)`. - Spline is fitting degree 3 polynomial with added knots (breaks) for different functions in different subsets. - Natural spline is fitting degree 3 polynomial, and knots with boundary forced to be linear. ] <img src="images/lecture-02b/unnamed-chunk-10-1.png" width="70%" style="display: block; margin: auto;" /> ] --- # Generalised additive models (GAMs) It's really hard to fit a model of the form `$$y = f(x_1, x_2, \dots, x_p) + \varepsilon?$$` - Data is very sparse in high-dimensional space. - Model assumes `\(p\)`-way interactions which are hard to estimate. - Fit the model additively, is simpler, and still flexible, yet interpretable <center> .info-box[ `\(y_i=\beta_0+f_1(x_{i1})+f_2(x_{i2})+...+f_p(x_{ip})+\varepsilon_i\)` where each `\(f\)` is a smooth univariate function. ] </center> --- # Example: Baseball .flex[ .w-50[ Scatterplots of `logSalary` vs predictors, with a loess smoother overlaid. Strong nonlinear relationships with `logCHits` and moderate relationship with `Years`. 
No relationship with `Assists` and `Errors`. ] .w-50[ <img src="images/lecture-02b/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" /><img src="images/lecture-02b/unnamed-chunk-11-2.png" width="100%" style="display: block; margin: auto;" /> ] ] --- # Example: Baseball .flex[ .w-50[ Examine the predictors: ideally there should be no strong associations, outliers or clusters. <br><br> Unfortunately, `logCHits` and `Years` are collinear. ] .w-50[ <img src="images/lecture-02b/unnamed-chunk-12-1.png" width="80%" style="display: block; margin: auto;" /> ] ] --- # Example: Baseball .flex[ .w-50[ `$$\begin{align} \log(\mbox{Salary}) & = \beta_0 + f_1(\mbox{log(CHits)}) \\ & + f_2(\mbox{Years}) + f_3(\mbox{Errors}) \\ & + f_4(\mbox{Assists}) + \varepsilon \end{align}$$` <br> <br> ```r hits_gam <- * mgcv::gam(logSalary ~ s(logCHits) + s(Errors) + s(Assists), data = hits) ``` Note that `s(Years)` is omitted from the fitted model, because `Years` is collinear with `logCHits`. Estimated smooths from the fitted model are shown on the right. (See [Gavin Simpson's explanations](https://gavinsimpson.github.io/gratia/articles/gratia.html).) ] .w-50[ ```r gratia::draw(hits_gam, residuals=TRUE) ``` <img src="images/lecture-02b/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- # Summarising the model fit .scroll-800[ ``` ## ## Family: gaussian ## Link function: identity ## ## Formula: ## logSalary ~ s(logCHits) + s(Errors) + s(Assists) ## ## Parametric coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 2.56978 0.01254 204.9 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Approximate significance of smooth terms: ## edf Ref.df F p-value ## s(logCHits) 4.009 4.997 132.350 <2e-16 *** ## s(Errors) 2.360 2.983 1.813 0.171 ## s(Assists) 1.100 1.192 1.113 0.268 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## R-sq.(adj) = 0.722 Deviance explained = 73% ## GCV = 0.042438 Scale est.
= 0.041061 n = 261 ``` ] --- # Summarising the model fit .flex[ .w-50[ ```r gratia::appraise(hits_gam) ``` <br> - Plot observed vs fitted: should be a strong association. .monash-blue2[(Mostly good, a few outliers.)] - Histogram of residuals: should be bell-shaped. .monash-blue2[(Slightly left-skewed, with some unusually small values.)] - Normal probability plot of residuals: if residuals are a sample from normal then these values form a straight line. .monash-blue2[(Good except for some low and high observations.)] - Residuals vs fitted: roughly even vertical spread for all x values. .monash-blue2[(Not good, spread is heteroskedastic.)] ] .w-50[ <img src="images/lecture-02b/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" /> ] ] --- # Summary - A GAM is a fit to functions of each predictor, and can be manually fitted using natural splines, or other functions. - Coefficients are generally not interesting, the fitted functions are. - The model can contain a mix of terms --- some linear, some nonlinear. - GAMs are additive, although low-order interactions can be included in a natural way using, e.g. bivariate smoothers or interactions of the form `ns(age,df=5):ns(year,df=5)`. --- background-size: cover class: title-slide background-image: url("images/bg-02.png") <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>. .bottom_abs.width100[ Lecturer: *Professor Di Cook* Department of Econometrics and Business Statistics <i class="fas fa-envelope"></i> ETC3250.Clayton-x@monash.edu <i class="fas fa-calendar-alt"></i> Week 2b <br> ]
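---

# Appendix: a minimal GAM sketch

The `hits` data is not bundled with these slides, so this sketch fits the same style of additive model on simulated data (variable names `x1`, `x2` are illustrative only):

```r
library(mgcv)  # recommended package, usually installed with R

set.seed(2)
n  <- 300
x1 <- runif(n)
x2 <- runif(n)
# Additive truth: y = f1(x1) + f2(x2) + noise
y  <- sin(2 * pi * x1) + 2 * x2 + rnorm(n, sd = 0.2)

# One smooth term per predictor, as in the Baseball example
fit <- mgcv::gam(y ~ s(x1) + s(x2), data = data.frame(y, x1, x2))
summary(fit)  # edf per smooth, approximate significance, adjusted R-squared
```

As in the Baseball example, each `s()` term estimates one smooth univariate function, and `summary()` reports an effective degrees of freedom (edf) for each smooth, higher for more wiggly fits.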