class: middle center hide-slide-number monash-bg-gray80

.info-box.w-50.bg-white[
These slides are viewed best in Chrome or Firefox, and occasionally need to be refreshed if elements do not load properly. See <a href=lecture-11a.pdf>here for the PDF <i class="fas fa-file-pdf"></i></a>.
]

<br>

.white[Press the **right arrow** to progress to the next slide!]

---

class: title-slide
count: false
background-image: url("images/bg-02.png")

# .monash-blue[ETC3250/5250: Introduction to Machine Learning]

<h1 class="monash-blue" style="font-size: 30pt!important;"></h1>

<br>

<h2 style="font-weight:900!important;">Model-based clustering</h2>

.bottom_abs.width100[

Lecturer: *Professor Di Cook*

Department of Econometrics and Business Statistics

<i class="fas fa-envelope"></i> ETC3250.Clayton-x@monash.edu

<i class="fas fa-calendar-alt"></i> Week 11a

<br>

]

---

# Overview

Model-based clustering makes an assumption about the distribution of the data, primarily:

- Assumes the data is a sample from a Gaussian mixture model
- Requires the assumption that clusters have an elliptical shape
- .monash-orange2[The shape is determined by the variance-covariance of the clusters]
- .monash-orange2[A variety of models is available by using different constraints on the variance-covariance]

The model is

`$$f(x_i) = \sum_{k=1}^G\pi_kf_k(x_i; \mu_k, \Sigma_k)$$`

where `\(f_k\)` is usually a multivariate normal distribution. The parameters are estimated by maximum likelihood, and the choice between models is made using BIC.

---

<img src="https://bradleyboehmke.github.io/HOML/20-model-clustering_files/figure-html/visualize-different-covariance-models-1.png" width="100%">

<br>
<br>

Source: [Boehmke (2020) Hands-on machine learning](https://bradleyboehmke.github.io/HOML/model-clustering.html)

---

# Variance-covariance specification

Constraints applied on the cluster variance-covariance:

1. .monash-blue2[volume]: each cluster has approximately the same size
2. .monash-blue2[shape]: each cluster has approximately the same variance, so that the distribution is spherical
3. .monash-blue2[orientation]: each cluster is forced to be axis-aligned

---

# Variance-covariance constraints

|Model|Family|Volume|Shape|Orientation|Identifier|
|---|---|---|---|---|---|
|1|Spherical|Equal|Equal|NA|EII|
|2|Spherical|Variable|Equal|NA|VII|
|3|Diagonal|Equal|Equal|Axes|EEI|
|6|Diagonal|Variable|Variable|Axes|VVI|
|7|General|Equal|Equal|Equal|EEE|
|8|General|Equal|Variable|Equal|EVE|
|10|General|Variable|Variable|Equal|VVE|
|11|General|Equal|Equal|Variable|EEV|
|12|General|Variable|Equal|Variable|VEV|
|14|General|Variable|Variable|Variable|VVV|

---

class: split-33

.column[.pad50px[

# Example: nuisance variable

<img src="images/lecture-11a/unnamed-chunk-3-1.png" width="80%" style="display: block; margin: auto;" />

]]
.column[.content.vmiddle[

```r
df_mc <- Mclust(df, G = 2)
summary(df_mc)
```

```
## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust EEI (diagonal, equal volume and shape) model with 2 components: 
## 
##  log-likelihood   n df      BIC      ICL
##       -204.1509 100  7 -440.538 -440.538
## 
## Clustering table:
##  1  2 
## 50 50
```

]]

---

class: split-two

.column[.pad50px[

```r
plot(df_mc, what = "density")
```

<img src="images/lecture-11a/unnamed-chunk-5-1.png" width="100%" style="display: block; margin: auto;" />

]]
.column[.pad50px[

```r
plot(df_mc, what = "uncertainty")
```

<img src="images/lecture-11a/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />

]]

---

class: split-two

# Parameter estimates

.column[.pad50px[

<br>
<br>

Cluster means

```r
options(digits=2)
df_mc$parameters$mean
```

```
##     [,1]  [,2]
## x1 -0.97  0.97
## x2  0.11 -0.11
```

]]
.column[.pad50px[

Cluster variances

```r
df_mc$parameters$variance$sigma
```

```
## , , 1
## 
##       x1   x2
## x1 0.052 0.00
## x2 0.000 0.98
## 
## , , 2
## 
##       x1   x2
## x1 0.052 0.00
## x2 0.000 0.98
```

]]

---

class: split-33

.column[.pad50px[

# Example: nuisance observations

<img src="images/lecture-11a/unnamed-chunk-9-1.png" width="80%" style="display: block; margin: auto;" />

]]
.column[.content.vmiddle[

```r
df_mc <- Mclust(df, G = 2)
summary(df_mc)
```

```
## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust EEE (ellipsoidal, equal volume, shape and orientation) model with 2
## components: 
## 
##  log-likelihood   n df  BIC  ICL
##            -205 120  8 -447 -452
## 
## Clustering table:
##  1  2 
## 61 59
```

]]

---

class: split-two

.column[.pad50px[

```r
plot(df_mc, what = "density")
```

<img src="images/lecture-11a/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" />

]]
.column[.pad50px[

```r
plot(df_mc, what = "uncertainty")
```

<img src="images/lecture-11a/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" />

]]

---

class: split-two

# Parameter estimates

.column[.pad50px[

<br>
<br>

Cluster means

```r
df_mc$parameters$mean
```

```
##     [,1] [,2]
## x1 -0.88 0.92
## x2 -0.88 0.92
```

]]
.column[.pad50px[

Cluster variances

```r
df_mc$parameters$variance$sigma
```

```
## , , 1
## 
##       x1    x2
## x1 0.186 0.081
## x2 0.081 0.185
## 
## , , 2
## 
##       x1    x2
## x1 0.186 0.081
## x2 0.081 0.185
```

]]

---

class: split-66

.column[.pad50px[

```r
set.seed(6)
data(flea)
flea_mc <- Mclust(flea[,2:7])
summary(flea_mc)
```

```
## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust EEE (ellipsoidal, equal volume, shape and orientation) model with 3
## components: 
## 
##  log-likelihood  n df   BIC   ICL
##           -1305 74 41 -2786 -2786
## 
## Clustering table:
##  1  2  3 
## 21 31 22
```

]]
.column[.content.vmiddle[

# Example: flea with nuisance variables and observations

Let the model-based algorithm decide the number of clusters, and the variance-covariance parametrisation.

<img src="images/lecture-11a/unnamed-chunk-16-1.png" width="80%" style="display: block; margin: auto;" />

]]

---

class: split-two

.column[.pad50px[

<img src="images/lecture-11a/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" />

- The spherical models are clearly inferior.
- Across many models, three clusters is where there is a peak in BIC.
- A few parametrisations are nearly equally good; pick the simplest.

]]
.column[.pad50px[

<img src="images/lecture-11a/unnamed-chunk-18-1.png" width="100%" style="display: block; margin: auto;" />

]]

---

class: split-two

# Parameter estimates

.column[.pad50px[

<br>

Cluster means

```r
flea_mc$parameters$mean
```

```
##       [,1] [,2] [,3]
## tars1  183  201  138
## tars2  130  119  125
## head    51   49   52
## aede1  146  125  138
## aede2   14   14   10
## aede3  105   81  107
```

]]
.column[.pad50px[

Cluster variances

.scroll-800[

```r
flea_mc$parameters$variance$sigma[,,1]
```

```
##        tars1 tars2  head aede1  aede2  aede3
## tars1 154.70 56.36 19.99 21.83  0.153 20.125
## tars2  56.36 52.48 10.78  9.44 -0.464 11.510
## head   19.99 10.78  5.87  6.22 -0.223  4.609
## aede1  21.83  9.44  6.22 22.09 -0.537 11.220
## aede2   0.15 -0.46 -0.22 -0.54  0.973  0.056
## aede3  20.13 11.51  4.61 11.22  0.056 52.380
```

]

]]

---

# Summary

- Model-based clustering provides a nice automated clustering, if the data has neatly separated clusters, even in the presence of nuisance variables.
- Non-elliptical clusters could be modeled by combining multiple ellipses.
- It is affected by nuisance observations, and has a parameter `noise` to attempt to filter these.
- It may not perform well if the data does not have well-separated clusters.
- k-means and Ward's linkage hierarchical clustering would yield similar results to constraining the variance-covariance model to EEI (or VII, EEE).
- Having a functional model for the clusters is useful.
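
---

# Sketch: filtering nuisance observations

The `noise` option mentioned in the summary can be sketched on simulated data. This is a minimal illustration, not one of the lecture examples (the data, seed and object names here are made up): a logical vector passed as `initialization = list(noise = ...)` gives `Mclust` an initial guess at which observations are nuisance, and the fitted noise component is labelled cluster 0.

```r
library(mclust)

set.seed(11)
# Two well-separated Gaussian clusters, plus 20 uniform nuisance observations
df_noise <- rbind(matrix(rnorm(100, mean = -1, sd = 0.3), ncol = 2),
                  matrix(rnorm(100, mean =  1, sd = 0.3), ncol = 2),
                  matrix(runif(40, min = -3, max = 3), ncol = 2))
colnames(df_noise) <- c("x1", "x2")

# Initial guess at which observations are noise (here the last 20 rows)
noise_init <- rep(c(FALSE, TRUE), times = c(100, 20))

df_mc_n <- Mclust(df_noise, G = 2,
                  initialization = list(noise = noise_init))
summary(df_mc_n)
table(df_mc_n$classification)  # cluster 0 is the noise component
```

The clusters themselves are still fitted as Gaussian components, so the BIC-based choice of parametrisation works as in the earlier examples.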
---

background-size: cover
class: title-slide
background-image: url("images/bg-02.png")

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

.bottom_abs.width100[

Lecturer: *Professor Di Cook*

Department of Econometrics and Business Statistics

<i class="fas fa-envelope"></i> ETC3250.Clayton-x@monash.edu

<i class="fas fa-calendar-alt"></i> Week 11a

<br>

]