Course

Machine Learning 2

Teacher(s)

Patrick Groenen, Pieter Schoonees
Research field

Data Science and Econometrics
Dates

Period 3 - Jan 08, 2024 to Mar 01, 2024

Course type

Core
Program year

First
Credits

4

Course description

Machine Learning 2 discusses supervised and unsupervised machine learning approaches which have become popular tools for solving practical problems.

The goal is for students to obtain a thorough technical understanding of a selection of supervised and unsupervised machine learning techniques, to be able to implement these techniques in the high-level language R, and to be able to write a report about an application of a technique. This course is a follow-up to Machine Learning 1.

The first part of the course continues the discussion of supervised machine learning techniques from the Machine Learning 1 course. The second part focuses on unsupervised machine learning techniques for finding meaningful relations between all variables in a data set simultaneously. In contrast to supervised machine learning, in unsupervised techniques all variables play similar roles. Therefore, the relationships among all variables must be modelled, whereas in supervised learning only the relationships between the target variable and the features are of direct interest. An important application of unsupervised learning techniques in management is customer segmentation in targeted marketing.

The techniques and ideas to be treated include:

- gradient boosting machines,

- support vector machines,

- principal components analysis (PCA) and variants,

- multidimensional scaling (MDS),

- cluster analysis.

This course is a field course in the Tinbergen Institute program, worth 3 credits, in the Econometrics major.

Prerequisites

Machine Learning 1 (Supervised Machine Learning)

Course literature

The course will cover material from the following list of readings, which are considered essential for your learning experience. These books and articles are also part of the examined material. Changes in the reading list will be communicated on Canvas.
Books:

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd edition). Springer. Available at https://web.stanford.edu/~hastie/Papers/ESLII.pdf.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. Springer. Available at https://www.statlearning.com/.

Selected papers, including:

Groenen, P. J. F., & van de Velden, M. (2016). Multidimensional scaling by majorization: A review. Journal of Statistical Software, 73(8), 1-26.
Groenen, P. J. F., Nalbantov, G., & Bioch, J. C. (2009). SVM-Maj: A majorization approach to linear support vector machines with different hinge errors. Advances in Data Analysis and Classification, 2(1), 17-43.
Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515-534.