Machine Learning Course SS 13 U Stuttgart
See my general teaching page for previous versions of this lecture.
Exploiting large-scale data is a central challenge of our
time. Machine Learning is the core discipline to address this
challenge, aiming to extract useful models and structure from
data. There are several motivations for studying Machine Learning: 1) it is
the basis of commercial data mining (Google, Amazon, Picasa, etc.), 2) it is
a core methodological tool for data analysis in all sciences (vision,
linguistics, software engineering, but also biology, physics,
neuroscience, etc.), and finally, 3) it is a core foundation of autonomous
intelligent systems.
This lecture introduces modern methods in Machine Learning,
including discriminative as well as probabilistic generative models. A
preliminary outline of topics is:
- motivation
- probabilistic modeling and inference
- regression and classification methods (kernel methods, Gaussian Processes, Bayesian kernel logistic regression, relations)
- discriminative learning (logistic regression, Conditional Random Fields)
- feature selection
- boosting and ensemble learning
- representation learning and embedding (kernel PCA and derivatives, deep learning)
- graphical models
- inference in graphical models (MCMC, message passing, variational)
- learning in graphical models
Students should bring basic knowledge of linear algebra, probability theory and
optimization.
- Organization
- This is the central website of the lecture. Links to slides, exercise sheets, announcements, etc. will all be posted here.
- See the 01-introduction slides for further information.
- Schedule, slides & exercises
date | topics | slides | exercises (due on 'date'+1)
08.04. | Introduction & Organization | 01-introduction | (notation)
15.04. | Regression: linear regression, non-linear features (polynomial, RBFs, piece-wise), regularization, cross validation, Ridge/Lasso, kernel trick | 02-regression | e01-intro
22.04. | Classification: classification, discriminative function, logistic regression, binary & multi-class case, conditional random fields | 03-classification | e02-linearRegression, ../data/dataLinReg1D.txt, ../data/dataLinReg2D.txt, ../data/dataQuadReg1D.txt, ../data/dataQuadReg2D.txt, ../data/dataQuadReg2D_noisy.txt
29.04. | Classification (cont.) | | e03-classification, ../data/data2Class.txt, ../data/digit_pca.txt, ../data/digit_label.txt
13.05. | Breadth of ML ideas | 04-ideas | e04-PCA-PLS, ../data/yalefaces_cropBackground.tgz, ../data/yalefaces.tgz
27.05. | Breadth of ML ideas (cont.) | | e05-WEKA-boosting, ../data/digits-train.arff.gz, ../data/digits-test.arff.gz
03.06. | SVMs (by Vien Ngo) | 05-vien-SVM | e06-SVM-NN
10.06. | Deep Learning; Probability basics | 04-ideas, 06-BayesBasics | e07-BayesBasics
17.06. | Bayesian Regression & Classification | 07-BayesianRegressionClassification | e08-GaussianProcesses
24.06. | Graphical Models | 08-graphicalModels | e09-graphicalModels
01.07. | Inference in Graphical Models | 09-graphicalModels-Inference | e10-inference
08.07. | Learning with Graphical Models | 10-graphicalModels-Learning | e11-EM, ../data/gauss.txt, ../data/mixture.txt
15.07. | Summary | 13-MachineLearning-script |
- Literature
[1] The Elements of Statistical Learning: Data Mining, Inference, and Prediction
by Trevor Hastie, Robert Tibshirani and Jerome Friedman. Springer, Second Edition, 2009.
full online version available
(recommended: read introductory chapter)
[2] Pattern Recognition and Machine Learning
by Christopher M. Bishop. Springer, 2006.
online
(especially chapter 8, which is fully online)
[email by Stefan Otte:] This is a nice little (26-page) linear
algebra and matrix calculus reference. It's used for the ML class at
Stanford. Maybe it's interesting for your ML class.
link
[email by Stefan Otte:]
Feature selection, l1 vs. l2 regularization, and rotational invariance
Paper:
link
Comments:
link
[email by Stefan Otte:]
I recently watched a very good Google Tech Talk on ensembles.
In the talk "The Counter-Intuitive Properties of
Ensembles for Machine Learning, or, Democracy Defeats Meritocracy",
W. Philip Kegelmeyer argues (put simply) that one should use
ensembles for supervised learning. Perhaps this is of interest to
some of the students.
Here are a few of my notes:
- Boosting: overfitting, sensitive to outliers.
- "Ensembles of experts": diversity of experts --> diversity in error
--> robustness/no overfitting
- "Out of Bag validation" (OOB) to determine ensemble size (vs.
learning the weights for the voting (which does not scale))
- unstable classifiers (e.g. decision trees) are a good fit for ensembles
- decision trees without pruning work well with ensembles. (pruning is
normally expensive!)
- "Ensembles of bozos": LOTS of bozos which train on tiny subsets (1%)
of the data
- traditional < experts < bozos
- training bozos is faster than training one traditional sage!
http://csmr.ca.sandia.gov/~wpk/
http://csmr.ca.sandia.gov/~wpk/slides/avatar-ensembles.pdf
http://csmr.ca.sandia.gov/~wpk/avi/avatar-tools-background-video.avi
Best regards,
Stefan
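The notes above on out-of-bag validation and "ensembles of bozos" can be illustrated with a minimal sketch. This is not code from the talk or from the tools linked above; the toy data, the decision-stump learner, and all function names (make_data, fit_stump, train_ensemble, oob_error) are assumptions chosen purely for illustration. Each member is trained on a random 1% subset of the data, and the out-of-bag error for a given ensemble size is estimated by letting each point be voted on only by members that never saw it during training.

```python
import random

def make_data(n, rng):
    """Toy 2-D binary data (an assumption): label is 1 when x0 + x1 > 1."""
    xs = [(rng.random(), rng.random()) for _ in range(n)]
    ys = [1 if x0 + x1 > 1.0 else 0 for x0, x1 in xs]
    return xs, ys

def predict_stump(stump, x):
    """A stump predicts `sign` if feature f exceeds threshold t, else 1 - sign."""
    f, t, sign = stump
    return sign if x[f] > t else 1 - sign

def fit_stump(xs, ys, idx):
    """Exhaustively pick the (feature, threshold, sign) with fewest errors on idx."""
    best_err, best_stump = None, None
    for f in range(len(xs[0])):
        for i in idx:
            for sign in (0, 1):
                stump = (f, xs[i][f], sign)
                err = sum(predict_stump(stump, xs[j]) != ys[j] for j in idx)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_stump

def train_ensemble(xs, ys, n_members, subset_size, rng):
    """Train each member on its own tiny random subset; remember which rows it saw."""
    members = []
    for _ in range(n_members):
        idx = rng.sample(range(len(xs)), subset_size)
        members.append((fit_stump(xs, ys, idx), set(idx)))
    return members

def oob_error(xs, ys, members):
    """Majority-vote error where each point is judged only by members that did not train on it."""
    wrong = counted = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        votes = [predict_stump(s, x) for s, seen in members if i not in seen]
        if not votes:
            continue  # every member saw this point, so it has no out-of-bag vote
        pred = 1 if 2 * sum(votes) > len(votes) else 0
        wrong += pred != y
        counted += 1
    return wrong / counted if counted else float("nan")

rng = random.Random(0)
xs, ys = make_data(1000, rng)
members = train_ensemble(xs, ys, n_members=51, subset_size=10, rng=rng)  # 1% subsets

# Out-of-bag error as a function of ensemble size (prefixes of the member list).
for m in (1, 5, 25, 51):
    print(m, round(oob_error(xs, ys, members[:m]), 3))
```

Printing the out-of-bag error for growing prefixes of the ensemble gives a curve from which an ensemble size can be picked without holding out a separate validation set.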