Introduction to Machine Learning.
Basic Concepts and Learning
Paradigms.
1
[Figure: diagram of related fields and methods: Artificial Intelligence, Machine Learning, Deep Learning, Natural Language Processing, Computer Vision, Knowledge Representation, Expert Systems, Image Processing, Signal Processing, Kernel Methods, Nearest Neighbors, Markov Models, K-means, Bayesian Models, Random Forest, Hierarchical Clustering]
2
What is artificial intelligence (AI)?
• The ultimate goal of artificial intelligence is to build systems that reach
human-level intelligence
• Turing test: a computer is said to possess human-level intelligence if a remote
human interrogator, within a fixed time frame, cannot distinguish between
the computer and a human subject based on their replies to various questions
posed by the interrogator
3
Perhaps we are going in the right direction?
4
What is machine learning (ML)?
• Many AI researchers consider that the ultimate goal of AI can be
achieved by imitating the way humans learn
• Machine learning is the scientific study of algorithms and
statistical models that computer systems use to learn from
observations, without being explicitly programmed
• In this context, learning refers to:
➢ recognizing complex patterns in data
➢ making intelligent decisions based on data observations
5
Classic Programming vs Machine Learning
Classic Programming:
Data + Program → Computer → Output
Machine Learning:
Data + Output → Computer → Program
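The contrast above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spam-like task, the toy data, and the names `is_spam_handwritten` and `learn_threshold` are hypothetical and not part of the original slides.

```python
# Classic programming: a human writes the rule (the program) by hand.
def is_spam_handwritten(dollar_count):
    # Hypothetical hand-picked threshold.
    return dollar_count >= 3


# Machine learning: the rule is derived from data plus desired outputs.
def learn_threshold(counts, labels):
    """Return the threshold t that minimizes the training errors of the
    rule `count >= t` on the labeled examples."""
    best_t, best_errors = None, None
    for t in sorted(set(counts)) + [max(counts) + 1]:
        errors = sum((c >= t) != y for c, y in zip(counts, labels))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Toy data (made up): how often a keyword occurs, and the desired output.
counts = [0, 1, 1, 2, 4, 5, 7]
labels = [False, False, False, False, True, True, True]

threshold = learn_threshold(counts, labels)


def learned_rule(count):
    # The "program" produced by learning.
    return count >= threshold
```

In the first function the programmer supplies the rule; in the second, the computer receives data and outputs and returns the rule.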
6
A well-posed machine learning problem
• What problems can be solved* with machine learning?
• Well-posed machine learning problem:
“A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.” – Tom Mitchell
(*) implies a certain degree of accuracy
7
A well-posed machine learning problem
• Arthur Samuel (1959) wrote a program for playing checkers (perhaps the
first program based on the concept of learning, as defined by Tom Mitchell)
• The program played 10K games against itself
• The program was designed to find the good and bad positions on the board
from the current state, based on the probability of winning or losing
• In this example:
➢ E = 10000 games
➢ T = play checkers
➢ P = win or lose
8
Strong AI versus Weak AI
• Strong / general / true AI
(see the Turing test and its extensions)
• Weak / narrow AI
(focuses on a specific well-posed problem)
9
When do we use machine learning?
• We use ML when it is hard (or impossible) to define a set of
rules by hand, i.e. to write a program based on explicit rules
• Examples of tasks that can be solved through machine learning:
➢ face detection
➢ speech recognition
➢ stock price prediction
➢ object recognition
10
The essence of machine learning
• A pattern exists
• We cannot express it programmatically
• We have data on it
11
What is machine learning?
[Arthur Samuel, 1959] field of study that:
• gives computers the ability to learn without being
explicitly programmed
[Kevin Murphy] algorithms that:
• automatically detect patterns in data
• use the uncovered patterns to predict future data or
other outcomes of interest
[Tom Mitchell] algorithms that:
• improve their performance (P)
• at some task (T)
• with experience (E)
12
Brief history of AI
(C) Dhruv Batra
13
Brief history of AI
• “We propose that a 2 month, 10 man study of artificial intelligence
be carried out during the summer of 1956 at Dartmouth College in
Hanover, New Hampshire.
• The study is to proceed on the basis of the conjecture that every
aspect of learning or any other feature of intelligence can in
principle be so precisely described that a machine can be made to
simulate it.
• An attempt will be made to find how to make machines use
language, form abstractions and concepts, solve kinds of problems
now reserved for humans, and improve themselves.
• We think that a significant advance can be made in one or more of
these problems if a carefully selected group of scientists work on it
together for a summer.”
14
Brief history of AI
• 1960s-1980s: “AI Winter”
• 1990s: Neural networks dominate, essentially because
of the discovery of backpropagation for training
neural networks with two or more layers
• 2000s: Kernel methods dominate, essentially because of
the instability of training neural networks
• 2010s: The comeback of neural networks, essentially
because of the discovery of deep learning
15
Why are things working today?
• More compute power
• More data
• Better algorithms / models
[Figure: accuracy as a function of the amount of training data]
16
ML in a nutshell
• Tens of thousands of machine learning algorithms
➢ Researchers publish hundreds of new ones every year
• Decades of ML research oversimplified:
➢ Learn a mapping f from the input X to the output Y, i.e. $f: X \to Y$
➢ Example: X is the set of emails, Y = {spam, not-spam}
17
ML in a nutshell
Input: X (images, texts, emails…)
Output: Y (spam or not-spam…)
(Unknown) Target Function:
$f: X \to Y$ (the “true” mapping / reality)
Data
$(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$
Model / Hypothesis Class
$g: X \to Y$
$y = g(x) = \mathrm{sign}(w^T x)$
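A minimal sketch of the linear hypothesis sign(w^T x) follows. The weight values, the inputs, and the convention that sign(0) = +1 are assumptions made only for this illustration.

```python
def sign(z):
    # Convention assumed here: sign(0) is treated as +1.
    return 1 if z >= 0 else -1


def g(w, x):
    """Linear hypothesis g(x) = sign(w^T x)."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical weights and inputs; the first component of x is a constant 1
# so that w[0] acts as a bias term.
w = [-1.0, 2.0, 0.5]
x_pos = [1.0, 1.0, 0.0]   # w^T x = -1 + 2 + 0 = 1
x_neg = [1.0, 0.0, 1.0]   # w^T x = -1 + 0 + 0.5 = -0.5

predictions = [g(w, x_pos), g(w, x_neg)]
```

Learning then amounts to choosing w so that g agrees with the unknown target function on the data.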
18
ML in a nutshell
• Every machine learning algorithm has three components:
➢ Representation / Model Class
➢ Evaluation / Objective Function
➢ Optimization
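The three components can be made concrete with a small example. The sketch below assumes a linear model, a mean squared error objective, and plain gradient descent; these choices, the toy data, and all function names are illustrative rather than prescribed by the slides.

```python
# Representation: the model class, here lines y = w * x + b.
def model(params, x):
    w, b = params
    return w * x + b


# Evaluation: the objective function, here mean squared error.
def mse(params, xs, ys):
    return sum((model(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Optimization: the search procedure, here plain gradient descent.
def fit(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (model((w, b), x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (model((w, b), x) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
params = fit(xs, ys)
```

Swapping any one component (a different model class, loss, or optimizer) gives a different learning algorithm.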
19
Where does ML fit in?
[Diagram: Machine Learning at the intersection of four fields]
Biology / Neuroscience
• Biology of learning
• Inspiring paradigms
• E.g.: neural networks
Applied Maths
• Optimization
• Linear algebra
• Derivatives
• E.g.: local minimum
Computer Science
• Algorithms
• Data structures
• Complexity analysis
• E.g.: k-d trees
Statistics
• Estimation techniques
• Theoretical frameworks
• Optimality, efficiency
• E.g.: Bayes rule
20
Learning paradigms
• Standard learning paradigms:
➢ Supervised learning
➢ Unsupervised learning
➢ Semi-supervised learning
➢ Reinforcement learning
• Non-standard paradigms:
➢ Active learning
➢ Transfer learning
➢ Transductive learning
21
Supervised learning
• We have a set of labeled training samples
• Example 1: object recognition in images annotated with
corresponding class labels
[Figure: example images labeled Car, Person, Dog]
22
Supervised learning
• Example 2: handwritten digit recognition (on the MNIST data set)
• Images of 28 x 28 pixels
• We can represent each image as a vector x of 784 components
• We train a classifier 𝑓(𝑥) such that:
𝑓 ∶ 𝑥 → {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
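As a small illustration of the vector representation, the sketch below flattens a 28 x 28 grid into 784 components by concatenating its rows. The row-by-row order and the toy image are assumptions; no actual MNIST data is loaded.

```python
def flatten(image):
    """Turn a 28 x 28 grid of pixel intensities into a 784-component vector
    by concatenating the rows."""
    return [pixel for row in image for pixel in row]


# A blank 28 x 28 image with one bright pixel at row 2, column 3 (toy input,
# not an actual MNIST sample).
image = [[0] * 28 for _ in range(28)]
image[2][3] = 255

x = flatten(image)
```

The pixel at row r and column c ends up at index 28 * r + c of the vector.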
23
Supervised learning
• Example 2 (continued): handwritten digit recognition (on the MNIST data set)
• Starting with a training set of about 60K images (about 6000 images per class)
• … the error rate can go down to 0.23% (using convolutional neural networks)
• Among the first (learning-based) systems used in a large-scale commercial setting
for postal code and bank cheque processing
24
Supervised learning
• Example 3: face detection
• One approach consists of sliding a window over the image
• The goal is to classify each window into one of the two
possible classes: face or not-face
• The original problem is transformed into a classification
problem
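The sliding-window idea can be sketched as follows. The window size, the stride, and the placeholder `toy_classifier` are assumptions for illustration; a real detector would plug in a trained face / not-face classifier.

```python
def sliding_windows(height, width, win, stride):
    """Yield (top, left) corners of all win x win windows that fit
    inside a height x width image."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left


def detect(image, win, stride, classify):
    """Return corners of the windows that `classify` labels as a face."""
    height, width = len(image), len(image[0])
    hits = []
    for top, left in sliding_windows(height, width, win, stride):
        patch = [row[left:left + win] for row in image[top:top + win]]
        if classify(patch):
            hits.append((top, left))
    return hits


# Placeholder classifier (an assumption): a real system would use a trained
# face / not-face model. Here a patch "is a face" if all its pixels are 1.
def toy_classifier(patch):
    return all(p == 1 for row in patch for p in row)


image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
hits = detect(image, win=2, stride=1, classify=toy_classifier)
```

Each window is classified independently, which is what turns detection into a binary classification problem.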
25
Supervised learning
• Example 3: face detection
• We start with a set of face images with different variations
such as age, gender, illumination, pose, but no translations
• … and a larger set of images that do not contain full faces
26
Supervised learning
• Example 4: spam detection
• The task is to classify an email into spam or not-spam
• The occurrence of the word “Dollars” is a good indicator of spam
• A possible representation is a vector of word frequencies
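A word-frequency representation might be computed as in the sketch below. The vocabulary, the example message, and the whitespace tokenization are simplifying assumptions.

```python
from collections import Counter


def word_frequencies(text, vocabulary):
    """Represent `text` as a vector with one count per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and message, chosen only for illustration.
vocabulary = ["dollars", "free", "meeting"]
email = "Free dollars win free dollars and more dollars"
x = word_frequencies(email, vocabulary)
```

Words outside the vocabulary are ignored, and vocabulary words that do not occur get a count of zero.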
27
We count the words…
obtaining X
28
The spam detection algorithm
• Confidence / performance guarantee?
• Why a linear combination?
• Why these words?
• Where do the weights come from?
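One way to read these questions is against a concrete linear scoring rule. In the sketch below the weights, the bias, and the decision threshold of 0 are made up by hand; in a learned system they would come from labeled training emails.

```python
def spam_score(x, weights, bias):
    """Linear combination of word counts plus a bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def is_spam(x, weights, bias):
    return spam_score(x, weights, bias) > 0


# Made-up weights for the counts of ("dollars", "free", "meeting").
# In practice they would be learned from labeled emails.
weights = [1.0, 0.5, -2.0]
bias = -1.5

spam_like = [3, 2, 0]   # score = 3.0 + 1.0 + 0.0 - 1.5 = 2.5
work_like = [0, 0, 2]   # score = 0.0 + 0.0 - 4.0 - 1.5 = -5.5
```

Positive weights push a message towards spam and negative weights push it away.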
29
Supervised learning
• Example 5: predicting stock prices on the market
• The goal is to predict the price at a future date, for example
in a few days
• This is a regression task, since the output is continuous
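To show what a continuous output looks like, here is a least-squares line fit used for extrapolation. It is a deliberately naive sketch: the prices are made up, and real stock prices are not expected to follow a straight line.

```python
def fit_line(ts, prices):
    """Closed-form least-squares fit of price = slope * t + intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_p = sum(prices) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(ts, prices))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    intercept = mean_p - slope * mean_t
    return slope, intercept


# Made-up closing prices for days 0..4 (not real market data).
days = [0, 1, 2, 3, 4]
prices = [100.0, 102.0, 104.0, 106.0, 108.0]

slope, intercept = fit_line(days, prices)
predicted_day_7 = slope * 7 + intercept
```

The prediction is a real number rather than a class label, which is what makes this a regression task.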
30
Supervised learning
• Example 6: image difficulty prediction [Ionescu et al. CVPR2016]
• The goal is to predict the time necessary for a human to solve a visual
search task
• This is a regression task, since the output is continuous
31
Canonical forms of supervised learning problems
• Classification?
• Regression?
32
Age estimation in images
• Classification?
• Regression?
What age?
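Age estimation can be framed either way, and the choice changes how errors are counted. The sketch below assumes 10-year age groups for the classification view and absolute error in years for the regression view; both choices are illustrative.

```python
def age_to_class(age, bin_width=10):
    """Classification view: map an age to a discrete group index,
    e.g. 0 for ages 0-9, 1 for ages 10-19, and so on."""
    return int(age // bin_width)


def classification_error(true_age, predicted_age):
    """0/1 loss on age groups: any wrong group costs the same."""
    return int(age_to_class(true_age) != age_to_class(predicted_age))


def regression_error(true_age, predicted_age):
    """Absolute error in years: near misses cost less than far ones."""
    return abs(true_age - predicted_age)
```

For example, predicting 30 for a true age of 29 is a full classification error but only a one-year regression error, while predicting 20 for a true age of 29 is no classification error yet a nine-year regression error.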
33
The supervised learning paradigm
34
Supervised learning models
• Naive Bayes
• k-Nearest Neighbors
• Decision trees and random forests
• Support Vector Machines
• Kernel methods
• Kernel Ridge Regression
• Neural networks
• Many others…
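As one concrete instance from this list, a k-nearest neighbors classifier can be sketched in a few lines. The squared Euclidean distance, k = 3, and the toy data are assumptions for illustration.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training
    samples (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    order = sorted(range(len(train_x)), key=lambda i: dist2(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D training set (made up): two well-separated groups.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
```

Calling `knn_predict(train_x, train_y, (0.5, 0.5))` returns the label of the nearby group, "a".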
35
Unsupervised Learning
• We have an unlabeled training set of samples
• Example 1: clustering images based on similarity
36
Unsupervised Learning
• Example 1: clustering MNIST images based on
similarity [Georgescu et al. ICIP2019]
37
Unsupervised Learning
• Example 2: unsupervised feature learning
38
Unsupervised Learning
• Example 2: unsupervised feature learning for abnormal
event detection [Ionescu et al. CVPR2019]
39
Unsupervised Learning
• Example 3: clustering mammals by family, species, etc.
• The task is to generate the phylogenetic tree based on DNA
40
Canonical forms of unsupervised learning problems
• Clustering
• Dimensionality Reduction
41
Unsupervised learning models
• K-means clustering
• DBSCAN
• Hierarchical clustering
• Principal Component Analysis
• t-Distributed Stochastic Neighbor Embedding
• Hidden Markov Models
• Many others…
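As one concrete instance from this list, k-means can be sketched on 1-D data. The hand-picked initial centers, the fixed number of iterations, and the toy points are assumptions; practical implementations usually initialize at random and stop on convergence.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (a center with an empty cluster stays where it is).
        centers = [
            sum(c) / len(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers, clusters


# Made-up unlabeled data with two visible groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

No labels are used at any point: the grouping comes only from the distances between samples.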
42
Semi-supervised learning
• We have a training set of samples that are partially
annotated with class labels
• Example 1: object recognition in images, some of
which are annotated with corresponding class
labels
[Figure: example images, only some of which are labeled (Car, Dog, Person)]
43
Reinforcement learning
• How does it work?
• The system learns intelligent behavior using a
reinforcement signal (reward)
• The reward is given after several actions are taken
(it does not come after every action)
• Time matters (data is sequential, not i.i.d.)
• The actions of the system can influence the data
44
Reinforcement learning
• Example 1: learning to play Go
• +/- reward for winning / losing the game
45
Reinforcement learning
• Example 2: teaching a robot to ride a bike
• +/- reward for moving forward / falling
46
Reinforcement learning
• Example 3: learning to play Pong from image pixels
• +/- reward for increasing the personal / adversary score
47
Reinforcement learning paradigm
48
Formalizing as Markov Decision Process
49
Formalizing as Markov Decision Process
50
Formalizing as Markov Decision Process
• Solution based on dynamic programming (small graphs)
or approximation (large graphs)
• Goal: select the actions that maximize the total final
reward
• The actions can have long-term consequences
• Sacrificing the immediate reward can lead to higher
rewards in the long term
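The dynamic programming solution for a small graph can be sketched with value iteration. The three-state deterministic MDP below, its rewards, and the discount factor of 0.9 are made up to show a case where giving up an immediate reward pays off later.

```python
def value_iteration(mdp, gamma=0.9, sweeps=50):
    """Dynamic programming for a small deterministic MDP.

    `mdp` maps each state to {action: (next_state, reward)};
    a state with no actions is terminal.
    """
    values = {s: 0.0 for s in mdp}
    for _ in range(sweeps):
        for s, actions in mdp.items():
            if actions:
                values[s] = max(r + gamma * values[nxt] for nxt, r in actions.values())
    # Greedy policy with respect to the computed values.
    policy = {
        s: max(actions, key=lambda a: actions[a][1] + gamma * values[actions[a][0]])
        for s, actions in mdp.items() if actions
    }
    return values, policy


# Hypothetical MDP: "greedy" pays 1 right away, "patient" pays 0 now
# but leads to a state from which a reward of 10 is available.
mdp = {
    "start": {"greedy": ("end", 1.0), "patient": ("mid", 0.0)},
    "mid": {"continue": ("end", 10.0)},
    "end": {},
}
values, policy = value_iteration(mdp)
```

Here the best action in "start" is "patient", even though its immediate reward is lower than that of "greedy".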
51
Formalizing as Markov Decision Process
• AlphaGo example:
➢ Narrator 1: “That’s a very strange move”
➢ Narrator 2: “I thought it was a mistake”
➢ But actually, “the move turned the course of the game.
AlphaGo went on to win Game Two, and at the post-game
press conference, Lee Sedol was in shock.”
➢ [Link]
sedol-redefined-future/
52
Active learning
• Given a large set of unlabeled samples, we have to choose a small
subset for annotation in order to obtain a good classification model
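One common selection strategy is uncertainty sampling, sketched below for a binary classifier. The strategy itself is only one of several options, and the probabilities are hypothetical model outputs, not results from the slides.

```python
def select_for_annotation(probabilities, budget):
    """Uncertainty sampling: pick the `budget` samples whose predicted
    probability of the positive class is closest to 0.5."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:budget]


# Hypothetical model outputs P(positive) for six unlabeled samples.
probabilities = [0.98, 0.52, 0.10, 0.45, 0.80, 0.03]
chosen = select_for_annotation(probabilities, budget=2)
```

The samples the current model is least sure about (indices 1 and 3 here) are sent to the annotator first.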
53
Transfer learning
• Starting with a model trained for a certain task/domain,
use the model for a different task/domain
• Example target tasks: more specific object classes,
face recognition, texture classification, etc.
54
Transfer learning
• Adapt the model to specific test samples
• Example 1: facial expression recognition [Georgescu et al. Access2019]
55
Transfer learning
• Example 2: zero-shot learning
At test time, some distinguishing
properties of objects (auxiliary
information) are provided.
For example, a model that has been
trained to recognize horses, but has
never seen a zebra, can still
recognize a zebra when it also knows
that zebras look like striped horses.
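The zebra example can be sketched as attribute matching. Everything below is an assumption for illustration: the attribute names, the hand-written class descriptions, and the idea that attribute detectors trained on seen classes supply `predicted_attributes`.

```python
def zero_shot_classify(predicted_attributes, class_descriptions):
    """Pick the class whose attribute description matches the predicted
    attributes on the largest number of entries."""
    def matches(description):
        return sum(
            predicted_attributes.get(name) == value
            for name, value in description.items()
        )
    return max(class_descriptions, key=lambda c: matches(class_descriptions[c]))


# Auxiliary information (assumed, hand-written): "zebra" has no training
# images, only this description.
class_descriptions = {
    "horse": {"horse_shaped": True, "striped": False},
    "zebra": {"horse_shaped": True, "striped": True},
    "tiger": {"horse_shaped": False, "striped": True},
}

# Hypothetical outputs of attribute detectors trained on seen classes.
predicted_attributes = {"horse_shaped": True, "striped": True}
label = zero_shot_classify(predicted_attributes, class_descriptions)
```

The model never sees a zebra image during training; the class is recognized purely through its description.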
56
Thank You!
Slides Acknowledgement: Radu Ionescu, Prof. PhD.,
Faculty of Mathematics and Computer Science, University of Bucharest
59