
Introduction to Machine Learning Concepts

The document provides an introduction to machine learning (ML) and its relationship with artificial intelligence (AI), emphasizing the goal of creating systems that can learn from data without explicit programming. It outlines various learning paradigms such as supervised, unsupervised, semi-supervised, and reinforcement learning, along with examples and applications for each. Additionally, it discusses the evolution of AI and ML, highlighting key historical milestones and advancements in algorithms and computational power.


Introduction to Machine Learning: Basic Concepts and Learning Paradigms

1
[Diagram: Machine Learning as a subfield of Artificial Intelligence. AI also includes Natural Language Processing, Knowledge Representation, Expert Systems, Computer Vision, Image Processing and Signal Processing; ML includes Kernel Methods, Deep Learning, Nearest Neighbors, Markov Models, K-means, Bayesian Models, Random Forest and Hierarchical Clustering.]
2
What is artificial intelligence (AI)?
• The ultimate goal of artificial intelligence is to build systems able to reach
human intelligence levels
• Turing test: a computer is said to possess human-level intelligence if a remote
human interrogator, within a fixed time frame, cannot distinguish between
the computer and a human subject based on their replies to various questions
posed by the interrogator

3
Perhaps we are going in the right direction?

4
What is machine learning (ML)?

• Many AI researchers consider that the ultimate goal of AI can be
achieved by imitating the way humans learn
• Machine Learning – is the scientific study of algorithms and
statistical models that computer systems use to learn from
observations, without being explicitly programmed
• In this context, learning refers to:
➢ recognizing complex patterns in data
➢ making intelligent decisions based on data observations

5
Classic Programming vs Machine Learning
Classic Programming:
  Data + Program → Computer → Output

Machine Learning:
  Data + Output → Computer → Program
6
A well-posed machine learning problem

• What problems can be solved* with machine learning?


• Well-posed machine learning problem:
"A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.” – Tom Mitchell
(*) implies a certain degree of accuracy

7
A well-posed machine learning problem
• Arthur Samuel (1959) wrote a program for playing checkers (perhaps the
first program based on the concept of learning, as defined by Tom Mitchell)
• The program played 10K games against itself
• The program was designed to find the good and bad positions on the board
from the current state, based on the probability of winning or losing

• In this example:
➢ E = 10000 games
➢ T = play checkers
➢ P = win or lose

8
Strong AI versus Weak AI

• Strong / generic / true AI


(see the Turing test and its extensions)

• Weak / narrow AI
(focuses on a specific well-posed problem)

9
When do we use machine learning?

• We use ML when it is hard (or even impossible) to define a set of
rules by hand, i.e. to write a program based on explicit rules

• Examples of tasks that can be solved through machine learning:
➢ face detection
➢ speech recognition
➢ stock price prediction
➢ object recognition

10
The essence of machine learning

• A pattern exists
• We cannot express it programmatically
• We have data on it

11
What is machine learning?

[Arthur Samuel, 1959] field of study that:


• gives computers the ability to learn without being
explicitly programmed

[Kevin Murphy] algorithms that:


• automatically detect patterns in data
• use the uncovered patterns to predict future data or
other outcomes of interest

[Tom Mitchell] algorithms that:


• improve their performance (P)
• at some task (T)
• with experience (E)

12
Brief history of AI

(C) Dhruv Batra

13
Brief history of AI

• “We propose that a 2 month, 10 man study of artificial intelligence


be carried out during the summer of 1956 at Dartmouth College in
Hanover, New Hampshire.”
• "The study is to proceed on the basis of the conjecture that every
aspect of learning or any other feature of intelligence can in
principle be so precisely described that a machine can be made to
simulate it."
• "An attempt will be made to find how to make machines use
language, form abstractions and concepts, solve kinds of problems
now reserved for humans, and improve themselves."
• "We think that a significant advance can be made in one or more of
these problems if a carefully selected group of scientists work on it
together for a summer."

14
Brief history of AI

• 1960s–1980s: "AI Winter"
• 1990s: Neural networks dominate, essentially because of the
discovery of backpropagation for training neural networks with
two or more layers
• 2000s: Kernel methods dominate, essentially because of the
instability of training neural networks
• 2010s: The comeback of neural networks, essentially because of
the discovery of deep learning

15
Why are things working today?

• More compute power
• More data
• Better algorithms / models

[Plot: accuracy vs. amount of training data — larger models and more data drive accuracy up]

16
ML in a nutshell

• Tens of thousands of machine learning algorithms

➢ Researchers publish hundreds of new ones every year

• Decades of ML research oversimplified:

➢ Learn a mapping f from the input X to the output Y, i.e. f: X → Y

➢ Example: X are emails, Y ∈ {spam, not-spam}

17
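The "learn a mapping f: X → Y" view above can be made concrete with a toy learner. A minimal sketch (not from the slides): X is a single number, Y a binary label, and "learning" simply picks the threshold that best separates the labeled examples.

```python
# Toy illustration of learning a mapping g: X -> Y from labeled examples.
# The hypothesis class is "predict 1 if x >= t"; learning picks the best t.

def learn_threshold(xs, ys):
    """Return the threshold t such that predicting y = 1 for x >= t
    maximizes accuracy on the training data."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == (y == 1) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # inputs X
ys = [0, 0, 0, 1, 1, 1]                  # outputs Y

t = learn_threshold(xs, ys)              # t = 10.0 on this data

def predict(x):
    """The learned mapping g: X -> Y."""
    return int(x >= t)
```

Real ML algorithms search far richer hypothesis classes, but the shape is the same: pick the hypothesis that best fits the (x, y) pairs.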
ML in a nutshell
Input: X (images, texts, emails…)

Output: Y (spam or not-spam…)

(Unknown) Target Function: f: X → Y (the "true" mapping / reality)

Data: (x₁, y₁), (x₂, y₂), …, (x_N, y_N)

Model / Hypothesis Class: g: X → Y, for example y = g(x) = sign(wᵀx)
18
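The hypothesis class on the slide, g(x) = sign(wᵀx), is easy to write down directly. A minimal sketch with a hand-chosen (not learned) weight vector:

```python
import numpy as np

# The linear hypothesis from the slide: g(x) = sign(w^T x).
# The weights below are chosen by hand purely to show the prediction rule;
# a learning algorithm would fit w to the data (x_1, y_1), ..., (x_N, y_N).
w = np.array([1.0, -2.0, 0.5])

def g(x):
    """Predict a label in {-1, +1} from the sign of the inner product."""
    return int(np.sign(w @ x))
```

For instance, for x = (3, 1, 2), wᵀx = 3 − 2 + 1 = 2, so g(x) = +1.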
ML in a nutshell

• Every machine learning algorithm has three components:


➢ Representation / Model Class
➢ Evaluation / Objective Function
➢ Optimization

19
Where does ML fit in?

Biology / Neuroscience:
• Biology of learning
• Inspiring paradigms
• E.g.: neural networks

Applied Maths:
• Optimization
• Linear algebra
• Derivatives
• E.g.: local minimum

Computer Science:
• Algorithms
• Data structures
• Complexity analysis
• E.g.: k-d trees

Statistics:
• Estimation techniques
• Theoretical frameworks
• Optimality, efficiency
• E.g.: Bayes rule
20
Learning paradigms
• Standard learning paradigms:
➢ Supervised learning
➢ Unsupervised learning
➢ Semi-supervised learning
➢ Reinforcement learning

• Non-standard paradigms:
➢ Active learning
➢ Transfer learning
➢ Transductive learning
21
Supervised learning
• We have a set of labeled training samples
• Example 1: object recognition in images annotated with
corresponding class labels

[Example images annotated with class labels: Car, Person, Dog]
22
Supervised learning
• Example 2: handwritten digit recognition (on the MNIST data set)

• Images of 28 x 28 pixels
• We can represent each image as a vector x of 784 components
• We train a classifier f(x) such that:
f: x → {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
23
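The 784-dimensional representation described above can be sketched in a few lines. Random arrays stand in for the real MNIST images, and the nearest-centroid rule is only an illustrative stand-in for an actual trained classifier:

```python
import numpy as np

# Sketch of the MNIST representation: each 28 x 28 grayscale image is
# flattened into a 784-dimensional vector x, and a toy nearest-centroid
# classifier maps x to a digit in {0, ..., 9}.
# Random data below stands in for real MNIST images.
rng = np.random.default_rng(0)

images = rng.random((100, 28, 28))     # pretend training images
labels = np.arange(100) % 10           # pretend digit labels (all 10 classes)
X = images.reshape(100, 784)           # each image as a 784-vector

# One centroid (mean vector) per digit class
centroids = np.stack([X[labels == d].mean(axis=0) for d in range(10)])

def f(image):
    """Classify an image by its nearest class centroid."""
    x = image.reshape(784)
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
```

The 0.23% error rate quoted on the next slide comes from convolutional networks, which learn far better representations than raw pixel vectors.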
Supervised learning
• Example 2 (continued): handwritten digit recognition (on the MNIST data set)

• Starting with a training set of about 60K images (about 6000 images per class)
• … the error rate can go down to 0.23% (using convolutional neural networks)
• Among the first (learning-based) systems used in a large-scale commercial setting
for postal code and bank cheque processing

24
Supervised learning
• Example 3: face detection

• One approach consists of sliding a window over the image


• The goal is to classify each window into one of the two
possible classes: face or not-face
• The original problem is transformed into a classification
problem
25
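The sliding-window transformation described above can be sketched as follows; `classify` is a hypothetical stand-in for a trained face / not-face classifier:

```python
# Sketch of the sliding-window idea: every window position becomes one
# binary classification problem (face or not-face).

def sliding_windows(height, width, win=32, stride=16):
    """Yield the top-left corners of all windows that fit in the image."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left

def detect_faces(image, classify, win=32, stride=16):
    """Return window positions that the classifier labels as 'face'.
    `classify(image, top, left, win)` is assumed to return True/False."""
    h, w = len(image), len(image[0])
    return [(t, l) for t, l in sliding_windows(h, w, win, stride)
            if classify(image, t, l, win)]
```

In practice the window is also run at multiple scales, so faces of different sizes are reduced to the same fixed-size classification problem.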
Supervised learning
• Example 3: face detection

• We start with a set of face images with different variations
such as age, gender, illumination, and pose, but no translations
• … and a larger set of images that do not contain full faces

26
Supervised learning
• Example 4: spam detection

• The task is to classify an email into spam or not-spam


• The occurrence of the word “Dollars” is a good indicator of spam
• A possible representation is a vector of word frequencies
27
We count the words…
obtaining X

28
The spam detection algorithm

Open questions about this approach:
• Confidence / performance guarantee?
• Why a linear combination?
• Why these words?
• Where do the weights come from?
29
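The representation and decision rule from the spam example can be sketched as a word-frequency vector scored by a linear combination wᵀx. The vocabulary and weights below are invented for illustration; in a real system they would be learned from labeled emails:

```python
# Spam detection sketch: count word frequencies to build x, then score
# with a linear combination w^T x + b. Vocabulary and weights are
# hand-picked stand-ins for learned parameters.
vocab = ["dollars", "free", "meeting", "report"]
weights = [2.0, 1.5, -1.0, -1.0]   # spam-indicative words get positive weight
bias = -1.0

def to_counts(email):
    """Represent an email as a vector of word frequencies over the vocab."""
    words = email.lower().split()
    return [words.count(v) for v in vocab]

def is_spam(email):
    x = to_counts(email)
    score = sum(w * c for w, c in zip(weights, x)) + bias
    return score > 0
```

This directly raises the questions from the slide: the weights here were picked by hand, whereas a learning algorithm would estimate them from data, with some measurable confidence.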
Supervised learning
• Example 5: predicting stock prices on the market

• The goal is to predict the price at a future date, for example


in a few days
• This is a regression task, since the output is continuous

30
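The regression setting from the stock-price example can be sketched with ordinary least squares: fit a line to past prices and extrapolate a few days ahead. The data below is made up for illustration:

```python
import numpy as np

# Regression sketch: fit price ~ slope * day + intercept by least squares,
# then predict the price at a future day. Toy data, not real prices.
days   = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
prices = np.array([10.0, 10.5, 11.0, 11.5, 12.0])

A = np.column_stack([days, np.ones_like(days)])   # design matrix [t, 1]
slope, intercept = np.linalg.lstsq(A, prices, rcond=None)[0]

def predict(day):
    """The learned regression function: a continuous output, not a class."""
    return slope * day + intercept
```

The output is a real number rather than a class label, which is exactly what distinguishes regression from classification.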
Supervised learning
• Example 6: image difficulty prediction [Ionescu et al. CVPR2016]

• The goal is to predict the time necessary for a human to solve a visual
search task
• This is a regression task, since the output is continuous

31
Canonical forms of supervised learning problems

• Classification?

• Regression?

32
Age estimation in images

• Classification?

• Regression?

What age?

33
The supervised learning paradigm

34
Supervised learning models

• Naive Bayes
• k-Nearest Neighbors
• Decision trees and random forests
• Support Vector Machines
• Kernel methods
• Kernel Ridge Regression
• Neural networks
• Many others…

35
Unsupervised Learning

• We have an unlabeled training set of samples


• Example 1: clustering images based on similarity

36
Unsupervised Learning
• Example 1: clustering MNIST images based on
similarity [Georgescu et al. ICIP2019]

37
Unsupervised Learning

• Example 2: unsupervised feature learning

38
Unsupervised Learning
• Example 2: unsupervised feature learning for abnormal
event detection [Ionescu et al. CVPR2019]

39
Unsupervised Learning
• Example 3: clustering mammals by family, species, etc.

• The task is to generate the phylogenetic tree based on DNA

40
Canonical forms of unsupervised learning problems
• Clustering

• Dimensionality Reduction

41
Unsupervised learning models

• K-means clustering
• DBScan
• Hierarchical clustering
• Principal Component Analysis
• t-Distributed Stochastic Neighbor Embedding
• Hidden Markov Models
• Many others…

42
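K-means, the first model in the list above, is simple enough to sketch from scratch. A minimal one-dimensional version (illustrative only; real implementations handle multiple dimensions and smarter initialization):

```python
# Minimal 1-D k-means sketch: alternate between assigning each point to
# its nearest centroid and moving each centroid to the mean of its points.

def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]      # two obvious groups
centers = kmeans_1d(data, centroids=[0.0, 5.0])
```

No labels are used anywhere: the structure (two groups near 1 and near 9.5) is discovered from the data alone, which is the defining trait of unsupervised learning.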
Semi-supervised learning
• We have a training set of samples that are partially
annotated with class labels
• Example 1: object recognition in images, some of
which are annotated with corresponding class
labels

[Example images, some annotated with class labels: Car, Dog, Person]
43
Reinforcement learning

• How does it work?


• The system learns intelligent behavior using a
reinforcement signal (reward)
• The reward is given after several actions are taken
(it does not come after every action)
• Time matters (data is sequential, not i.i.d.)
• The actions of the system can influence the data

44
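The points above (delayed reward, sequential data, actions influencing what comes next) can be sketched with tabular Q-learning on a toy chain of states. The environment is invented for illustration: the only reward arrives at the far end of the chain, several actions after the episode starts.

```python
import random

# Toy RL sketch: Q-learning on a 1-D chain of states 0..4.
# Action 0 moves left, action 1 moves right; the only reward (+1) is
# obtained on entering state 4, so the signal arrives after several actions.
random.seed(0)
N = 5
Q = [[0.0, 0.0] for _ in range(N)]   # Q[state][action]
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):                 # episodes
    s = 0
    while s != N - 1:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda a: Q[s][a])
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == N - 1 else 0.0
        # Q-learning update: bootstrap from the best next-state value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N - 1)]
```

Even though most transitions yield zero reward, the discounted bootstrapping propagates the final reward backwards, and the learned policy moves right in every state.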
Reinforcement learning

• Example 1: learning to play Go


• +/- reward for winning / losing the game

45
Reinforcement learning

• Example 2: teaching a robot to ride a bike


• +/- reward for moving forward / falling

46
Reinforcement learning
• Example 3: learning to play Pong from image pixels
• +/- reward for increasing personal / adversary score

47
Reinforcement learning paradigm

48
Formalizing as Markov Decision Process

49
Formalizing as Markov Decision Process

50
Formalizing as Markov Decision Process

• Solution based on dynamic programming (small graphs)


or approximation (large graphs)
• Goal: select the actions that maximize the total final
reward
• The actions can have long-term consequences
• Sacrificing the immediate reward can lead to higher
rewards in the long term

51
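The dynamic-programming solution mentioned above can be sketched with value iteration on a tiny invented MDP, chosen so that sacrificing the immediate reward pays off: from state 0, action "b" gives nothing now but leads to a +10 reward later.

```python
# Value-iteration sketch on a tiny deterministic MDP (invented example).
# transitions[state][action] = (next_state, reward)
gamma = 0.9
transitions = {
    0: {"a": (0, 0.5),    # small immediate reward, stay put
        "b": (1, 0.0)},   # no reward now, move toward the big prize
    1: {"a": (2, 10.0)},
    2: {},                # terminal state
}

# Repeatedly apply the Bellman optimality update V(s) = max_a [r + gamma*V(s')]
V = {s: 0.0 for s in transitions}
for _ in range(100):
    V = {s: max((r + gamma * V[s2] for s2, r in acts.values()), default=0.0)
         for s, acts in transitions.items()}

best_action_0 = max(transitions[0],
                    key=lambda a: transitions[0][a][1] + gamma * V[transitions[0][a][0]])
```

The optimal action in state 0 is "b" (value 0 + 0.9 × 10 = 9), beating the greedy loop "a" (0.5 + 0.9 × 9 = 8.6), which is precisely the long-term-over-immediate trade-off from the slide.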
Formalizing as Markov Decision Process

• AlphaGo example:
➢ Narrator 1: “That’s a very strange move”
➢ Narrator 2: “I thought it was a mistake”
➢ But actually, “the move turned the course of the game.
AlphaGo went on to win Game Two, and at the post-game
press conference, Lee Sedol was in shock.”
➢ [Link]
sedol-redefined-future/

52
Active learning
• Given a large set of unlabeled samples, we have to choose a small
subset for annotation in order to obtain a good classification model

53
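One common way to choose that small subset is uncertainty sampling: annotate the samples the current model is least sure about. A minimal sketch, where the probabilities are invented stand-ins for a model's predictions on unlabeled data:

```python
# Active-learning sketch (uncertainty sampling): given the model's
# predicted probabilities P(class = 1) on unlabeled samples, send the
# samples closest to 0.5 (most uncertain) for human annotation.

def pick_for_annotation(probs, budget):
    """Return the indices of the `budget` most uncertain samples."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return sorted(ranked[:budget])

probs = [0.99, 0.48, 0.02, 0.55, 0.90]     # hypothetical model outputs
chosen = pick_for_annotation(probs, budget=2)
```

Samples 1 and 3 are selected here: labeling confidently-predicted samples (0.99, 0.02) would teach the model little, while uncertain ones are most informative per annotation.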
Transfer learning
• Starting with a model trained for a certain task/domain,
use the model for a different task/domain

More specific object classes,


face recognition,
texture classification, etc.

54
Transfer learning
• Adapt the model to specific test samples
• Example 1: facial expression recognition [Georgescu et al. Access2019]

55
Transfer learning
• Example 2: zero-shot learning

At test time, some distinguishing properties of objects
(auxiliary information) are provided.

For example, a model which has been trained to recognize
horses, but has never seen a zebra, can still recognize a
zebra when it also knows that zebras look like striped horses.

56
Bibliography

57
58
Thank You!
Slides Acknowledgement: Radu Ionescu, Prof. PhD.,
Faculty of Mathematics and Computer Science, University of Bucharest

59
