Deep Learning in Computer Vision


Computer Vision, Feature Extraction and Deep Learning

by Ankit Jha

Resources courtesy: Google, Medium Blogs, Courses from IIT Bombay, MIT, etc.
Computer Vision
Make computers understand images and video.

What kind of scene?

Where are the cars?

How far is the building?


Vision is really hard

• Vision is an amazing feat of natural intelligence
– Visual cortex occupies about 50% of Macaque brain
– More human brain devoted to vision than anything else

Is that a
queen or a
bishop?
Why computer vision matters

Safety Health Security

Comfort Fun Access


Ridiculously brief history of computer vision
• 1966: Minsky assigns computer vision as an undergrad summer project
• 1960’s: interpretation of synthetic worlds
• 1970’s: some progress on interpreting selected images
• 1980’s: ANNs come and go; shift toward geometry and increased mathematical rigor
• 1990’s: face recognition; statistical analysis in vogue
• 2000’s: broader recognition; large annotated datasets available; video processing starts

Figures: Guzman ‘68; Ohta Kanade ‘78; Turk and Pentland ‘91


How vision is used now
• Examples of state-of-the-art

Some of the following slides by Steve Seitz


Optical character recognition (OCR)
Technology to convert scanned docs to text
• If you have a scanner, it probably came with OCR software

Digit recognition, AT&T labs: [Link]
License plate readers: [Link]
Face detection

• Many new digital cameras now detect faces


– Canon, Sony, Fuji, …
Smile detection

Sony Cyber-shot® T70 Digital Still Camera


3D from thousands of images

Building Rome in a Day: Agarwal et al. 2009


Object recognition (in supermarkets)

LaneHawk by EvolutionRobotics
“A smart camera is flush-mounted in the checkout lane, continuously
watching for items. When an item is detected and recognized, the
cashier verifies the quantity of items that were found under the basket,
and continues to close the transaction. The item can remain under the
basket, and with LaneHawk, you are assured to get paid for it…”
Vision-based biometrics

“How the Afghan Girl was Identified by Her Iris Patterns” Read the story
wikipedia
Login without a password…

Fingerprint scanners on many new laptops, other devices

Face recognition systems now beginning to appear more widely
[Link]
Object recognition (in mobile phones)

Point & Find, Nokia


Google Goggles
Special effects: shape capture

The Matrix movies, ESC Entertainment, XYZRGB, NRC


Special effects: motion capture

Pirates of the Caribbean, Industrial Light and Magic


Sports

Sportvision first down line


Nice explanation on [Link]

[Link]
Smart cars
Slide content courtesy of Amnon Shashua

• Mobileye
– Vision systems currently in high-end BMW, GM,
Volvo models
– By 2010: 70% of car manufacturers.
Google cars

[Link]
Interactive Games: Kinect
• Object Recognition:
[Link]
• Mario: [Link]
• 3D: [Link]
• Robot: [Link]
Vision in space

NASA's Mars Exploration Rover Spirit captured this westward view from atop
a low plateau where Spirit spent the closing months of 2007.

Vision systems (JPL) used for several tasks


• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al.
Industrial robots

Vision-guided robots position nut runners on wheels


Mobile robots

NASA’s Mars Spirit Rover


[Link] [Link]

Saxena et al. 2008


STAIR at Stanford
Medical imaging

Image guided surgery


3D imaging
Grimson et al., MIT
MRI, CT
Deep learning for visual inference
[Diagram: Computer Vision in relation to Human Vision / Human Brain, Geometry, Machine Learning, Deep Learning, Optics / Cameras, and Robotics]
This course
[Diagram repeated: Computer Vision and its related fields]
Relationship with Other Fields
• Image Processing: Image → Image
• Computer Vision: Image → Knowledge (figure labels: cat, deer)
• Computer Graphics: Knowledge → Image
  (Vertices, Locations, Objects, Shapes, Colors, Material properties, Lighting settings, Camera settings, etc.)
Visual Recognition?
• What does it mean to “see”?
• “What” is “where”, Marr 1982

• Get computers to “see”


Verification: Is this a car?
Classification: Is there a car in this picture?
Detection: Where is the car in this picture?
Pose Estimation:
Activity Recognition: What is he doing?


Object Categorization:
[Figure labels: Sky, Person, Tree, Horse, Car, Person, Bicycle, Road]
Segmentation
[Figure labels: Sky, Tree, Car, Person]
Describing Images with Language
Text-to-Image Synthesis: Text2Scene
Object recognition
Is it really so hard?
Find the chair in this image Output of normalized correlation

This is a chair
Object recognition
Is it really so hard?

Find the chair in this image

Pretty much garbage


Simple template matching is not going to make it
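To make the "normalized correlation" baseline concrete, here is a minimal NumPy sketch of template matching. The function name `normalized_correlation` and the random test image are assumptions for illustration, not taken from the slides. It works when the template is an exact crop of the image, which is exactly the easy case that real scenes do not offer.

```python
import numpy as np

def normalized_correlation(image, template):
    """Slide the template over the image and score each position in [-1, 1]."""
    th, tw = template.shape
    t = template - template.mean()              # zero-mean template
    t_norm = np.sqrt((t ** 2).sum())
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()            # zero-mean patch
            denom = t_norm * np.sqrt((p ** 2).sum())
            scores[y, x] = (t * p).sum() / denom if denom > 0 else 0.0
    return scores

# Hypothetical data: the template is cut directly out of a random image.
rng = np.random.default_rng(0)
image = rng.random((20, 30))
template = image[5:10, 7:13].copy()
scores = normalized_correlation(image, template)
best = np.unravel_index(scores.argmax(), scores.shape)  # (row, column) of best match
```

The score peaks where the patch is a scaled and offset copy of the template. A change in viewpoint, illumination, or occlusion breaks that assumption, which is what the challenges that follow illustrate.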
This is an image to us
This is an image to a computer
Challenges 1: viewpoint variation

Michelangelo 1475-1564
slide by Fei-Fei, Fergus & Torralba


Challenges 2: illumination

slide credit: S. Ullman


Challenges 3: occlusion

Magritte, 1957
slide by Fei-Fei, Fergus & Torralba


Challenges 4: scale

slide by Fei-Fei, Fergus & Torralba


Challenges 5: deformation

Xu, Beihong 1943
slide by Fei-Fei, Fergus & Torralba


Challenges 6: background clutter

Klimt, 1913
slide by Fei-Fei, Fergus & Torralba


Challenges 7: object intra-class variation

slide by Fei-Fei, Fergus & Torralba


Challenges 8: local ambiguity

slide by Fei-Fei, Fergus & Torralba


Challenges 9: labels are ambiguous
Challenges 10: different spatial resolution
Challenges 11: medical images
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
• Mid-2000s: bags of features
• Present trends: combination of local and global methods, data-driven methods, context

Svetlana Lazebnik
Recognition as an alignment problem: Block world

L. G. Roberts, Machine Perception of Three Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963.

J. Mundy, Object Recognition in the Geometric Era: a Retrospective, 2006


Recognition by components
Biederman (1987)

Primitives (geons) Objects

[Link]
Svetlana Lazebnik
Eigenfaces (Turk & Pentland, 1991)

Svetlana Lazebnik
Color Histograms

Swain and Ballard, Color Indexing, IJCV 1991. Svetlana Lazebnik


Sliding window approaches
Local features for object instance recognition

D. Lowe (1999, 2004)


Representing people
Bag-of-words models
Object → Bag of ‘words’

Svetlana Lazebnik
Deep Learning and Vision
• Deep learning has been a great disruption in the field of computer vision. It has made a lot of new things work!

• Many deep learning methods are being applied to vision these days.

• This is a deep learning course for visual inference. We will briefly review some pre-deep-learning methods, and then mostly cover deep learning.
Objectives

▪ Understanding foundational concepts for representation learning using neural networks

▪ Becoming familiar with state-of-the-art models for tasks such as image classification, object detection, image segmentation, scene recognition, etc.

▪ Obtaining practical experience in the implementation of visual recognition models using deep learning.
Resources
• Online lectures (Stanford, MIT, NYU, IITM)
• Deep Learning book by Goodfellow
• [Link]
• Deep Learning Methods and Applications by Deng & Yu
• [Link]
apers/[Link]
• Papers
PyTorch, TensorFlow, Keras
• PyTorch – zero to GAN
• [Link]

• Keras by deeplizard
• [Link]

• TensorFlow tutorial
• [Link]
Traditional ML vs Deep Learning (Recap)
Image acquisition

z = 𝑓(𝑥, 𝑦)
Electromagnetic Spectrum – characterizing the energy

Human Luminance Sensitivity Function


[Link]
Beyond visible spectra – satellite images

The infrared region is better at characterizing the classes than the visible region
Multi-spectral and hyper-spectral
Spatial resolution
Different rep. of 3D information in images
Video data – appearance + motion (optical flow)
Optical flow - a quick review
Color coding the optical flow
What are Image Features?
• Color histogram?
• Maximum color on sub-areas of the image?
• Any statistics on the input image?
• The output of some image processing on the input image?

Image → feature vector [0.22, 0.30, 0.13, 0.24, 0.31, 0.15, 0.35, 0.48]
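As one possible instance of "statistics on the input image", the sketch below turns an RGB image into a fixed-length color histogram. The function name `color_histogram`, the choice of 4 bins per channel, and the all-red test image are assumptions made for illustration.

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """image: H x W x 3 uint8 array. Returns a normalized vector of length 3 * bins_per_channel."""
    parts = []
    for c in range(3):  # one histogram per color channel
        hist, _ = np.histogram(image[..., c], bins=bins_per_channel, range=(0, 256))
        parts.append(hist)
    feats = np.concatenate(parts).astype(float)
    return feats / feats.sum()  # normalize so images of different sizes are comparable

# Hypothetical input: a 4 x 4 pure red image.
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 255
features = color_histogram(red)
```

Whatever the image size, the output always has the same length, which is what lets it serve as input to a model or a distance function.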
Why are they useful?

Image → features [0.22, 0.30, 0.13, 0.24, 0.31, 0.15, 0.35, 0.48] → Machine Learning Model → Predictions

As inputs to a machine learning model


Why are they useful?

Image 1 features: [0.22, 0.30, 0.13, 0.24, 0.31, 0.15, 0.35, 0.48]
Image 2 features: [0.24, 0.34, 0.23, 0.27, 0.63, 0.15, 0.25, 0.48]
Distance function (e.g. Euclidean distance)

To compare images (i.e. retrieve similar images)
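A small worked example of the comparison, using the two feature vectors shown on the slide. The `database` dictionary and its keys are hypothetical names added only to illustrate retrieval by nearest distance.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

feat_1 = [0.22, 0.30, 0.13, 0.24, 0.31, 0.15, 0.35, 0.48]
feat_2 = [0.24, 0.34, 0.23, 0.27, 0.63, 0.15, 0.25, 0.48]
distance = euclidean(feat_1, feat_2)  # roughly 0.354

# Retrieval sketch: rank stored images by their distance to a query vector.
database = {"image_1": feat_1, "image_2": feat_2}  # hypothetical image ids
query = feat_1
ranked = sorted(database, key=lambda name: euclidean(query, database[name]))
```

Smaller distance means more similar features, so the first entry of `ranked` is the best match for the query.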


Feature detection and description
Visual features
• Color histogram
• Edge & boundary feature
• Shape feature
• Texture feature
• Interest-point-based feature
• Deep features

Invariant to transformations
- Scale
- Rotation
- Translation
- Affine transformation
- Illumination change
- Some more
Image Features: Color
Color is often not a powerful feature

These are all images of people, yet the colors in each image are very different.
Texture feature
But textures are not always easy to quantify
Image filtering: Convolution operator

Kernel: $k(x, y)$

$g(x, y) = \sum_{v} \sum_{u} k(u, v) \, f(x - u, y - v)$

Image Credit: [Link]


[Link]
Image filtering: e.g. Mean Filter
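A minimal NumPy sketch of the convolution sum, applied with a 3 x 3 mean filter. The helper name `convolve2d`, the zero padding at the borders, and the restriction to odd-sized kernels are assumptions of this sketch. Libraries handle borders in several different ways.

```python
import numpy as np

def convolve2d(image, kernel):
    """Same-size 2D convolution with zero padding (odd-sized kernels only)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    h, w = image.shape
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel; correlation does not
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            # accumulate one kernel tap at a time over the whole image
            out += flipped[i, j] * padded[i:i + h, j:j + w]
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
mean_kernel = np.full((3, 3), 1 / 9)   # every neighbor weighted equally
blurred = convolve2d(image, mean_kernel)

identity = np.zeros((3, 3))
identity[1, 1] = 1.0
unchanged = convolve2d(image, identity)
```

Each output pixel of `blurred` is the average of its 3 x 3 neighborhood. Near the border the zero padding pulls the average down.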
Image filtering: Convolution operator
Important filter: Gaussian filter (Gaussian blur)

$k(x, y) = \begin{bmatrix} 1/16 & 1/8 & 1/16 \\ 1/8 & 1/4 & 1/8 \\ 1/16 & 1/8 & 1/16 \end{bmatrix}$


Important filter: Gaussian

• Weight contributions of neighboring pixels by nearness

0.003 0.013 0.022 0.013 0.003


0.013 0.059 0.097 0.059 0.013
0.022 0.097 0.159 0.097 0.022
0.013 0.059 0.097 0.059 0.013
0.003 0.013 0.022 0.013 0.003

5 x 5, σ = 1

Slide credit: Christopher Rasmussen
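The 5 x 5 table can be reproduced by sampling a 2D Gaussian with σ = 1 on an integer grid. This is a sketch. The function name `gaussian_kernel` and the `normalize` flag are assumptions, and the slide's values correspond to the unnormalized samples.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0, normalize=False):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2) on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    return k / k.sum() if normalize else k

kernel = gaussian_kernel(5, 1.0)
rounded = np.round(kernel, 3)          # compare with the table on the slide
blur_kernel = gaussian_kernel(5, 1.0, normalize=True)
```

The sampled weights sum to slightly less than 1 because the tails beyond the 5 x 5 window are cut off, so in practice the kernel is usually renormalized (as `normalize=True` does) to keep overall image brightness unchanged.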


Image filtering: Convolution operator
e.g. gaussian filter (gaussian blur)

Image Credit: [Link]


Practice with linear filters

?
0 0 0
0 1 0
0 0 0

Original

Source: D. Lowe
Practice with linear filters

0 0 0
0 1 0
0 0 0

Original Filtered
(no change)

Source: D. Lowe
Practice with linear filters

?
0 0 0
0 0 1
0 0 0

Original

Source: D. Lowe
Practice with linear filters

0 0 0
0 0 1
0 0 0

Original → Filtered: shifted left by 1 pixel

Source: D. Lowe
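The direction of the shift depends on whether the filter is applied as a correlation or as a true convolution. The sketch below, with a hypothetical helper `filter2d` and zero padding, suggests that "shifted left" matches the correlation convention (the default in many image filtering routines), while a true convolution with the same kernel shifts the image right.

```python
import numpy as np

def filter2d(image, kernel, flip):
    """Zero-padded, same-size filtering. flip=True is convolution, flip=False is correlation."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    h, w = image.shape
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)))
    weights = kernel[::-1, ::-1] if flip else kernel
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += weights[i, j] * padded[i:i + h, j:j + w]
    return out

shift_kernel = np.array([[0, 0, 0],
                         [0, 0, 1],
                         [0, 0, 0]], dtype=float)
image = np.arange(12, dtype=float).reshape(3, 4)

correlated = filter2d(image, shift_kernel, flip=False)  # content moves left
convolved = filter2d(image, shift_kernel, flip=True)    # content moves right
```

In both cases the column that has no source pixel is filled with zeros from the padding.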
Practice with linear filters

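Original, filtered with

$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix} - \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

(Note that filter sums to 1)
?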

Source: D. Lowe
Practice with linear filters

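Original, filtered with

$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix} - \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

Result: sharpening filter
- Accentuates differences with local average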

Source: D. Lowe
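A short sketch of why this kernel sharpens: it equals twice the identity minus a box blur, so the output is the original plus its difference from the local average. The helper `convolve2d` (zero padding, odd kernels) and the random test image are assumptions of this example.

```python
import numpy as np

def convolve2d(image, kernel):
    """Same-size 2D convolution with zero padding (odd-sized kernels only)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    h, w = image.shape
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + h, j:j + w]
    return out

identity = np.zeros((3, 3))
identity[1, 1] = 1.0
box = np.full((3, 3), 1 / 9)          # local average
sharpen = 2 * identity - box          # weights sum to 2 - 1 = 1

rng = np.random.default_rng(0)
image = rng.random((6, 6))            # hypothetical test image
sharpened = convolve2d(image, sharpen)
blurred = convolve2d(image, box)
detail = image - blurred              # what the blur removed
```

Because filtering is linear, `sharpened` equals `image + detail`: the fine detail lost by blurring is added back on top of the original.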
Key properties of linear filters
Linearity:
imfilter(I, f1 + f2) = imfilter(I,f1) + imfilter(I,f2)

Shift invariance: same behavior regardless of pixel location
imfilter(I,shift(f)) = shift(imfilter(I,f))

Any linear, shift-invariant operator can be represented as a convolution

Source: S. Lazebnik
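Both properties can be checked numerically. This sketch uses circular (wrap-around) convolution so that shifting with `np.roll` is exact at the borders. The helper `circular_convolve` and the random test data are assumptions, and `imfilter` on the slide is the MATLAB-style name, not something defined here.

```python
import numpy as np

def circular_convolve(image, kernel):
    """2D convolution with wrap-around borders (odd-sized kernels only)."""
    kh, kw = kernel.shape
    out = np.zeros(image.shape, dtype=float)
    for v in range(kh):
        for u in range(kw):
            # np.roll by (dv, du) yields f(y - dv, x - du), the term in the convolution sum
            shift = (v - kh // 2, u - kw // 2)
            out += kernel[v, u] * np.roll(image, shift, axis=(0, 1))
    return out

rng = np.random.default_rng(1)
I = rng.random((8, 8))
f1 = rng.random((3, 3))
f2 = rng.random((3, 3))

# Linearity: filtering with a sum of kernels equals the sum of the filtered images.
lhs_linear = circular_convolve(I, f1 + f2)
rhs_linear = circular_convolve(I, f1) + circular_convolve(I, f2)

# Shift invariance: shifting the input and then filtering equals filtering and then shifting.
offset = (1, 2)
lhs_shift = circular_convolve(np.roll(I, offset, axis=(0, 1)), f1)
rhs_shift = np.roll(circular_convolve(I, f1), offset, axis=(0, 1))
```

With zero padding instead of wrap-around, the same identities hold away from the image borders.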
Image filtering: Convolution operator
Important filter: Sobel operator

$k(x, y) = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}$

Image Credit: [Link]


Sobel, vertical edge (absolute value): $\begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}$

Sobel, horizontal edge (absolute value): $\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$

Slides by James Hays
Sobel operators approximate the 2D partial derivatives of the image
• Vertical sobel operator – Partial derivative in X (width)

• Horizontal sobel operator – Partial derivative in Y (height)

• Can compute magnitude and phase at each location

• Useful for detecting edges
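The bullets above can be sketched as follows: convolve with both Sobel kernels, then combine the two responses into a gradient magnitude and phase at each pixel. The helper `convolve2d` (zero padding) and the synthetic step-edge image are assumptions of this example.

```python
import numpy as np

def convolve2d(image, kernel):
    """Same-size 2D convolution with zero padding (odd-sized kernels only)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    h, w = image.shape
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + h, j:j + w]
    return out

sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)   # derivative in X
sobel_y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)   # derivative in Y

# Synthetic image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

gx = convolve2d(image, sobel_x)
gy = convolve2d(image, sobel_y)
magnitude = np.hypot(gx, gy)       # edge strength
phase = np.arctan2(gy, gx)         # gradient direction in radians
```

Away from the image border, the response is large only in the two columns next to the step, and the phase there is 0 because the intensity increases along X. Thresholding `magnitude` is a simple way to mark edge pixels.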


Sobel filters are (approximate) partial derivatives of the image
Let $f(x, y)$ be your input image, then the partial derivative is:

$\frac{\partial f(x, y)}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}$

Also: $\frac{\partial f(x, y)}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y) - f(x - h, y)}{2h}$

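But digital images are not continuous, they are discrete
Let $f[x, y]$ be your input image, then the partial derivative is approximated by a finite difference:

$\Delta_x f[x, y] = f[x + 1, y] - f[x, y]$, with kernel $k(x, y) = \begin{bmatrix} -1 & 1 \end{bmatrix}$

Also (central difference, up to a factor of 1/2): $\Delta_x f[x, y] = f[x + 1, y] - f[x - 1, y]$, with kernel $k(x, y) = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix}$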
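A small check of both finite differences on one image row. Note that NumPy arrays are indexed as `f[y, x]` (row, column), unlike the `f[x, y]` notation on the slide. The function names are assumptions, and the kernels are applied with `np.correlate`, i.e. without flipping.

```python
import numpy as np

def forward_diff_x(f):
    """f[y, x + 1] - f[y, x] for every valid x (output is one column narrower)."""
    return f[:, 1:] - f[:, :-1]

def central_diff_x(f):
    """f[y, x + 1] - f[y, x - 1] for every valid x (output is two columns narrower)."""
    return f[:, 2:] - f[:, :-2]

# One image row with intensities 0, 1, 4, 9, 16, 25 (x squared).
row = (np.arange(6.0) ** 2)[None, :]

forward = forward_diff_x(row)
central = central_diff_x(row)

# The same results, obtained by sliding the kernels from the slide over the row.
forward_k = np.correlate(row[0], np.array([-1.0, 1.0]), mode="valid")
central_k = np.correlate(row[0], np.array([-1.0, 0.0, 1.0]), mode="valid")
```

Dividing `central` by 2 gives 2, 4, 6, 8, the exact derivative 2x at x = 1..4, while the forward difference is offset by half a pixel.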


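Sobel Operators Smooth in Y and then Differentiate in X

$k(x, y) = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} * \begin{bmatrix} 1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}$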

Similarly to differentiate in Y
From Wikipedia
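The factorization can be verified with an outer product: a smoothing column times a differencing row gives the X kernel, and swapping the roles gives the Y kernel. Variable names here are assumptions for illustration.

```python
import numpy as np

smooth = np.array([1, 2, 1])       # weighted average across the edge direction
diff = np.array([1, 0, -1])        # central difference

sobel_x = np.outer(smooth, diff)   # smooth in Y, differentiate in X
sobel_y = np.outer(diff, smooth)   # smooth in X, differentiate in Y
```

Separability also means the 3 x 3 filter can be applied as two 1D passes, which costs 6 multiplications per pixel instead of 9.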
Gabor filters
Image gradient in a nutshell
Gradient histogram
Image Features: HoG

Paper by Navneet Dalal & Bill Triggs presented at CVPR 2005 for detecting people.
Image Features: HoG
• Compute gradients: $I_x$, $I_y$, and the magnitude $\sqrt{I_x^2 + I_y^2}$
• Aggregate gradient magnitude and directions in 8x8 pixel regions
• Compute a histogram with 9 bins for angles from 0 to 180
• Normalize histograms with respect to histograms of adjacent neighbors.
• Image (or image region) represented by a vector containing all the histograms.

In this case how long is that vector?
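A simplified sketch of the per-cell histogram and of the vector-length arithmetic. Several things here are assumptions rather than content from the slides: the function names, hard assignment of each pixel to a single bin (the original method interpolates between bins), and the 64 x 128 window with 2 x 2-cell blocks at a one-cell stride, which is the setup commonly cited from the Dalal and Triggs paper. The image on the slide may differ.

```python
import numpy as np

def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient angles (0 to 180 degrees) for one cell."""
    gy, gx = np.gradient(cell.astype(float))        # derivatives along rows (Y) and columns (X)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0   # fold opposite directions together
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0), weights=magnitude)
    return hist

def hog_length(width, height, cell=8, block=2, bins=9, stride=1):
    """Descriptor length when blocks of block x block cells slide by `stride` cells."""
    cells_x, cells_y = width // cell, height // cell
    blocks_x = (cells_x - block) // stride + 1
    blocks_y = (cells_y - block) // stride + 1
    return blocks_x * blocks_y * block * block * bins

# Hypothetical 8 x 8 cell whose intensity increases from left to right.
ramp = np.tile(np.arange(8.0), (8, 1))
hist = cell_histogram(ramp)

length = hog_length(64, 128)   # 7 * 15 blocks, each holding 4 histograms of 9 bins
```

For the ramp, every pixel has the same gradient direction, so all of the weight falls into the first bin.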
Interest points - motivation
Panorama stitching
Corners
Cornerness
The state-of-the-art interest point detector: SIFT

Not covered in class; please refer to David Lowe's paper for details
Visual words
Image Features: Bag of (Visual) Words
Representation

Bag of Features (read more in Gabriela Csurka’s ECCV 2004 paper)
• Extract SIFT feature descriptors
  Intuition: sample a bunch of pieces (“words”) from various parts of the image. Images of the same object are more likely to share similar pieces
• Create dictionary of distinctive SIFT features
• Compute histograms of features

slides by Fei-Fei Li
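The three steps can be sketched with plain k-means as the dictionary builder. Everything below is an illustrative assumption: random 128-dimensional vectors stand in for real SIFT descriptors (which would come from a feature extraction library), and the function names, dictionary size of 10, and iteration count are arbitrary.

```python
import numpy as np

def assign(descriptors, centers):
    """Index of the nearest dictionary word for each descriptor."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_dictionary(descriptors, k, iters=10, seed=0):
    """Plain k-means: the k cluster centers act as the visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(descriptors, centers)
        for c in range(k):
            members = descriptors[labels == c]
            if len(members):                 # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Normalized count of how often each visual word occurs in one image."""
    labels = assign(descriptors, centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(42)
training_descriptors = rng.random((500, 128))   # stand-in for descriptors from many images
image_descriptors = rng.random((60, 128))       # stand-in for descriptors from one image

dictionary = build_dictionary(training_descriptors, k=10)
bow = bow_histogram(image_descriptors, dictionary)
```

The resulting histogram has a fixed length regardless of how many descriptors the image produced, so it can be fed to a classifier or compared with a distance function like any other feature vector.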
GIST

For more, see [Link]


Idea of low, mid and high level features
• Edge, point, line – low level
• Visual words – mid level
• Object feature – high level (more semantic meanings)

✓ However, before deep learning, we had a different set of techniques for extracting each kind of feature
✓ Those techniques are not optimized for the data or task at hand
✓ No idea which feature is good for which data/task

But, this gives the notion of hierarchical features!


In deep learning
Suggested reading