Computer Vision, Feature Extraction and Deep Learning
by
Ankit Jha
Resources courtesy: Google, Medium Blogs, Courses from IIT Bombay, MIT, etc.
Computer Vision
Make computers understand images and video.
What kind of scene?
Where are the cars?
How far is the building?
…
Vision is really hard
• Vision is an amazing feat of natural intelligence
– Visual cortex occupies about 50% of the Macaque brain
– More of the human brain is devoted to vision than to anything else
Is that a queen or a bishop?
Why computer vision matters
Safety Health Security
Comfort Fun Access
Ridiculously brief history of computer vision
• 1966: Minsky assigns computer vision as an undergrad summer project
• 1960s: interpretation of synthetic worlds (Guzman ‘68)
• 1970s: some progress on interpreting selected images (Ohta & Kanade ‘78)
• 1980s: ANNs come and go; shift toward geometry and increased mathematical rigor
• 1990s: face recognition (Turk and Pentland ‘91); statistical analysis in vogue
• 2000s: broader recognition; large annotated datasets available; video processing starts
How vision is used now
• Examples of state-of-the-art
Some of the following slides by Steve Seitz
Optical character recognition (OCR)
Technology to convert scanned docs to text
• If you have a scanner, it probably came with OCR software
Digit recognition (AT&T Labs) [Link]
License plate readers [Link]
Face detection
• Many new digital cameras now detect faces
– Canon, Sony, Fuji, …
Smile detection
Sony Cyber-shot® T70 Digital Still Camera
3D from thousands of images
Building Rome in a Day: Agarwal et al. 2009
Object recognition (in supermarkets)
LaneHawk by EvolutionRobotics
“A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it…”
Vision-based biometrics
“How the Afghan Girl was Identified by Her Iris Patterns” Read the story
wikipedia
Login without a password…
Face recognition systems now beginning to appear more widely [Link]
Fingerprint scanners on many new laptops and other devices
Object recognition (in mobile phones)
Point & Find, Nokia
Google Goggles
Special effects: shape capture
The Matrix movies, ESC Entertainment, XYZRGB, NRC
Special effects: motion capture
Pirates of the Caribbean, Industrial Light and Magic
Sports
Sportvision first down line
Nice explanation on [Link]
[Link]
Smart cars Slide content courtesy of Amnon Shashua
• Mobileye
– Vision systems currently in high-end BMW, GM,
Volvo models
– By 2010: 70% of car manufacturers.
Google cars
[Link]
Interactive Games: Kinect
• Object Recognition:
[Link]
• Mario: [Link]
• 3D: [Link]
• Robot: [Link]
Vision in space
NASA's Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.
Vision systems (JPL) used for several tasks
• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al.
Industrial robots
Vision-guided robots position nut runners on wheels
Mobile robots
NASA’s Mars Spirit Rover
[Link] [Link]
Saxena et al. 2008
STAIR at Stanford
Medical imaging
Image guided surgery
3D imaging
Grimson et al., MIT
MRI, CT
Deep learning for visual inference
Human Vision / Human Brain
Geometry
Machine Learning
Computer Vision
Deep Learning
Optics /
Cameras
Robotics
This course
Relationship with Other Fields
• Image Processing: Image → Image
Relationship with Other Fields
• Computer Vision: Image → Knowledge
(e.g., “cat”, “deer”)
Relationship with Other Fields
• Computer Graphics: Knowledge → Image
Vertices, Locations, Objects,
Shapes, Colors, Material properties,
Lighting settings, Camera settings, etc.
Visual Recognition?
• What does it mean to “see”?
• “What” is “where”, Marr 1982
• Get computers to “see”
Verification
Is this a car?
Classification:
Is there a car in this picture?
Detection:
Where is the car in this picture?
Pose Estimation:
Activity Recognition:
What is he doing?
Object Categorization:
Sky
Person
Tree
Horse
Car
Person
Bicycle
Road
Segmentation
Sky
Tree
Car
Person
Describing Images with Language
Text-to-Image Synthesis: Text2Scene
Object recognition
Is it really so hard?
Find the chair in this image
Output of normalized correlation
This is a chair
Object recognition
Is it really so hard?
Find the chair in this image
Pretty much garbage
Simple template matching is not going to make it
This is an image to us
This is an image to a computer
Challenges 1: view point variation
Michelangelo 1475-1564 slide by Fei Fei, Fergus & Torralba
Challenges 2: illumination
slide credit: S. Ullman
Challenges 3: occlusion
Magritte, 1957 slide by Fei Fei, Fergus & Torralba
Challenges 4: scale
slide by Fei Fei, Fergus & Torralba
Challenges 5: deformation
slide by Fei Fei, Fergus & Torralba Xu, Beihong 1943
Challenges 6: background clutter
Klimt, 1913 slide by Fei Fei, Fergus & Torralba
Challenges 7: object intra-class variation
slide by Fei-Fei, Fergus & Torralba
Challenges 8: local ambiguity
slide by Fei-Fei, Fergus & Torralba
Challenge 9 – labels are ambiguous
Challenge 10 – different spatial resolution
Challenge 11 – medical images
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
• Mid-2000s: bags of features
• Present trends: combination of local and global methods, data-driven methods, context
Svetlana Lazebnik
Recognition as an alignment problem:
Block world
L. G. Roberts, Machine Perception of Three-Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963.
J. Mundy, Object Recognition in the Geometric Era: a Retrospective, 2006
Recognition by components
Biederman (1987)
Primitives (geons) Objects
[Link]
Svetlana Lazebnik
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
• Mid-2000s: bags of features
• Present trends: combination of local and global methods, data-driven methods, context
Svetlana Lazebnik
Eigenfaces (Turk & Pentland, 1991)
Svetlana Lazebnik
Color Histograms
Swain and Ballard, Color Indexing, IJCV 1991. Svetlana Lazebnik
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
• Mid-2000s: bags of features
• Present trends: combination of local and global methods, data-driven methods, context
Svetlana Lazebnik
Sliding window approaches
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
• Mid-2000s: bags of features
• Present trends: combination of local and global methods, data-driven methods, context
Svetlana Lazebnik
Local features for object instance
recognition
D. Lowe (1999, 2004)
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
• Mid-2000s: bags of features
• Present trends: combination of local and global methods, data-driven methods, context
Svetlana Lazebnik
Representing people
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
• Mid-2000s: bags of features
• Present trends: combination of local and global methods, data-driven methods, context
Svetlana Lazebnik
Bag-of-words models
Object → Bag of ‘words’
Svetlana Lazebnik
Deep Learning and Vision
• Deep Learning has been a great disruption in the field of Computer Vision and has made many new things work!
• Many deep learning methods are being applied to vision these days.
• This is a deep learning course for visual inference. We will briefly review some pre-deep-learning methods, and then focus mostly on deep learning.
Objectives
▪ Understand foundational concepts for representation learning using neural networks
▪ Become familiar with state-of-the-art models for tasks such as image classification, object detection, image segmentation, scene recognition, etc.
▪ Obtain practical experience in implementing visual recognition models using deep learning.
Resources
• Online lectures (Stanford, MIT, NYU, IITM)
• Deep Learning book by Goodfellow
• [Link]
• Deep Learning Methods and Applications by Deng & Yu
• [Link]
• Papers
Pytorch, Tensorflow, Keras
• Pytorch – zero to GAN
• [Link]
• Keras by deepLizard
• [Link]
• Tensorflow tutorial
• [Link]
Traditional ML vs Deep Learning (Recap)
Image acquisition
An image is a function z = f(x, y) mapping spatial position to intensity.
Electromagnetic Spectrum – characterizing the energy
Human Luminance Sensitivity Function
[Link]
Beyond visible spectra – satellite images
The infrared region is better than the visible region at characterizing the classes
Multi-spectral and hyper-spectral
Spatial resolution
Different rep. of 3D information in images
Video data – appearance + motion (optical flow)
Optical flow - a quick review
Color coding the optical flow
What are Image Features?
An image is mapped to a feature vector, e.g. [0.22, 0.30, 0.13, 0.24, 0.31, 0.15, 0.35, 0.48]. What should the features be?
• A color histogram?
• The maximum color on sub-areas of the image?
• Any statistics on the input image?
• The output of some image processing on the input image?
Why are they useful?
Image → feature vector (e.g. [0.22, 0.30, 0.13, 0.24, 0.31, 0.15, 0.35, 0.48]) → Machine Learning Model → Predictions
As inputs to a machine learning model
Why are they useful?
Two feature vectors, e.g. [0.22, 0.30, 0.13, 0.24, 0.31, 0.15, 0.35, 0.48] and [0.24, 0.34, 0.23, 0.27, 0.63, 0.15, 0.25, 0.48], are compared with a distance function (e.g. Euclidean distance).
To compare images (i.e. retrieve similar images)
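As a concrete sketch of this idea (illustrative, not from the slides): compute a simple per-channel color histogram as the feature vector, then compare two images with Euclidean distance. The bin count and image sizes below are arbitrary choices.

```python
import numpy as np

def color_histogram(image, bins=4):
    """Concatenate per-channel histograms (normalized) into one feature vector."""
    feats = []
    for c in range(image.shape[2]):          # one histogram per channel (R, G, B)
        h, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        feats.append(h / h.sum())            # normalize so image size doesn't matter
    return np.concatenate(feats)

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, (32, 32, 3), dtype=np.uint8)
img_b = rng.integers(0, 256, (32, 32, 3), dtype=np.uint8)

fa, fb = color_histogram(img_a), color_histogram(img_b)
print(len(fa))              # 3 channels x 4 bins = 12-dimensional feature
print(euclidean(fa, fa))    # identical images -> distance 0.0
```

Because the histograms are normalized, images of different sizes map to comparable vectors; retrieval is then just "return the images with smallest distance".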
Feature detection and description
Visual features
• Color histogram
• Edge & boundary feature
• Shape feature Invariant to transformations
• Texture feature - Scale
- Rotation
• Interests points based feature - Translation
- Affine transformation
• Deep features - Illumination change
- Some more
Image Features: Color
Image Features: Color
Color is often not a powerful feature on its own: these are all images of people, but the colors in each image are very different.
Texture feature
But textures are not always easy to quantify
Image filtering: Convolution operator
g(x, y) = Σ_u Σ_v k(u, v) f(x − u, y − v)
where k(x, y) is the filter kernel
Image Credit: [Link]
[Link]
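The convolution sum above can be implemented directly. The following is an illustrative brute-force NumPy sketch ("valid" output only, i.e. positions where the kernel fits entirely), not an optimized library routine:

```python
import numpy as np

def convolve2d(f, k):
    """Direct implementation of g(x,y) = sum_u sum_v k(u,v) f(x-u, y-v).
    'valid' output: only positions where the flipped kernel fits entirely."""
    kh, kw = k.shape
    out = np.zeros((f.shape[0] - kh + 1, f.shape[1] - kw + 1))
    kf = k[::-1, ::-1]                     # convolution flips the kernel
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(kf * f[y:y+kh, x:x+kw])
    return out

f = np.arange(16, dtype=float).reshape(4, 4)
mean_k = np.ones((3, 3)) / 9.0             # 3x3 mean filter (symmetric, flip is a no-op)
g = convolve2d(f, mean_k)
print(g)   # each output pixel is the mean of its 3x3 neighbourhood
```

With the mean filter, each output value is the average of the 3×3 neighbourhood, which is exactly the mean-filter slide that follows.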
Image filtering: e.g. Mean Filter
Image filtering: Convolution operator
Important filter: Gaussian filter (Gaussian blur)
k(x, y) =
1/16 1/8 1/16
1/8  1/4 1/8
1/16 1/8 1/16
Important filter: Gaussian
• Weight contributions of neighboring pixels by nearness
0.003 0.013 0.022 0.013 0.003
0.013 0.059 0.097 0.059 0.013
0.022 0.097 0.159 0.097 0.022
0.013 0.059 0.097 0.059 0.013
0.003 0.013 0.022 0.013 0.003
5 × 5 kernel; weights sum to 1
Slide credit: Christopher Rasmussen
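A Gaussian kernel like the 5×5 table above can be built by sampling the 2D Gaussian and normalizing so the weights sum to 1; the size and sigma below are illustrative choices:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Sampled 2D Gaussian, normalized so the weights sum to 1."""
    ax = np.arange(size) - size // 2           # e.g. [-2, -1, 0, 1, 2]
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

k = gaussian_kernel(5, sigma=1.0)
print(k.shape)             # (5, 5)
print(round(k.sum(), 6))   # 1.0 -- blurring preserves overall brightness
print(k[2, 2] > k[0, 0])   # True: centre weight is largest, nearer pixels count more
```

Normalizing to unit sum is what makes the blur preserve the average image brightness.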
Image filtering: Convolution operator
e.g. Gaussian filter (Gaussian blur)
Image Credit: [Link]
Practice with linear filters
?
0 0 0
0 1 0
0 0 0
Original
Source: D. Lowe
Practice with linear filters
0 0 0
0 1 0
0 0 0
Original Filtered
(no change)
Source: D. Lowe
Practice with linear filters
?
0 0 0
0 0 1
0 0 0
Original
Source: D. Lowe
Practice with linear filters
0 0 0
0 0 1
0 0 0
Original Shifted left
By 1 pixel
Source: D. Lowe
Practice with linear filters
0 0 0          1 1 1
0 2 0  −  (1/9) × 1 1 1
0 0 0          1 1 1
?
(Note that the filter sums to 1)
Original
Source: D. Lowe
Practice with linear filters
0 0 0          1 1 1
0 2 0  −  (1/9) × 1 1 1
0 0 0          1 1 1
Original → Sharpening filter
- Accentuates differences with the local average
Source: D. Lowe
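One way to see why this sharpens: the filter is 2× the identity minus the local average, i.e. original + (original − local mean). Its weights sum to 1, so constant regions pass through unchanged. A small illustrative check:

```python
import numpy as np

identity = np.zeros((3, 3))
identity[1, 1] = 1.0
box = np.ones((3, 3)) / 9.0          # local average (mean filter)
sharpen = 2 * identity - box         # 2*I - mean = I + (I - mean)

print(round(sharpen.sum(), 6))       # 1.0: overall brightness is preserved
flat = np.full((3, 3), 7.0)          # a constant 3x3 patch
print(np.sum(sharpen * flat))        # 7.0: flat regions are left unchanged
```

Anywhere a pixel differs from its local average, that difference is doubled and added back, which accentuates edges.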
Key properties of linear filters
Linearity:
imfilter(I, f1 + f2) =
imfilter(I,f1) + imfilter(I,f2)
Shift invariance: same behavior regardless of pixel
location
imfilter(I,shift(f)) = shift(imfilter(I,f))
Any linear, shift-invariant operator can be represented
as a convolution
Source: S. Lazebnik
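Both properties can be verified numerically. The sketch below uses 1D circular convolution via the FFT (an illustrative choice: with circular boundaries the identities hold exactly, with no zero-padding edge effects):

```python
import numpy as np

def circ_conv(x, k):
    """Circular convolution via the FFT (kernel zero-padded to len(x))."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, len(x))))

I  = np.array([0., 1., 3., 2., 5., 4.])
f1 = np.array([1., 0., -1.])
f2 = np.array([0.25, 0.5, 0.25])
shift = lambda x: np.roll(x, 1)      # circular shift by one sample

# Linearity: filtering with f1+f2 equals the sum of the two filtered signals.
lin_ok = np.allclose(circ_conv(I, f1 + f2), circ_conv(I, f1) + circ_conv(I, f2))

# Shift invariance: shift-then-filter equals filter-then-shift.
shift_ok = np.allclose(circ_conv(shift(I), f1), shift(circ_conv(I, f1)))

print(lin_ok, shift_ok)   # True True
```

The same checks work in 2D; 1D just keeps the demonstration small.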
Image filtering: Convolution operator
Important Filter: Sobel operator
k(x, y) =
1 0 -1
2 0 -2
1 0 -1
Image Credit: [Link]
1 0 -1
2 0 -2
1 0 -1
Sobel
Vertical Edge
Slide by James Hays (absolute value)
1 2 1
0 0 0
-1 -2 -1
Sobel
Horizontal Edge
Slide by James Hays (absolute value)
Sobel operators are equivalent to 2D partial
derivatives of the image
• Vertical Sobel operator – partial derivative in X (width)
• Horizontal Sobel operator – partial derivative in Y (height)
• Can compute magnitude and phase at each location
• Useful for detecting edges
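A small synthetic check (illustrative): on an image with a single vertical edge, the vertical Sobel responds strongly at the edge while the horizontal Sobel gives zero everywhere:

```python
import numpy as np

sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=float)   # responds to vertical edges
sobel_y = sobel_x.T                              # responds to horizontal edges

def filter2d(f, k):
    """Brute-force cross-correlation, 'valid' region (enough for this demo)."""
    kh, kw = k.shape
    out = np.zeros((f.shape[0] - kh + 1, f.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(k * f[y:y+kh, x:x+kw])
    return out

# Synthetic image: dark left half, bright right half -> one vertical edge.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

gx = filter2d(img, sobel_x)
gy = filter2d(img, sobel_y)
mag = np.sqrt(gx**2 + gy**2)           # gradient magnitude at each location

print(np.abs(gy).max())   # 0.0: this image has no horizontal edges
print(mag.max())          # strongest response sits on the vertical edge
```

The phase atan2(gy, gx) would give the edge orientation at each pixel, which is exactly what HoG (later in these slides) histograms.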
Sobel filters are (approximate) partial
derivatives of the image
Let f(x, y) be your input image; then the partial derivative is:
∂f(x, y)/∂x = lim_{h→0} [f(x + h, y) − f(x, y)] / h
Also: ∂f(x, y)/∂x = lim_{h→0} [f(x + h, y) − f(x − h, y)] / (2h)
But digital images are not continuous, they
are discrete
Let 𝑓[𝑥, 𝑦] be your input image, then the partial derivative is:
Δ𝑥 𝑓[𝑥, 𝑦] = 𝑓[𝑥 + 1, 𝑦] − 𝑓[𝑥, 𝑦]
Also: Δ𝑥 𝑓[𝑥, 𝑦] = 𝑓[𝑥 + 1, 𝑦] − 𝑓[𝑥 − 1, 𝑦]
But digital images are not continuous, they
are discrete
Let 𝑓[𝑥, 𝑦] be your input image, then the partial derivative is:
Δx f[x, y] = f[x + 1, y] − f[x, y]  →  k(x, y) = [-1 1]
Also: Δx f[x, y] = f[x + 1, y] − f[x − 1, y]  →  k(x, y) = [-1 0 1]
Sobel Operators Smooth in Y and then
Differentiate in X
k(x, y) = [1, 2, 1]ᵀ × [1, 0, -1] =
1 0 -1
2 0 -2
1 0 -1
Similarly to differentiate in Y
From Wikipedia
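The separability claim is easy to verify: the 3×3 Sobel kernel is the outer product of a 1D smoothing filter (in y) and a 1D derivative filter (in x):

```python
import numpy as np

smooth = np.array([[1], [2], [1]])    # column vector [1, 2, 1]^T: smooth in y
diff   = np.array([[1, 0, -1]])       # row vector [1, 0, -1]: differentiate in x

sobel_x = smooth @ diff               # outer product rebuilds the 3x3 Sobel
print(sobel_x)
# [[ 1  0 -1]
#  [ 2  0 -2]
#  [ 1  0 -1]]
```

Separability matters for speed: a K×K separable filter can be applied as two 1D passes, costing about 2K multiplies per pixel instead of K².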
Gabor filters
Image gradient in a nutshell
Gradient histogram
Image Features: HoG
Paper by Navneet Dalal & Bill Triggs presented at CVPR 2005 for detecting people.
Image Features: HoG
Image Features: HoG
Compute gradients I_x, I_y and the gradient magnitude √(I_x² + I_y²)
Image Features: HoG
We will aggregate
gradient magnitude
and directions
in 8x8 pixel regions
Image Features: HoG
Compute a histogram with 9 bins for angles from 0 to 180 degrees
Image Features: HoG
Normalize histograms
with respect to
histograms of adjacent
neighbors.
Image Features: HoG
Image (or image region)
represented by a vector
containing all the
histograms.
In this case how long is
that vector?
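For the standard Dalal-Triggs person-detection setup (64×128 pixel window, 8×8-pixel cells, 2×2-cell blocks with one-cell stride, 9 orientation bins; assumed here, since the slide does not fix the window size), the length of that vector works out as follows:

```python
# HoG descriptor length under the standard Dalal-Triggs configuration (assumed).
win_w, win_h = 64, 128
cell = 8

cells_x, cells_y = win_w // cell, win_h // cell     # 8 x 16 cells of 8x8 pixels
blocks_x, blocks_y = cells_x - 1, cells_y - 1       # 7 x 15 overlapping 2x2-cell blocks
values_per_block = 2 * 2 * 9                        # 4 cells x 9 orientation bins

length = blocks_x * blocks_y * values_per_block
print(length)   # 3780
```

Note the block overlap: each cell's 9-bin histogram appears in up to four blocks, each time normalized against different neighbors, which is why the count is per block rather than per cell.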
Interest points - motivation
Interest points - motivation
Panorama stitching
Corners
Cornerness
The state-of-the-art interest point detector - SIFT
Won’t be covered in class – please refer to David Lowe’s paper for details
Visual words
Image Features: Bag of (Visual) Words
Representation
Bag of Features (read more in Gabriela Csurka’s ECCV 2004 paper) slide by Fei-fei Li
Extract SIFT
Feature
Descriptors
Intuition: sample a bunch of pieces (“words”) from various parts of the image. Images of the same object are more likely to share similar pieces.
slide by Fei-fei Li
Extract SIFT
Feature
Descriptors
Create Dictionary
of Distinctive SIFT
Features
slide by Fei-fei Li
Extract SIFT
Feature
Descriptors
Compute
Histograms of
Features
slide by Fei-fei Li
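The three steps above (extract descriptors, build a dictionary, compute histograms) can be sketched end-to-end. The descriptors below are random stand-ins for SIFT (real SIFT descriptors are 128-D), and the tiny k-means is illustrative rather than a production clusterer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for SIFT descriptors; each row is one local descriptor.
train_desc = rng.normal(size=(200, 16))

# 1) Build the visual dictionary: cluster training descriptors (tiny k-means).
k = 8
centers = train_desc[rng.choice(len(train_desc), k, replace=False)]
for _ in range(10):
    # assign each descriptor to its nearest "visual word"
    d = np.linalg.norm(train_desc[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    for j in range(k):                       # recompute each cluster centre
        if np.any(labels == j):
            centers[j] = train_desc[labels == j].mean(axis=0)

# 2) Represent a new image: histogram of its descriptors over the dictionary.
def bow_histogram(desc, centers):
    d = np.linalg.norm(desc[:, None] - centers[None], axis=2)
    words = d.argmin(axis=1)                 # quantize each descriptor to a word
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()                 # normalized word frequencies

img_desc = rng.normal(size=(50, 16))         # descriptors from one "image"
h = bow_histogram(img_desc, centers)
print(h.shape)             # (8,) -- fixed length regardless of descriptor count
print(round(h.sum(), 6))   # 1.0
```

The payoff is that every image, however many descriptors it yields, becomes a fixed-length histogram that a standard classifier can consume.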
GIST
For more, see [Link]
Idea of low, mid and high level features
• Edges, points, lines – low level
• Visual words – mid level
• Object features – high level (more semantic meaning)
✓ However, before deep learning we had a different set of techniques for extracting each kind of feature
✓ Those techniques are not jointly optimized
✓ No idea which feature is good for which data/task
But this gives the notion of hierarchical features!
In deep learning
Suggested readings