Overview of Computer Vision Techniques

Computer vision involves mathematical techniques to recover the 3D shape and appearance of objects from images, with significant advancements in the last two decades enabling accurate 3D modeling and object recognition. Despite these advances, achieving human-like image understanding remains a challenge due to the complexities of visual perception. The applications of computer vision span various fields, including medical imaging, self-driving vehicles, and consumer-level photo editing, showcasing its versatility and impact.


What is computer vision?

Mathematical techniques for recovering the three-dimensional shape and appearance of objects in
imagery.
Progress in the last two decades:
1. We now have reliable techniques for accurately computing a 3D model of an environment from
thousands of partially overlapping photographs.
2. Given a large enough set of views of a particular object or façade (front view), we can create
accurate dense 3D surface models using stereo matching.
3. We can even, with moderate success, describe most of the people and objects in a photograph.
Despite all of these advances, the dream of having a computer explain an image at the same level of
detail and causality as a two-year-old child remains challenging.
Why is vision so difficult?
It is because vision is an inverse problem, in which we seek to recover some unknowns given
insufficient information to fully specify the solution.
In computer vision, we are trying to do the inverse, i.e., to describe the world that we see in one or
more images and to reconstruct its properties, such as shape, illumination, and color distributions.
It is amazing that humans and animals do this so effortlessly.

Modeling the visual world in all of its rich complexity is far more difficult than modeling the vocal
tract that produces spoken sounds.
The good news is that computer vision is being used today in a wide variety of real-world
applications.
Applications of computer vision
Optical character recognition (OCR): reading handwritten postal codes on letters and automatic number plate
recognition (ANPR).
Machine inspection: rapid parts inspection for quality assurance, using stereo vision with specialized
illumination to measure tolerances on aircraft wings or auto body parts, or looking for defects in steel
castings using X-ray vision.
Retail: object recognition for automated checkout lanes and fully automated stores.
Warehouse logistics: autonomous package delivery and pallet-carrying "drives", and parts picking by robotic
manipulators.
Medical imaging: registering pre-operative and intra-operative imagery or performing long-term studies of
people’s brain morphology as they age.
Self-driving vehicles: capable of driving point-to-point between cities.
3D model building (photogrammetry): fully automated construction of 3D models from aerial and
drone photographs.
Match move: merging computer-generated imagery (CGI) with live action footage by tracking feature
points in the source video to estimate the 3D camera motion and shape of the environment.
Surveillance: monitoring for intruders, analyzing highway traffic and monitoring pools for drowning
victims.
Fingerprint recognition and biometrics: for automatic access authentication as well as forensic
applications.
David Lowe’s website of industrial vision applications ([Link])
lists many other interesting industrial applications of computer vision.
Consumer-level applications
Consumer-level applications are things we can do with our own personal photographs and video.
These include:
Stitching: turning overlapping photos into a single seamlessly stitched panorama.
Exposure bracketing: merging multiple exposures taken under challenging lighting conditions (strong
sunlight and shadows) into a single perfectly exposed image.
Morphing: turning a picture of one of your friends into another, using a seamless morph transition.
3D modeling: converting one or more snapshots into a 3D model of the object or person you are
photographing.
Video match move and stabilization: inserting 2D pictures or 3D models into your videos by
automatically tracking nearby reference points, or using motion estimates to remove shake from your
videos.
Photo-based walkthroughs: navigating a large collection of photographs, such as the interior of
your house, by flying between different photos in 3D.
Face detection: for improved camera focusing as well as more relevant image searching.
Visual authentication: automatically logging family members onto your home computer as they
sit down in front of the webcam.
Photometric Image Formation
Photometric image formation is the process of converting light from a scene into a digital
image. It's a fundamental concept in computer vision.
It explains how 3D geometric features in the world are projected into 2D features in an image.
Images are made up of discrete color or intensity values.
These values are related to the lighting in the environment, the surface properties and geometry,
and the properties of the camera optics and sensors.
How it works
• Light from a source reflects off a surface.
• Some of the reflected light passes through the image plane.
• The light travels through the camera optics and reaches the sensor plane.
• The sensor captures the light and creates an image.
A simple model is I = L × R, where I is the intensity of light captured by the camera,
L is the light source intensity, and R is the surface reflectance.
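The I = L × R model above can be sketched in a few lines. This is a minimal illustration with made-up values: `L_intensity` and the reflectance map `R` are assumptions, not values from the text.

```python
import numpy as np

# Minimal sketch of the I = L * R image formation model.
# Per-pixel intensity = light source intensity x surface reflectance.
L_intensity = 100.0                      # assumed light source intensity (arbitrary units)
R = np.array([[0.2, 0.5],
              [0.8, 1.0]])               # assumed surface reflectance in [0, 1]

I = L_intensity * R                      # intensity of light captured by the camera
print(I)                                 # brighter pixels where the surface reflects more
```

Real image formation also involves the lens, sensor response, and noise, but this captures the basic proportionality between reflectance and recorded intensity.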
BRDF
To represent the reflectance properties of a material, we want to describe them in terms of both
the incident direction and the reflected direction.
This gives the Bidirectional Reflectance Distribution Function (BRDF).
Each direction is described by two angles.
The BRDF describes how much of each wavelength arriving from an incident direction is emitted
in a reflection direction.

The BRDF is a 4D function;
its unit is 1/steradian.
Properties of BRDF
1. Non-negativity: reflectance cannot be negative.
2. Reciprocity (Helmholtz reciprocity): swapping the incoming and outgoing light directions does not change the value:
fr(θi, φi, θr, φr; λ) = fr(θr, φr, θi, φi; λ)
3. Energy conservation: the surface cannot reflect more light than it receives; the total reflected light is always less than or equal to
the incident light.
4. Directional dependence: the BRDF depends on the incoming light direction and the outgoing light direction; different materials have
different BRDFs.
5. Wavelength dependence: the BRDF differs for different wavelengths of light, which explains why materials appear in different
colours.
6. For an isotropic surface, the BRDF reduces to a 3D function:
fr(θi, θr, |φr − φi|; λ) or fr(v̂i, v̂r, n̂; λ)
Isotropic surfaces are rotationally symmetric.
Anisotropic surfaces, such as brushed or scratched surfaces, have reflectance that depends on the light orientation relative to the
direction of the scratches.
Digital Camera
After starting from one or more light sources, reflecting off one
or more surfaces in the world, and passing through the
camera’s optics (lenses), light finally reaches the imaging
sensor.
Camera model: the image sensing pipeline
Light falling on an imaging sensor is picked up by an active
sensing area, integrated for the duration of the exposure
which is usually expressed as the shutter speed in a fraction of
a second.
It is then passed to a set of sense amplifiers.
The two main kinds of sensor used are the charge-coupled device
(CCD) and the complementary metal-oxide-semiconductor (CMOS) sensor.
The main factors affecting the performance of a digital image
sensor are the shutter speed, sampling pitch, fill factor, chip
size, analog gain, sensor noise, and the resolution (and quality)
of the analog-to-digital converter.
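The pipeline factors listed above can be sketched as a toy simulation. All parameter values here (shutter speed, fill factor, gain, noise level, 8-bit ADC) are assumptions chosen for illustration, not values from the text.

```python
import numpy as np

# Toy sketch of the image sensing pipeline: light is integrated over the
# exposure, scaled by the fill factor, amplified, corrupted by sensor
# noise, then quantized by the analog-to-digital converter.
rng = np.random.default_rng(0)
irradiance = np.array([[0.1, 0.4],
                       [0.7, 1.0]])     # assumed light falling on the sensor

shutter_speed = 1 / 60                  # exposure time in seconds (assumed)
fill_factor = 0.9                       # fraction of pixel area that senses light
analog_gain = 4000.0                    # sense-amplifier gain (arbitrary units)

signal = irradiance * shutter_speed * fill_factor * analog_gain
signal = signal + rng.normal(scale=0.5, size=signal.shape)  # additive sensor noise

# 8-bit ADC: round and clip to integer values in [0, 255].
digital = np.clip(np.round(signal), 0, 255).astype(np.uint8)
print(digital)
```

Raising the gain or exposure brightens the output but also saturates highlights at 255, which is why these factors trade off against each other in real sensors.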
Digital imaging sensors:
(a) CCDs move photogenerated charge from pixel to pixel and convert it to voltage at the output node; (b) CMOS
imagers convert charge to voltage inside each pixel.
Point Operators
The simplest kinds of image processing transforms are point operators.
Each output pixel’s value depends on only the corresponding input pixel value.
Examples of such operators include brightness and contrast adjustments, color correction, and
transformations.
Such operations are also known as point processes
Point operators modify the intensity of individual pixels based on a transformation function without
considering neighboring pixels.
Common types include:
1. Gamma Correction - Adjusts brightness using a power-law function.
2. Contrast Stretching - Expands pixel intensity range for better visibility.
3. Thresholding- Converts an image to binary by setting pixels above a threshold to 255 and others
to 0.
4. Log Transform- Enhances dark regions by applying a logarithmic function.
5. Negative Transformation - Inverts pixel values to create a negative image.
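The point operators listed above can be sketched with NumPy. Each line maps every pixel independently through a transformation function; the threshold value and gamma are illustrative assumptions.

```python
import numpy as np

# Sample 8-bit-range image (values chosen for illustration).
img = np.array([[0, 64],
                [128, 255]], dtype=np.float64)

# 1. Gamma correction: power-law mapping on normalized intensities.
gamma = 2.2                                  # assumed display gamma
gamma_img = 255 * (img / 255) ** (1 / gamma)

# 3. Thresholding: pixels above the threshold become 255, others 0.
binary = np.where(img > 100, 255, 0)

# 4. Log transform: compresses dynamic range, boosting dark regions.
log_img = 255 * np.log1p(img) / np.log1p(255)

# 5. Negative transformation: invert pixel values.
negative = 255 - img

print(binary)    # [[0, 0], [255, 255]]
print(negative)  # [[255, 191], [127, 0]]
```

Note that each output pixel depends only on the corresponding input pixel, which is exactly what distinguishes point operators from the neighborhood (filtering) operators discussed next.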
Linear filtering
The most commonly used type of neighborhood operator is a linear filter.
A linear filter is a mathematical operation that modifies an image by changing the
signal's frequency spectrum.
Linear filters apply a linear mathematical operation that helps in removing noise
and in improving or extracting certain features from images.
Key properties of a linear filter:
Linearity: imfilter(I, f1 + f2) = imfilter(I, f1) + imfilter(I, f2)
Shift invariance: same behavior regardless of pixel location:
imfilter(I, shift(f)) = shift(imfilter(I, f))
Any linear, shift invariant operator can be represented as a convolution
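The linearity property above can be verified with a small NumPy sketch. `filter2d` is a hypothetical helper standing in for MATLAB's `imfilter` (it computes a 3×3 cross-correlation with zero padding); the kernels are illustrative.

```python
import numpy as np

def filter2d(I, f):
    # Cross-correlate image I with a 3x3 kernel f using zero padding --
    # a linear, shift-invariant neighborhood operator (imfilter-style).
    P = np.pad(I, 1)
    out = np.zeros_like(I, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += f[di, dj] * P[di:di + I.shape[0], dj:dj + I.shape[1]]
    return out

I = np.random.default_rng(1).random((5, 5))   # random test image
f1 = np.full((3, 3), 1 / 9.0)                  # box (mean) blur kernel
f2 = np.zeros((3, 3)); f2[1, 1] = 1.0          # identity kernel

# Linearity: filtering with (f1 + f2) equals the sum of the two results.
lhs = filter2d(I, f1 + f2)
rhs = filter2d(I, f1) + filter2d(I, f2)
print(np.allclose(lhs, rhs))  # True
```

Because the same kernel weights are applied at every pixel, the operator is also shift invariant, which is why any such filter can be expressed as a convolution (or correlation) with a fixed kernel.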
