Understanding Computer Vision Basics

Computer Vision is a subfield of AI that enables machines to interpret and understand visual data, drawing parallels to human vision. The field has evolved from early analog systems to complex tasks like object detection, facial recognition, and image segmentation, significantly aided by advancements in deep learning and datasets like ImageNet. Applications of computer vision span various industries, including augmented reality in e-commerce, education, gaming, travel, and healthcare, enhancing user experiences and operational efficiencies.


What is computer vision?

Computer Vision is a subfield of Deep Learning and Artificial Intelligence where humans
teach computers to see and interpret the world around them.
While humans and animals learn to see naturally from a very young age, teaching
machines to perceive and interpret their surroundings through vision remains a largely
unsolved problem.
Our limited understanding of human vision, combined with the infinitely varying scenery of our
dynamic world, is what makes machine vision complex at its core.

Human Vision System vs. Computer Vision System

A brief history of computer vision


Like all great things in the world of technology, computer vision started with a cat.
Two neurophysiologists, Hubel and Wiesel, placed a cat in a restricting harness and an
electrode in its visual cortex. The scientists showed the cat a series of images through a
projector, hoping the brain cells in its visual cortex would start firing.
When the images produced no response, the eureka moment came as a projector slide was
removed and a single horizontal line of light appeared on the wall:
neurons fired, emitting a crackling electrical noise.
The scientists had just realized that the early layers of the visual cortex respond to simple
shapes, like lines and curves, much like the early layers of a deep neural network.
They then used an oscilloscope to generate such lines and observe the brain’s reaction.
This experiment marks the beginning of our understanding of the interconnection between
computer vision and the human brain, which will be helpful for our understanding of artificial
neural networks.
However, before cat brains entered the scene, analog computer vision had already begun
in the 1950s at universities pioneering artificial intelligence.

Computer vision vs. human vision


The notion that machine vision must be derived from animal vision was predominant as
early as 1959, when the neurophysiologists mentioned above set out to understand cat
vision.
Since then, the history of computer vision has been dotted with milestones, driven by the rapid
development of image capture and scanning instruments alongside state-of-the-art
image processing algorithms.
The 1960s saw the emergence of AI as an academic field of study, followed by the
development of the first robust Optical Character Recognition (OCR) system in 1974.
By the 2000s, the focus of computer vision had shifted to much more complex topics,
including:
• Object identification
• Facial recognition
• Image segmentation
• Image classification
And more.
All of them have achieved commendable accuracy over the years.
The year 2010 saw the birth of the ImageNet dataset, with millions of labeled images freely
available for research. This led to the AlexNet architecture two years later,
one of the biggest breakthroughs in computer vision, cited over 82K times.

Image Processing as a Part of Computer Vision


Digital Image Processing, or Image Processing for short, is a subset of Computer Vision. It
deals with enhancing and understanding images through various algorithms.
More than just a subset, Image Processing forms the precursor of modern-day computer
vision, having driven the development of numerous rule-based and optimization-based
algorithms that brought machine vision to where it is today.
Image Processing may be defined as performing a set of operations on an image
to analyze and manipulate its contents or the underlying image data.
Now that you know the theory behind computer vision, let’s talk about its practical side.

How does computer vision work?


At its most basic level, computer vision works in three steps: acquiring an image, processing it, and understanding what it contains.

However,
while the three steps outlining the basics of computer vision seem easy, processing and
understanding an image via machine vision is quite difficult. Here’s why:
an image consists of many pixels, a pixel being the smallest unit into which the
image can be divided.
Computers process images in the form of an array of pixels, where each pixel has a set of
values, representing the presence and intensity of the three primary colors: red, green, and
blue.
All pixels come together to form a digital image.
The digital image, thus, becomes a matrix, and Computer Vision becomes a study of
matrices. While the simplest computer vision algorithms use linear algebra to manipulate
these matrices, complex applications involve operations like convolutions with learnable
kernels and downsampling via pooling.
Below is an example of how a computer “sees” a small image.

The values represent pixel intensities at particular coordinates in the image, with 255
representing pure white and 0 representing pure black.
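For instance, a tiny grayscale image can be stored as a NumPy matrix of intensity values; the numbers below are made up for illustration, not taken from a real image:

```python
import numpy as np

# A hypothetical 4x4 grayscale image: each entry is a pixel
# intensity, 0 = pure black, 255 = pure white.
image = np.array([
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 64, 128, 192, 255],
    [  0,   0, 255, 255],
], dtype=np.uint8)

print(image.shape)   # dimensions of the matrix: (4, 4)
print(image[0, 3])   # the top-right pixel: 255 (white)

# A color image adds a third axis: one 0-255 value per
# red, green, and blue channel at every pixel.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255    # a fully red 4x4 image
```

A real photograph works the same way, just with millions of such entries per channel.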
For larger images, the matrices are correspondingly larger.
While it is easy for us to grasp an image by looking at it, a peek at the raw pixel values
shows that the matrix alone gives us no obvious information about what the image depicts!
Therefore, the computer has to perform complex calculations on these matrices and
formulate relationships with neighboring pixel elements just to say that this image
represents a person’s face.
Developing algorithms to recognize complex patterns in images makes you appreciate
how remarkable our brains are to excel at pattern recognition so naturally.
Some operations commonly used in computer vision based on a Deep Learning
perspective include:
1. Convolution: Convolution in computer vision is an operation in which a learnable
kernel is “convolved” with the image. In other words, the kernel is slid across the
image pixel by pixel, and at every position an element-wise multiplication is
performed between the kernel and the underlying pixel group, and the results are summed.
2. Pooling: Pooling is an operation used to reduce the dimensions of an image by
operating at the pixel level. A pooling kernel slides across the image, and
only one value from each corresponding pixel group is kept for further processing,
thus reducing the image size (e.g., Max Pooling, Average Pooling).
3. Non-Linear Activations: Non-Linear activations introduce non-linearity to the
neural network, thereby allowing the stacking of multiple convolutions and pooling
blocks to increase model depth.
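The three operations above can be sketched in a few lines of NumPy; the 6x6 input and the averaging kernel below are made up for demonstration (real networks learn the kernel values during training):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution: slide the kernel over the image
    and sum the element-wise products at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(image, size=2):
    """Max pooling: keep only the largest value of each size x size block."""
    h, w = image.shape[0] // size, image.shape[1] // size
    return image[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

def relu(x):
    """A common non-linear activation: negatives become zero."""
    return np.maximum(x, 0)

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 input
kernel = np.ones((3, 3)) / 9.0                     # 3x3 averaging kernel
features = relu(max_pool(conv2d(image, kernel)))
print(features.shape)   # (2, 2): convolution gives 4x4, pooling halves it
```

Stacking many such convolution, pooling, and activation blocks is exactly what gives a deep convolutional network its depth.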

9 common computer vision tasks


In essence, computer vision tasks are about making computers understand digital images
as well as visual data from the real world. This can involve extracting, processing, and
analyzing information from such inputs to make decisions.
The evolution of machine vision saw the large-scale formalization of difficult problems into
popular solvable problem statements.
Division of topics into well-formed groups with proper nomenclature helped researchers
around the globe identify problems and work on them efficiently.
The most popular computer vision tasks that we regularly find in AI jargon include:
Image classification
Image classification is one of the most studied topics ever since the ImageNet dataset was
released in 2010.
Being the most popular computer vision task taken up by both beginners and experts,
image classification as a problem statement is quite simple.
Given a group of images, the task is to classify them into a set of predefined classes using
solely a set of sample images that have already been classified.
As opposed to complex topics like object detection and image segmentation, which have
to localize (or give positions for) the features they detect, image classification deals with
processing the entire image as a whole and assigning a specific label to it.
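The classification setup above can be illustrated with a toy nearest-neighbor classifier over raw pixel values; real systems learn features instead of comparing pixels, and all the data below is synthetic:

```python
import numpy as np

def nearest_neighbor_classify(train_images, train_labels, test_image):
    """Toy classifier: assign the label of the training image whose
    raw pixels are closest (Euclidean distance) to the test image."""
    distances = np.linalg.norm(
        train_images.reshape(len(train_images), -1)
        - test_image.reshape(1, -1), axis=1)
    return train_labels[np.argmin(distances)]

# Synthetic 8x8 "images": class 0 is dark, class 1 is bright.
rng = np.random.default_rng(0)
train = np.concatenate([rng.integers(0, 50, (10, 8, 8)),
                        rng.integers(200, 256, (10, 8, 8))]).astype(float)
labels = np.array([0]*10 + [1]*10)

bright_test = np.full((8, 8), 230.0)
print(nearest_neighbor_classify(train, labels, bright_test))  # 1
```

Note how the whole image maps to a single label, with no localization involved.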

Object detection
Object detection, as the name suggests, refers to detection and localization of objects
using bounding boxes.
Object detection looks for class-specific details in an image or a video and identifies them
whenever they appear. These classes can be cars, animals, humans, or anything on which
the detection model has been trained.
Earlier methods of object detection used Haar features, SIFT, and HOG features to
detect features in an image and classify them with classical machine learning
approaches.
Besides being time-consuming and often inaccurate, this process places severe
limits on the number of objects that can be detected.
Deep learning models like YOLO, R-CNN, and SSD, which use millions of parameters to
break through these limitations, are therefore popularly employed for this task.
Object detection is often accompanied by Object Recognition, also known as Object
Classification.
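The bounding boxes these detectors produce are typically compared and scored with Intersection over Union (IoU); a minimal sketch with made-up box coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two bounding boxes given as
    (x1, y1, x2, y2) corner coordinates."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two overlapping 10x10 boxes shifted by 5 pixels:
# intersection 25, union 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 0.142857...
```

A predicted box is usually counted as a correct detection only when its IoU with the ground-truth box exceeds a threshold such as 0.5.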

Image segmentation
Image segmentation is the division of an image into subparts or sub-objects to demonstrate
that the machine can discern an object from the background and/or another object in the
same image.
A “segment” of an image represents a particular class of object that the neural network has
identified in an image, represented by a pixel mask that can be used to extract it.
This popular domain of Computer Vision has been studied widely both with the use of
traditional image processing algorithms like watershed algorithms, clustering-based
segmentation and with the use of popular modern-day deep learning architectures like
PSPNet, FPN, U-Net, SegNet, etc.
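For intuition, the simplest possible segmentation, plain intensity thresholding, already yields the kind of pixel mask described above (a toy sketch, not one of the listed architectures):

```python
import numpy as np

def threshold_segment(image, cutoff=128):
    """Binary segmentation mask: True where the pixel is brighter
    than the cutoff (foreground), False elsewhere (background)."""
    return image > cutoff

# Toy image: dark background on the left, a bright "object" on the right.
image = np.array([
    [ 10,  20, 200, 210],
    [ 15,  25, 220, 230],
    [ 12,  18, 205, 215],
], dtype=np.uint8)

mask = threshold_segment(image)
print(mask.astype(int))   # right half is 1 (object), left half is 0
print(int(mask.sum()))    # 6 foreground pixels
```

Modern architectures predict such a mask per class, but the output format, one label per pixel, is the same.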

Face and person recognition


Facial Recognition is a subpart of object detection where the primary object being detected
is the human face.
While similar to object detection as a task, where features are detected and localized, facial
recognition performs not only detection, but also recognition of the detected face.
Facial recognition systems search for common features and landmarks like eyes, lips, or a
nose, and classify a face using these features and the positioning of these landmarks.
Traditional image processing methods for facial recognition include Haar Cascades,
which are easily accessible via the OpenCV library. More robust deep learning
based methods can be found in papers like FaceNet.

Edge detection
Edge detection is the task of detecting boundaries in objects.
It is algorithmically performed with the help of mathematical methods that help detect sharp
changes or discontinuities in the brightness of the image. Often used as a data pre-
processing step for many tasks, edge detection is primarily done by traditional image
processing-based algorithms like Canny Edge detection and by convolutions with specially
designed edge detection filters.
Furthermore, edges in an image carry important information about its contents,
which is why deep learning methods effectively perform edge detection internally,
capturing low-level features with the help of learnable kernels.
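As a sketch of convolution with a hand-designed edge filter, the Sobel horizontal-gradient kernel below responds strongly at a vertical boundary in a toy image:

```python
import numpy as np

# Sobel kernel that responds to horizontal changes in brightness
# (i.e., vertical edges).
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def convolve_valid(image, kernel):
    """Valid-mode convolution by sliding the kernel over the image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Toy image: dark left half, bright right half -> one vertical edge.
image = np.zeros((5, 6))
image[:, 3:] = 255.0

edges = np.abs(convolve_valid(image, sobel_x))
print(edges)   # large responses only in the columns spanning the edge
```

Canny edge detection builds on exactly such gradient filters, adding smoothing, non-maximum suppression, and hysteresis thresholding on top.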

Image restoration
Image Restoration refers to the restoration or reconstruction of faded and old
hard-copy images that were captured or stored improperly, leading to a loss of
image quality.

Typical image restoration processes involve the reduction of additive noise via
mathematical tools, while at times, reconstruction requires major changes, leading to further
analysis and the use of image inpainting.
In image inpainting, damaged parts of an image are filled in with the help of generative models
that estimate what the image is trying to convey. Often the restoration process
is followed by colorization, which colors the subject of the picture (if black and white)
in the most realistic manner possible.

Feature matching
Features in computer vision are regions of an image that tell us the most about a particular
object in the image.
While edges are strong indicators of object detail and therefore important features, much
more localized and sharp details, like corners, also serve as features. Feature matching
relates the features of a region in one image to those of a similar region in another image.
The applications of feature matching are found in computer vision tasks like object
identification and camera calibration. The task of feature matching is generally performed
in the following order:
1. Detection of features: Detection of regions of interest is generally performed by
Image Processing algorithms like Harris Corner Detection, SIFT, and SURF.
2. Formation of local descriptors: After features are detected, the region
surrounding each keypoint is captured and the local descriptors of these regions of
interest are obtained. A local descriptor is the representation of a point’s local
neighborhood and thus can be helpful for feature matching.
3. Feature matching: The features and their local descriptors are matched in the
corresponding images to complete the feature matching step.
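The three steps above can be sketched as brute-force nearest-neighbor matching of descriptor vectors; the descriptors below are random stand-ins for real SIFT or SURF output:

```python
import numpy as np

def match_features(desc_a, desc_b):
    """Brute-force matching: for each descriptor in image A, find the
    index of the closest descriptor (Euclidean distance) in image B."""
    matches = []
    for i, d in enumerate(desc_a):
        distances = np.linalg.norm(desc_b - d, axis=1)
        matches.append((i, int(np.argmin(distances))))
    return matches

# Stand-in local descriptors (e.g., 128-dim SIFT vectors) for two images.
rng = np.random.default_rng(1)
desc_a = rng.normal(size=(5, 128))
# Image B contains the same keypoints, slightly perturbed and shuffled.
perm = [3, 0, 4, 1, 2]
desc_b = desc_a[perm] + rng.normal(scale=0.01, size=(5, 128))

print(match_features(desc_a, desc_b))
# Each keypoint in A matches its shuffled counterpart in B:
# [(0, 1), (1, 3), (2, 4), (3, 0), (4, 2)]
```

Production systems refine this with ratio tests and geometric verification, but the core idea, comparing descriptor vectors across images, is the same.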

Scene reconstruction
One of the most complex problems of computer vision, scene reconstruction is the digital
3D reconstruction of an object from one or more photographs.
Most algorithms in scene reconstruction roughly work by forming a point cloud at the
surface of the object and reconstructing a mesh from this point cloud.

Video motion analysis


Video motion analysis is a task in machine vision that refers to the study of moving objects
or animals and the trajectory of their bodies.
Motion analysis as a whole is a combination of many subtasks, particularly object detection,
tracking, segmentation, and pose estimation.
While human motion analysis is used in areas like sports, medicine, intelligent video
analytics, and physical therapy, motion analysis is also used in other areas like
manufacturing and to count and track microorganisms like bacteria and viruses.

Computer vision technology challenges


One of the biggest challenges in machine vision is our lack of understanding of how the
human brain and the human visual system work.
We develop an enhanced and complex sense of vision at a very young age,
yet we are unable to explain the process by which we understand what we see.
Furthermore, day-to-day tasks like crossing the street at a zebra crossing, pointing
at something in the sky, or checking the time on a clock require us to know enough
about the objects around us to understand our surroundings.
Such aspects are quite different from simple vision, but are largely inseparable from it. The
simulation of human vision via algorithms and mathematical representation thus
requires the identification of an object in an image and an understanding of its presence
and its behaviour.

Computer Vision Application in Augmented reality


Augmented reality (AR) is a method of providing an experience of the natural
surroundings with a computer-generated augmentation appropriate to the surroundings.
With the help of computer vision, AR can be virtually limitless, with augmentations providing
translations of written text and applying filters to objects in the world we see, directly when
we see them.

Computer Vision and Augmented Reality for E-Commerce


Recently, AR has made it possible for retailers to showcase their products in
real-time. IKEA was one of the first to roll out an AR application that enabled buyers
to visualize products within their homes. Today, more and more retailers are
utilizing AI software to elevate the shopping experience and ease purchasing
decisions for their clients.

The same goes for online clothing stores. The technology enables shoppers to
virtually try on clothes and find their perfect fit. The number of online stores
unveiling virtual fitting rooms via an app is growing. So far, computer vision-based
augmented reality has proven efficient in providing a better customer experience,
improving brand perception, and boosting sales.

AI Augmented Reality For Education


When it comes to education, AR and VR are total game-changers. They have
the potential to improve the learning process and better motivate and engage students.
More importantly, the technologies have proven successful in real-time training.
When theory fails to aid recall, AI-powered virtual reality comes to the rescue: students
have a range of VR tutorials and modeling sessions where they can
gain hands-on experience and polish their techniques.

Machine Learning-Based VR for Gaming

Virtual reality is evolving at breakneck speed. And some of the latest VR
breakthroughs haven’t been possible without machine learning and computer vision.

According to Grand View Research, the global virtual reality in gaming market is
expected to reach USD 45.09 billion by 2025. Lately, virtual reality powered by
computer vision has given a brand new twist to the video game industry. There are a
number of benefits VR has to offer the gaming business:

• significant increase in sales
• enhanced user experience
• improved player retention

AI-Powered Augmented Reality for Travel and Tourism


The technology has been actively utilized by travel agencies and hotels to improve
their overall brand reputation and boost revenue. Within the hospitality and tourism
industry, augmented reality fueled by computer vision acts as a powerful tool to bring
more interaction into hotels and resorts and nudge travelers toward impulse bookings.

On top of that, some travel agencies develop AR apps that offer breathtaking
immersive tours. The ultimate goal of these apps is to take a potential traveler on an
interactive tour somewhere sunny and give them as much information as possible about the
destination. For sophisticated travelers, there is the opportunity to get a unique view of
a sight or a resort from a drone. Thanks to AR drone image processing, tourism
agencies are now reaping the benefits of drone technology and promoting tourist
destinations.

VR and AR for Healthcare

These days, AR is making a significant contribution to the healthcare industry.

The innovation empowers healthcare professionals to provide better diagnoses and
make surgery safer. Using AI coupled with computer vision and AR, surgeons can now
place surgical incisions more precisely and prevent tissue damage.

Moreover, computer vision-based virtual reality is the next big thing for mental
health and psychotherapy. It is utilized to treat patients with post-traumatic stress
disorder (PTSD), depression, anxiety, and other mental health issues.
