Unit-I
Introduction to Deep Learning, Deep Learning Vs Machine Learning, Neural Networks, Data
representations for neural networks, Scalars (0D tensors), Vectors (1D tensors), Matrices (2D
tensors), 3D tensors and higher-dimensional tensors, Key attributes, Manipulating tensors in
Numpy, The notion of data batches, Real-world examples of data tensors, Vector data, Time
series data or sequence data, Image data, Video data
Introduction to Deep Learning
Deep learning is transforming the way machines understand, learn, and interact with complex
data. By mimicking the neural networks of the human brain, it enables computers to
autonomously uncover patterns and make informed decisions from vast amounts of
unstructured data.
How Does Deep Learning Work?
A neural network consists of layers of interconnected nodes, or neurons, that collaborate to
process input data. In a fully connected deep neural network, data flows through multiple
layers where each neuron performs a nonlinear transformation, allowing the model to learn
intricate representations of the data.
In a deep neural network, the input layer receives data, which passes through hidden
layers that transform it using nonlinear functions. The final output layer generates the
model's prediction.
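The data flow described above can be sketched in NumPy. This is an illustrative forward pass only (the layer sizes, random weights, and ReLU activation are chosen for demonstration, not taken from any specific model):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Nonlinear activation applied by each hidden neuron."""
    return np.maximum(0, x)

# Input layer: a batch of 4 samples with 3 features each
x = rng.standard_normal((4, 3))

# Hidden layer: nonlinear transformation of the input
W1 = rng.standard_normal((3, 5))
b1 = np.zeros(5)
h = relu(x @ W1 + b1)

# Output layer: generates the model's prediction
W2 = rng.standard_normal((5, 1))
b2 = np.zeros(1)
y_pred = h @ W2 + b2

print(y_pred.shape)  # (4, 1): one prediction per sample
```

Each `@` is a matrix multiplication by a layer's weights; stacking more hidden layers would simply repeat the "linear map followed by nonlinearity" step.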
Difference between Machine Learning and Deep Learning
Machine learning and deep learning are both subsets of artificial intelligence, but there are
important differences between them.

Machine Learning | Deep Learning
Applies statistical algorithms to learn the hidden patterns and relationships in the dataset. | Uses artificial neural network architectures to learn the hidden patterns and relationships in the dataset.
Can work on smaller amounts of data. | Requires a larger volume of data than machine learning.
Better for simpler, low-complexity tasks. | Better for complex tasks like image processing, natural language processing, etc.
Takes less time to train the model. | Takes more time to train the model.
Relevant features are manually extracted from images to detect an object; a model is then built on them. | Relevant features are automatically extracted from images; it is an end-to-end learning process.
Less complex; results are easy to interpret. | More complex; it works like a black box, so interpreting results is not easy.
Can work on a CPU; requires less computing power than deep learning. | Requires a high-performance computer with a GPU.
Evolution of Neural Architectures
The journey of deep learning began with the perceptron, a single-layer neural network
introduced in the 1950s. While innovative, perceptrons could only solve linearly separable
problems and therefore failed at more complex tasks like the XOR problem.
This limitation led to the development of Multi-Layer Perceptrons (MLPs), which introduced
hidden layers and non-linear activation functions. MLPs trained
using backpropagation could model complex, non-linear relationships, marking a significant
leap in neural network capabilities. This evolution from perceptrons to MLPs laid the
groundwork for advanced architectures like CNNs and RNNs, showcasing the power of
layered structures in solving real-world problems.
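To make the XOR limitation concrete, here is a hand-built sketch: the weights below are chosen by hand for illustration (not learned by backpropagation). No single threshold unit can separate XOR's classes, but one hidden layer of two units (computing OR and AND) can:

```python
import numpy as np

def step(x):
    """Threshold activation: 1 if input > 0, else 0."""
    return (x > 0).astype(int)

# All four XOR input combinations
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: two threshold units with hand-picked weights
h_or = step(inputs @ np.array([1, 1]) - 0.5)   # fires unless both inputs are 0
h_and = step(inputs @ np.array([1, 1]) - 1.5)  # fires only when both inputs are 1

# Output unit: OR AND (NOT AND) equals XOR
xor = step(h_or - h_and - 0.5)

print(xor)  # [0 1 1 0]
```

The hidden layer remaps the inputs into a space where a single linear threshold suffices, which is exactly the representational power the perceptron lacked.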
Data representations for neural networks
In neural networks, data representations refer to how data is formatted, structured, and fed
into the model for training and inference. Choosing the right representation is crucial, as
neural networks operate on numerical data (usually tensors).
1. Scalars (0D Tensors)
Description: A single numerical value.
Example: 5, 0.75
Use Case: Loss value, learning rate, or a constant input.
2. Vectors (1D Tensors)
Description: An ordered list of numbers.
Shape: (n,)
Example: [3.2, 1.5, 0.9]
Use Case: Feature vector for one data point.
3. Matrices (2D Tensors)
Description: 2D array of numbers (rows × columns).
Shape: (m, n)
Example: A batch of vectors.
Use Case:
o Input features for a batch of samples
o Weights in a fully connected (dense) layer
4. 3D Tensors
Description: 3-dimensional arrays.
Shape: (batch_size, height, width) or (batch_size, sequence_length, features)
Example:
o Time series data: (batch, time_steps, features)
o Image: (channels, height, width) or (height, width, channels)
Use Case:
o Sequences (like text or time series)
o Colored images (RGB)
5. 4D and Higher-Dimensional Tensors
4D Example: (batch_size, channels, height, width)
5D Example: (batch_size, channels, depth, height, width) — for 3D images like CT
scans.
Use Case:
o CNNs for image processing
o Video (as sequences of frames)
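The ranks and shapes listed above can be checked directly in NumPy; a short sketch (the array values and sizes are illustrative):

```python
import numpy as np

scalar = np.array(5)                         # 0D tensor
vector = np.array([3.2, 1.5, 0.9])           # 1D tensor
matrix = np.array([[1, 2], [3, 4]])          # 2D tensor
batch_of_images = np.zeros((32, 3, 64, 64))  # 4D tensor: (batch, channels, height, width)

# ndim gives the rank, shape gives the size along each axis
for t in (scalar, vector, matrix, batch_of_images):
    print(t.ndim, t.shape)
```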
Introduction to Tensors with TensorFlow
A tensor is a multi-dimensional array used to store data in machine learning and deep learning
frameworks such as TensorFlow. Tensors are the fundamental data structure in TensorFlow,
and they represent the flow of data through a computation graph. Tensors generalize scalars,
vectors, and matrices to higher dimensions.
Types of Tensors
Tensors in TensorFlow can take various forms depending on their number of dimensions.
Scalar (0D tensor): A single number, such as 5 or -3.14.
Vector (1D tensor): A one-dimensional array, such as [1, 2, 3, 4].
Matrix (2D tensor): A two-dimensional array, like a table with rows and columns: [[1,
2], [3, 4]].
3D Tensor: A three-dimensional array, like a stack of matrices: [[[1, 2], [3, 4]], [[5, 6],
[7, 8]]].
Higher-dimensional tensors: Tensors with more than three dimensions are often used to
represent more complex data, such as batches of color images (represented as a 4D tensor
with shape [batch_size, height, width, channels]).
How to represent Tensors in TensorFlow?
The TensorFlow framework is designed for high-performance numerical computation and
operates primarily on tensors. When you use TensorFlow, you define your model, train it, and
perform operations using tensors.
A tensor in TensorFlow is represented as an object that has:
Shape: The dimensions of the tensor (e.g., [2, 3] for a matrix with 2 rows and 3
columns).
Rank: The number of dimensions of the tensor (e.g., a scalar has rank 0, a vector has
rank 1, a matrix has rank 2, etc.).
Data type: The type of the elements in the tensor, such as float32, int32, or string.
Device: The device on which the tensor resides (e.g., CPU, GPU).
TensorFlow provides a variety of operations that can be applied to tensors, including
mathematical operations, transformations, and reshaping.
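The four attributes listed above can be inspected directly on a tensor object; a short sketch (the tensor values are illustrative):

```python
import tensorflow as tf

t = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(t.shape)   # (2, 3): 2 rows, 3 columns
print(t.ndim)    # 2: rank of the tensor
print(t.dtype)   # float32: element data type
print(t.device)  # device placement string, e.g. ending in CPU:0 or GPU:0
```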
Basic Tensor Operations in TensorFlow
TensorFlow provides a large set of tensor operations, allowing for efficient manipulation of
data. Below are some of the most commonly used tensor operations in TensorFlow:
1. Creating Tensors
You can create tensors using TensorFlow’s tf.constant() or its various helper functions, such
as tf.zeros() and tf.ones():
import tensorflow as tf
# Scalar (0D tensor)
scalar_tensor = tf.constant(5)
# Vector (1D tensor)
vector_tensor = tf.constant([1, 2, 3, 4])
# Matrix (2D tensor)
matrix_tensor = tf.constant([[1, 2], [3, 4]])
# 3D Tensor
tensor_3d = tf.constant([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
# Tensor of zeros (2D tensor)
zeros_tensor = tf.zeros([3, 3])
# Tensor of ones (2D tensor)
ones_tensor = tf.ones([2, 2])
2. Tensor Operations
TensorFlow supports various operations that can be performed on tensors, such as element-
wise operations, matrix multiplication, reshaping, and more.
import tensorflow as tf
# Define a matrix tensor
matrix_tensor = tf.constant([[1, 2], [3, 4]])
# Define a ones tensor of the same shape
ones_tensor = tf.ones_like(matrix_tensor)
# Element-wise addition
tensor_add = tf.add(matrix_tensor, ones_tensor)
# Matrix multiplication (dot product)
matrix_mult = tf.matmul(matrix_tensor, matrix_tensor)
# Reshape a tensor: change the shape of matrix_tensor to [4, 1]
reshaped_tensor = tf.reshape(matrix_tensor, [4, 1])
# Transpose a tensor: flip the rows and columns of matrix_tensor
transpose_tensor = tf.transpose(matrix_tensor)
3. Accessing Elements in a Tensor
You can access specific elements within a tensor using indices. Similar to how you access
elements in Python lists or NumPy arrays, TensorFlow provides slicing and indexing
operations.
import tensorflow as tf
# Define a vector tensor
vector_tensor = tf.constant([1, 2, 3, 4])
# Access the first element of the vector
first_element = vector_tensor[0]
# Define a matrix tensor
matrix_tensor = tf.constant([[1, 2], [3, 4]])
# Slice a tensor (first two rows of the matrix)
matrix_slice = matrix_tensor[:2, :]
Higher-Dimensional Tensors: Key Attributes in Neural Networks
Higher-dimensional tensors (4D and above) are widely used in advanced deep learning
applications, especially in computer vision, video processing, and spatiotemporal modeling.
Here's a breakdown of their key attributes:
Key Attributes of Higher-Dimensional Tensors
Attribute | Description
Rank | The number of dimensions (axes) in the tensor; e.g., a 4D tensor has rank 4.
Shape | A tuple that defines the size along each dimension, e.g. (batch_size, channels, height, width) for a 4D image tensor.
Size | The total number of elements in the tensor, computed as the product of all shape dimensions.
Example: 4D Tensor in a Convolutional Neural Network (CNN)
import numpy as np
# Shape: (batch_size, channels, height, width)
image_tensor = np.random.rand(32, 3, 64, 64)

Dimension | Meaning
32 | Batch size
3 | Channels (RGB)
64 | Height of the image
64 | Width of the image

Example: 5D Tensor for Video Input
# Shape: (batch_size, frames, channels, height, width)
video_tensor = np.random.rand(8, 10, 3, 64, 64)

Dimension | Meaning
8 | Batch size
10 | Temporal dimension (frames)
3 | Channels (e.g., RGB)
64 | Height
64 | Width
Common Tensor Manipulation Operations in NumPy
1. Creating Tensors
import numpy as np
# 1D Tensor (Vector)
a = np.array([1, 2, 3])
# 2D Tensor (Matrix)
b = np.array([[1, 2], [3, 4]])
# 3D Tensor
c = np.random.rand(2, 3, 4)  # shape: (2, 3, 4)
2. Reshaping Tensors
a = np.arange(12)    # shape: (12,)
b = a.reshape(3, 4)  # shape: (3, 4)
3. Transposing Tensors
a = np.array([[1, 2], [3, 4]])
b = a.T  # shape: (2, 2), transpose of the matrix
# For higher dimensions
c = np.random.rand(2, 3, 4)
d = np.transpose(c, (1, 0, 2))  # swaps the first two axes; shape: (3, 2, 4)
4. Expanding/Reducing Dimensions
a = np.array([1, 2, 3])
# Expand dimensions
a_exp = np.expand_dims(a, axis=0)   # shape: (1, 3)
a_exp2 = np.expand_dims(a, axis=1)  # shape: (3, 1)
# Remove singleton dimensions
b = np.squeeze(np.array([[[5]]]))   # shape: (), scalar
5. Splitting Tensors
a = np.arange(12).reshape(3, 4)
# Split into 2 parts along axis 1
split = np.split(a, 2, axis=1)
Operation | Function | Example
Reshape | reshape() | a.reshape(2, 3)
Transpose | transpose() or .T | a.T
Expand Dimensions | np.expand_dims() | np.expand_dims(a, axis=0)
Squeeze | np.squeeze() | np.squeeze(a)
Concatenate | np.concatenate() | np.concatenate((a, b), 0)
Stack | np.stack() | np.stack((a, b), axis=0)
Split | np.split() | np.split(a, 2, axis=1)
Sum, Mean, etc. | np.sum(), np.mean() | np.sum(a, axis=1)
The Notion of Data Batches in Neural Networks
Data batching is a core concept in training neural networks. It refers to dividing the entire
dataset into smaller groups (batches) instead of feeding the whole dataset or one sample at a
time to the model.
Training Type | Description | Pros | Cons
Batch | Entire dataset at once (rarely used) | Accurate gradient | High memory usage
Mini-batch | Small subset (e.g., 32, 64, or 128 samples); common in practice | Efficient & stable training | Gradient is only an estimate of the full-batch gradient
Stochastic | One sample at a time | Frequent updates | Noisy gradient, unstable
Tensor Shape in Batches
Data Type | Tensor Shape (Batch) | Example
Tabular | (batch_size, num_features) | (64, 10)
Grayscale Image | (batch_size, height, width, 1) | (64, 28, 28, 1)
Color Image (RGB) | (batch_size, height, width, 3) | (64, 32, 32, 3)
Text (sequences) | (batch_size, sequence_length) | (64, 100)
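Mini-batching can be sketched in NumPy. The dataset, batch size, and shuffling scheme here are illustrative (frameworks like TensorFlow and PyTorch provide their own batching utilities):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 10))  # 256 samples, 10 features each
batch_size = 64

# Shuffle once per epoch, then slice the dataset into mini-batches
indices = rng.permutation(len(X))
batches = [X[indices[i:i + batch_size]] for i in range(0, len(X), batch_size)]

print(len(batches), batches[0].shape)  # 4 batches, each of shape (64, 10)
```

The model would see one (64, 10) tensor per gradient update, which is the "mini-batch" row in the table above.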
Time Series Data
Time series data consists of sequential observations recorded over time at regular
intervals. It captures temporal patterns and dependencies — making it fundamentally
ordered and time-dependent.
Key Characteristics of Time Series Data
Feature | Description
Temporal Order | The order of data points is crucial (unlike tabular data).
Time Dependency | Current values often depend on past values (e.g., temperature trends).
Fixed Intervals | Typically collected at consistent intervals (e.g., daily, hourly, per second).
Univariate / Multivariate | Single vs. multiple variables recorded over time.
Examples of Time Series Data
Domain | Example Variables | Tensor Shape (Example)
Weather | Temperature, humidity, wind speed | (128, 30, 3): 128 samples, 30 days, 3 features
Finance | Stock price, volume, volatility | (64, 50, 4): 64 sequences of 50 days, 4 features
Healthcare | Heart rate, blood pressure (ECG, ICU data) | (256, 100, 2): 256 patients, 100 time points
IoT Devices | Sensor readings: pressure, flow, temperature | (32, 60, 5): 32 devices, 60 steps, 5 sensors
E-commerce | Daily sales per product | (1000, 365): 1000 products, 1 year
Audio | Raw waveform (speech or music) | (batch_size, time_steps): audio frames
Motion Data | Accelerometer, gyroscope readings | (64, 200, 6): motion sensors over time
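Turning a raw series into a (samples, time_steps, features) tensor like those above is typically done with sliding windows; a minimal sketch (the series length and window size are illustrative):

```python
import numpy as np

# A univariate series of 100 observations, 1 feature per step
series = np.arange(100, dtype=float).reshape(-1, 1)
window = 30  # time steps per sample

# Each sample is a window of 30 consecutive observations
samples = np.stack([series[i:i + window]
                    for i in range(len(series) - window + 1)])

print(samples.shape)  # (71, 30, 1): samples, time steps, features
```

Each row of the resulting tensor preserves the temporal order within its window, which is what sequence models such as RNNs rely on.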
Image Data
Image data is one of the most commonly used data types in deep learning, especially for tasks
like classification, object detection, and segmentation. Images are represented as multi-
dimensional tensors, depending on color channels and dataset structure.
An image is a grid of pixels, each having:
A position (row, column)
A value (intensity or RGB values)
Image Data Tensor Representations
Image Type | Tensor Shape (Single Image) | Description
Grayscale Image | (height, width) or (height, width, 1) | One intensity value per pixel
RGB Image | (height, width, 3) | 3 channels: Red, Green, Blue
RGBA Image | (height, width, 4) | RGB + Alpha (transparency)
Real-World Image Data Examples
Dataset | Description | Shape Example (batch)
MNIST | 28×28 grayscale digits | (128, 28, 28, 1)
CIFAR-10 | 32×32 RGB images in 10 classes | (64, 32, 32, 3)
ImageNet | Large dataset of 224×224 RGB images | (16, 224, 224, 3)
Medical Images | CT, X-ray (grayscale or volumetric) | (8, 256, 256, 1) or (8, 64, 256, 256)
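A common preprocessing step is adding the trailing channel axis that batched grayscale images need; a small sketch using synthetic MNIST-shaped data (random values stand in for real pixels):

```python
import numpy as np

# 128 synthetic 28x28 grayscale images with pixel values in [0, 1)
batch = np.random.rand(128, 28, 28)

# Add the trailing channel axis that most frameworks expect
batch = np.expand_dims(batch, axis=-1)
print(batch.shape)  # (128, 28, 28, 1)
```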
Video Data
Video data is a sequence of images (frames) over time. It’s inherently spatiotemporal,
meaning it has both spatial (height, width) and temporal (time/frame) dimensions. Videos
are represented as high-dimensional tensors and are used in advanced tasks like activity
recognition, video classification, and object tracking.
Tensor Representation of Video Data
Tensor Component | Meaning
batch_size | Number of videos in a batch
frames | Number of frames per video
height | Height of each frame
width | Width of each frame
channels | 1 (grayscale) or 3 (RGB)
Common Tensor Shape
Framework | Tensor Shape
TensorFlow / Keras | (batch_size, frames, height, width, channels)
PyTorch | (batch_size, channels, frames, height, width)
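Converting between the two layouts is a single axis permutation; a sketch in NumPy (the batch of 8 ten-frame RGB videos is illustrative):

```python
import numpy as np

# TensorFlow/Keras layout: (batch, frames, height, width, channels)
video_tf = np.random.rand(8, 10, 64, 64, 3)

# Move the channel axis next to the batch axis to get the
# PyTorch layout: (batch, channels, frames, height, width)
video_pt = np.transpose(video_tf, (0, 4, 1, 2, 3))

print(video_pt.shape)  # (8, 3, 10, 64, 64)
```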
Real-World Applications of Video Data
Application | Task | Model Type
Video Classification | Predict action or scene type | 3D CNN, RNN, Transformers
Action Recognition | Recognize human motion (e.g., running, jumping) | 3D CNN, LSTM
Object Tracking | Follow objects across video frames | YOLOv5 + SORT/DeepSORT
Surveillance | Detect anomalies, motion, or events | CNN + RNNs or Transformers
Lip Reading | Decode spoken words from lip movements | CNN + CTC or attention
Video Captioning | Describe what's happening in a video | CNN + LSTM/Transformer
Reinforcement Learning | Environment state via video feed | CNN as visual encoder