Sequence Learning with RNNs Explained

The document discusses sequence learning problems in deep learning, highlighting the differences between fixed-size inputs in feedforward networks and the variable-size, dependent inputs in sequence models. It introduces Recurrent Neural Networks (RNNs) as a solution to model such tasks, emphasizing parameter sharing and recurrent connections to account for dependencies between inputs. Various applications of RNNs, including sentiment analysis and language translation, are also mentioned, along with challenges like the vanishing and exploding gradient problems.


EC 9170
Deep Learning for Electrical & Computer Engineers

Sequence Models

Faculty of Engineering, University of Jaffna


Sequence Learning Problems
• In feedforward networks and CNNs, the size of the input is always fixed.
• For example, we fed fixed-size (32 × 32) images to convolutional neural networks for image classification.
• Further, each input to the network is independent of the previous or future inputs.
• For example, the computations, outputs and decisions for two successive images are completely independent of each other.
• In many applications, the input is not of a fixed size.
• Further, successive inputs may not be independent of each other.
Sequence Learning Problems
• For example, consider the task of auto-completion.
• Given the first character ‘d’, you want to predict the next character ‘e’, and so on.
• First, successive inputs are no longer independent (while predicting ‘e’, you would want to know what the previous input was, in addition to the current input).
• Second, the length of the inputs and the number of predictions are not fixed (for example, “learn”, “deep” and “machine” have different numbers of characters).
• Third, each network (the orange-blue-green structure) is performing the same task (input: character, output: character).
• These are known as sequence learning problems.
Sequence Learning Problems
• Consider the task of predicting the part-of-speech tag (noun, adverb, adjective, verb) of each word in a sentence.
• Once we see an adjective (social), we are almost sure that the next word should be a noun (man).
• Thus the current output (noun) depends on the current input as well as the previous input.
• Further, the size of the input is not fixed (sentences could have an arbitrary number of words).
• Notice that here, we are interested in producing an output at each time step.
• Each network is performing the same task (input: word, output: tag).
• Sometimes, we may not be interested in producing an output at every stage.
• Instead, we would look at the full sequence and then produce an output.
Sequence Learning Problems
• For example, consider the task of predicting the polarity of a movie review.
• The prediction clearly does not depend only on the last word but also on some words which appear before it.
• Here again we could think of the network as performing the same task at each step (input: word, output: +/−), but it’s just that we don’t care about the intermediate outputs.
Sequence Learning Problems
• Sequences could be composed of anything (not just words).
• For example, a video could be treated as a sequence of images.
• We may want to look at the entire sequence and detect the activity being performed.

How do we model such tasks involving sequences?


Main issues in using ANNs for sequence problems
• Variable size of input/output (a variable number of input/output neurons would be needed)
• Too much computation (the text must be converted to a vector to feed to the input neurons)
• No parameter sharing
• Dependencies between inputs are not modelled
Recurrent Neural Networks
What is the function being executed at each time step?
• Since we want the same function to be executed at each timestep, we should share the same network (i.e., the same parameters at each timestep).
• This parameter sharing also ensures that the network becomes agnostic to the length (size) of the input.
• Since we simply compute the same function at each time step, the number of timesteps doesn’t matter.
• We just create multiple copies of the network and execute them at each timestep.
Recurrent Neural Networks
How do we account for dependence between inputs?
• Suppose instead that the function computed at each time step were different.
• Then the network becomes sensitive to the length of the sequence.
• For example, a sequence of length 10 will require f1, ..., f10, whereas a sequence of length 100 will require f1, ..., f100.
• Is this method okay? → No, it violates the other two items on our wish list (parameter sharing and length-agnosticism).
Recurrent Neural Networks
• The solution is to add a recurrent connection in the network.
• si is the state of the network at timestep i.
• The parameters are W, U, V, c, b, which are shared across timesteps.
• The same network (and parameters) can be used to compute y1, y2, ..., y10 or y1, ..., y100.
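A minimal NumPy sketch of this recurrent step, using the shared parameters W, U, V, b, c named above. The tanh state activation, the softmax output, and all dimensions are assumptions for illustration (the slides do not fix the nonlinearities):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4-dim inputs, 8-dim state, 3 output classes.
d_in, d_state, d_out = 4, 8, 3

# Shared parameters, as on the slide: W, U, V, b, c.
W = rng.normal(scale=0.1, size=(d_state, d_state))  # state-to-state
U = rng.normal(scale=0.1, size=(d_state, d_in))     # input-to-state
V = rng.normal(scale=0.1, size=(d_out, d_state))    # state-to-output
b = np.zeros(d_state)
c = np.zeros(d_out)

def rnn_step(s_prev, x):
    """One timestep: s_i = tanh(U x_i + W s_{i-1} + b), y_i = softmax(V s_i + c)."""
    s = np.tanh(U @ x + W @ s_prev + b)
    logits = V @ s + c
    y = np.exp(logits) / np.exp(logits).sum()
    return s, y

# The same parameters process a sequence of any length.
xs = rng.normal(size=(10, d_in))   # a sequence of 10 inputs
s = np.zeros(d_state)
for x in xs:
    s, y = rnn_step(s, x)
```

Because `rnn_step` never changes across the loop, the same code (and parameters) would handle a sequence of length 100 unchanged.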
Recurrent Neural Networks
Generic Representation of RNN

Let us revisit the sequence learning problems that we saw earlier.

We now have recurrent connections between time steps, which account for the dependence between inputs.
How does RNN reduce complexity?
• Given a function f: h’, y = f(h, x), where h and h’ are vectors with the same dimension.

     y1        y2        y3
h0 → f → h1 → f → h2 → f → h3 → ……
     x1        x2        x3

• No matter how long the input/output sequence is, we only need one function f.
• If the f’s were different, it would become a feedforward NN. An RNN may thus be treated as another compression of a fully connected network.
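The abstraction h’, y = f(h, x) can be sketched in a few lines of plain Python; the arithmetic inside f is a toy assumption, chosen only to show that one fixed f handles sequences of any length:

```python
def f(h, x):
    """One timestep of the slide's abstraction: h', y = f(h, x)."""
    h_new = 0.5 * h + x   # toy state update (an assumption for illustration)
    y = 2 * h_new         # toy output
    return h_new, y

def run(xs, h0=0.0):
    """Apply the same f at every timestep, whatever the sequence length."""
    h, ys = h0, []
    for x in xs:
        h, y = f(h, x)
        ys.append(y)
    return ys

short = run([1.0, 2.0, 3.0])  # length 3
long = run([1.0] * 100)       # length 100: same f, no new parameters
```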
Different types of RNN
• Many to one: e.g. sentiment analysis of a review
• One to many: e.g. music generation, poetry writing
• Many to many (synced): e.g. part-of-speech tagging
• Many to many (encoder-decoder): e.g. language translation


Deep RNN

      z1         z2         z3
g0 → f2 → g1 → f2 → g2 → f2 → g3 → ……    g’, z = f2(g, y)
      y1         y2         y3
h0 → f1 → h1 → f1 → h2 → f1 → h3 → ……    h’, y = f1(h, x)
      x1         x2         x3
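A NumPy sketch of the stacking above: the bottom layer f1 consumes the inputs, and the top layer f2 consumes the bottom layer’s outputs. Taking each layer’s output to be its new state, and the tanh updates and dimensions, are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # hypothetical common dimension for inputs, states and outputs

# Two layers with separate parameters, matching the slide:
#   layer 1: h', y = f1(h, x)     layer 2: g', z = f2(g, y)
W1, U1 = rng.normal(scale=0.1, size=(d, d)), rng.normal(scale=0.1, size=(d, d))
W2, U2 = rng.normal(scale=0.1, size=(d, d)), rng.normal(scale=0.1, size=(d, d))

def f1(h, x):
    h_new = np.tanh(W1 @ h + U1 @ x)
    return h_new, h_new   # output y taken to be the new state (an assumption)

def f2(g, y):
    g_new = np.tanh(W2 @ g + U2 @ y)
    return g_new, g_new

h, g = np.zeros(d), np.zeros(d)
zs = []
for x in rng.normal(size=(3, d)):  # x1, x2, x3
    h, y = f1(h, x)   # bottom layer consumes the input
    g, z = f2(g, y)   # top layer consumes the bottom layer's output
    zs.append(z)
```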
Bidirectional RNN

      x1         x2         x3
g0 → f2 → g1 → f2 → g2 → f2 → g3         z, g = f2(g, x)
      z1         z2         z3
      f3 → p1    f3 → p2    f3 → p3      p = f3(y, z)
      y1         y2         y3
h0 → f1 → h1 → f1 → h2 → f1 → h3         y, h = f1(x, h)
      x1         x2         x3
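A NumPy sketch of the bidirectional structure: f1 runs over the sequence left to right, f2 right to left, and f3 combines the two states at each position. The tanh updates, the dimensions, and the choice of concatenation for f3 are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4  # hypothetical dimension

# Separate parameters for the forward (f1) and backward (f2) passes.
Wf, Uf = rng.normal(scale=0.1, size=(d, d)), rng.normal(scale=0.1, size=(d, d))
Wb, Ub = rng.normal(scale=0.1, size=(d, d)), rng.normal(scale=0.1, size=(d, d))

xs = rng.normal(size=(3, d))  # x1, x2, x3

# Forward pass: y_i summarises x1..xi
h, ys = np.zeros(d), []
for x in xs:
    h = np.tanh(Wf @ h + Uf @ x)
    ys.append(h)

# Backward pass: z_i summarises xi..xT (walk the sequence right to left)
g, zs = np.zeros(d), [None] * len(xs)
for i in range(len(xs) - 1, -1, -1):
    g = np.tanh(Wb @ g + Ub @ xs[i])
    zs[i] = g

# Combine with f3 at each position; concatenation is an assumed choice of f3.
ps = [np.concatenate([y, z]) for y, z in zip(ys, zs)]
```

Each output p_i thus sees the whole sequence: its forward half depends on x1..xi and its backward half on xi..xT.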
Backpropagation through time
Vanishing vs Exploding Gradient problem
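Backpropagation through time multiplies the recurrent Jacobian once per timestep, so gradient contributions from distant timesteps scale roughly like a factor raised to the sequence length. A toy sketch of that scaling, with a single scalar factor standing in for the Jacobian's spectral radius (an assumption for illustration):

```python
def grad_norm_after(T, w):
    """Magnitude of a gradient contribution propagated back T steps,
    when each step scales the gradient by a factor w."""
    return abs(w) ** T

vanish = grad_norm_after(50, 0.9)   # 0.9**50 is tiny: the gradient vanishes
explode = grad_norm_after(50, 1.1)  # 1.1**50 is huge: the gradient explodes
```

This is why long-range dependencies are hard to learn with plain RNNs: factors below 1 wash the signal out, and factors above 1 destabilise training.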
Thank you!
