EC 9170
Deep Learning for Electrical &
Computer Engineers
Sequence Models
Faculty of Engineering, University of Jaffna
Sequence Learning Problems
• In feedforward and CNNs, the size of the input is always fixed.
• For example, we fed fixed-size (32 × 32) images to convolutional neural
networks for image classification.
• Further, each input to the network is independent of the previous or future inputs.
• For example, the computations, outputs and decisions for two successive
images are completely independent of each other.
• In many applications, the input is not of a fixed size.
• Further, successive inputs may not be independent of each other.
Sequence Learning Problems
• For example, consider the task of auto-completion.
• Given the first character ‘d’, you want to predict the
next character ‘e’, and so on.
• First, successive inputs are no longer independent
(while predicting ‘e’ , you would want to know what
the previous input was in addition to the current
input)
• Second, the length of the inputs and the number of predictions are not fixed
(for example, “learn”, “deep”, “machine” have different numbers of characters)
• Third, each network (orange-blue-green structure) is performing the same
task (input: character, output: character)
• These are known as sequence learning problems
Sequence Learning Problems
• Consider the task of predicting the part-of-speech tag
(noun, adverb, adjective, verb) of each word in a
sentence.
• Once we see an adjective (social) we are almost sure
that the next word should be a noun (man)
• Thus the current output (noun) depends on the
current input as well as the previous input
• Further, the size of the input is not fixed (sentences
could have an arbitrary number of words)
• Notice that here, we are interested in producing an output at each time step
• Each network is performing the same task (input: word, output: tag)
• Sometimes, we may not be interested in producing an output at every stage
• Instead, we would look at the full sequence and then produce an output
Sequence Learning Problems
• For example, consider the task of predicting the polarity
of a movie review
• The prediction clearly does not depend only on the last
word but also on some words that appear before it
• Here again we could think that the network is
performing the same task at each step (input : word,
output : +/−) but it’s just that we don’t care about
intermediate outputs
Sequence Learning Problems
• Sequences could be composed of anything (not just words)
• For example, a video could be treated as a sequence of images
• We may want to look at the entire sequence and detect the activity being
performed
How do we model such tasks involving sequences?
Main issues in using ANNs for sequence problems
• Variable size of input/output (the number of input/output neurons would need to change)
• Too much computation (the text must be converted to a vector to feed to the input neurons)
• No parameter sharing
• Dependencies between inputs are ignored
Recurrent Neural Networks
What is the function being executed at each time step?
• Since we want the same function to be executed at each timestep, we should share
the same network (i.e., same parameters at each timestep)
• This parameter sharing also ensures that the network becomes agnostic to the
length (size) of the input.
• Since we are simply going to compute the same function at each time step, the
number of timesteps doesn’t matter
• We just create multiple copies of the network and execute them at each timestep
Recurrent Neural Networks
How do we account for dependence between inputs?
• Suppose we compute a different function at each time step, so that each output can depend on all the inputs seen so far.
• The network then becomes sensitive to the length of the sequence.
• For example, a sequence of length 10 will require f1, ..., f10, whereas a sequence of length 100 will require f1, ..., f100.
• Is this method okay? No, it violates the other two items on our wish list: the same function at every timestep, and independence from the sequence length.
Recurrent Neural Networks
• The solution is to add a recurrent connection in the network.
• si is the state of the network at timestep i: si = σ(U xi + W si−1 + b), with output yi = O(V si + c)
• The parameters are W, U, V, c, b, which are shared across timesteps
• The same network (and parameters) can be used to compute y1, y2, ..., y10 or y1, ..., y100
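As a minimal sketch of this shared-parameter recurrence (the sizes, the tanh state activation, and the linear output are illustrative assumptions, not the lecture's exact choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4-dim inputs, 8-dim state, 3-dim outputs
n_in, n_state, n_out = 4, 8, 3

# Shared parameters W, U, V, b, c -- the SAME at every timestep
U = rng.normal(scale=0.1, size=(n_state, n_in))
W = rng.normal(scale=0.1, size=(n_state, n_state))
b = np.zeros(n_state)
V = rng.normal(scale=0.1, size=(n_out, n_state))
c = np.zeros(n_out)

def rnn_forward(xs):
    """Run the recurrence s_i = tanh(U x_i + W s_{i-1} + b), y_i = V s_i + c."""
    s = np.zeros(n_state)            # s_0: initial state
    ys = []
    for x in xs:
        s = np.tanh(U @ x + W @ s + b)
        ys.append(V @ s + c)
    return ys

# Because the parameters are shared, the same function handles any length:
short = rnn_forward(rng.normal(size=(10, n_in)))    # 10 timesteps
long = rnn_forward(rng.normal(size=(100, n_in)))    # 100 timesteps
print(len(short), len(long))  # 10 100
```

Note that nothing in `rnn_forward` depends on the sequence length: the loop just applies the same parameters once per timestep.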
Recurrent Neural Networks
Generic Representation of RNN
Let us revisit the sequence learning problems that we saw earlier
We now have recurrent
connections between time steps
which account for dependence
between inputs.
How does RNN reduce complexity?
• Given a function f: h′, y = f(h, x), where h and h′ are vectors with the same dimension (the state before and after the step)

[diagram: h0 → f → h1 → f → h2 → f → h3 → ..., with input xi entering and output yi leaving each copy of f]

• No matter how long the input/output sequence is, we only need one function f.
• If the f's were different, the model would become a feedforward NN, so the RNN may be treated as a further compression of a fully connected network.
Different types of RNN
• Many to one, e.g., sentiment analysis, movie-review classification
• One to many, e.g., music generation, poetry writing
• Many to many (aligned), e.g., part-of-speech tagging
• Many to many (encoder-decoder), e.g., language translation
Deep RNN
• Recurrent layers can be stacked: the outputs y1, y2, y3, ... of the first layer become the inputs of a second recurrent layer
• First layer: h′, y = f1(h, x); second layer: g′, z = f2(g, y)

[diagram: first-layer states h0, h1, h2, h3 and second-layer states g0, g1, g2, g3; inputs x1, x2, x3; intermediate outputs y1, y2, y3; final outputs z1, z2, z3]
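One way to sketch this two-layer stack in NumPy (all sizes and the tanh activation are illustrative assumptions; here each layer's output is simply its new state):

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_h, n_g = 4, 8, 8

def make_cell(n_x, n_s):
    """One recurrent layer f: (s, x) -> (s', y), with y = s' for simplicity."""
    U = rng.normal(scale=0.1, size=(n_s, n_x))
    W = rng.normal(scale=0.1, size=(n_s, n_s))
    b = np.zeros(n_s)
    def f(s, x):
        s_new = np.tanh(U @ x + W @ s + b)
        return s_new, s_new   # (next state, output)
    return f

f1 = make_cell(n_in, n_h)   # h', y = f1(h, x)
f2 = make_cell(n_h, n_g)    # g', z = f2(g, y)

h, g = np.zeros(n_h), np.zeros(n_g)
zs = []
for x in rng.normal(size=(5, n_in)):
    h, y = f1(h, x)   # layer 1 consumes the input x
    g, z = f2(g, y)   # layer 2 consumes layer 1's output y
    zs.append(z)
print(len(zs))  # 5
```

Each layer keeps its own state (h and g), but within a layer the parameters are still shared across all timesteps.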
Bidirectional RNN
• One RNN processes the sequence forward: y, h = f1(x, h); a second processes it backward: z, g = f2(g, x)
• The two outputs at each timestep are combined to produce the prediction: p = f3(y, z)

[diagram: forward states h0, h1, h2, h3 and backward states g0, g1, g2, g3 over inputs x1, x2, x3; outputs y1, y2, y3 and z1, z2, z3 combined by f3 into p1, p2, p3]
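A minimal NumPy sketch of the bidirectional scheme (sizes and activations are assumptions; f3 is taken to be concatenation, one common choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_s = 4, 6

def make_cell(n_x, n_s):
    """One recurrent cell f: (s, x) -> (s', y), with y = s'."""
    U = rng.normal(scale=0.1, size=(n_s, n_x))
    W = rng.normal(scale=0.1, size=(n_s, n_s))
    def f(s, x):
        s_new = np.tanh(U @ x + W @ s)
        return s_new, s_new
    return f

f1 = make_cell(n_in, n_s)   # forward RNN
f2 = make_cell(n_in, n_s)   # backward RNN

xs = rng.normal(size=(5, n_in))

# Forward pass over x1 .. xT
h = np.zeros(n_s); ys = []
for x in xs:
    h, y = f1(h, x)
    ys.append(y)

# Backward pass over xT .. x1, then re-reverse to align with time
g = np.zeros(n_s); zs = []
for x in xs[::-1]:
    g, z = f2(g, x)
    zs.append(z)
zs = zs[::-1]

# p_i = f3(y_i, z_i): here f3 simply concatenates the two directions
ps = [np.concatenate([y, z]) for y, z in zip(ys, zs)]
print(ps[0].shape)  # (12,)
```

At every timestep the prediction p_i can therefore see the whole sequence: the forward state summarizes x1..xi, the backward state summarizes xi..xT.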
Backpropagation through time
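To make the idea concrete, here is a toy scalar BPTT computation (all weights, inputs, and targets are made-up values, and the model is deliberately one-dimensional): the backward loop walks through time accumulating dL/dw, and the result is checked against a finite-difference estimate.

```python
import math

# Toy scalar RNN: s_t = tanh(w*s_{t-1} + u*x_t), y_t = v*s_t,
# loss L = sum_t (y_t - target_t)^2.  All values are hypothetical.
w, u, v = 0.5, 0.8, 1.2
xs, targets = [1.0, -0.5, 0.3], [0.5, 0.0, 0.2]

def forward(w):
    """Return all states s_0..s_T and the total loss."""
    states, loss = [0.0], 0.0
    for x, t in zip(xs, targets):
        s = math.tanh(w * states[-1] + u * x)
        states.append(s)
        loss += (v * s - t) ** 2
    return states, loss

def bptt_grad_w(w):
    """dL/dw by walking backwards through time."""
    states, _ = forward(w)
    grad, ds_next = 0.0, 0.0          # ds_next: gradient arriving from the future
    for t in range(len(xs), 0, -1):
        s = states[t]
        dy = 2 * (v * s - targets[t - 1]) * v   # direct dL_t/ds_t
        ds = (dy + ds_next) * (1 - s ** 2)      # through the tanh
        grad += ds * states[t - 1]              # d(pre-activation)/dw = s_{t-1}
        ds_next = ds * w                        # pass gradient back to s_{t-1}
    return grad

# Check against a central finite difference
eps = 1e-6
numeric = (forward(w + eps)[1] - forward(w - eps)[1]) / (2 * eps)
print(abs(bptt_grad_w(w) - numeric) < 1e-6)  # True
```

The key step is `ds_next = ds * w`: the gradient reaching timestep t is repeatedly multiplied by the recurrent weight, which is precisely what causes the vanishing/exploding behavior discussed next.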
Vanishing vs Exploding Gradient problem
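The problem can be previewed with a deliberately simplified scalar model (activations ignored; the numbers are illustrative): backpropagating through T timesteps multiplies the gradient by the recurrent weight roughly T times, so its magnitude behaves like |w|^T.

```python
def grad_norm_after(T, w):
    """Magnitude of a gradient after T timesteps, each multiplying
    by the same recurrent weight w (nonlinearities ignored)."""
    return abs(w) ** T

for w in (0.9, 1.1):
    print(w, grad_norm_after(50, w))
# |w| < 1  -> gradient vanishes  (0.9**50 is about 0.005)
# |w| > 1  -> gradient explodes  (1.1**50 is about 117)
```

In a real RNN, w is the recurrent weight matrix and the relevant quantity is the product of Jacobians, but the dichotomy is the same: long sequences drive gradients toward zero or toward infinity.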
Thank you!