Introduction to Deep Learning

Recurrent Neural Networks for Sequential Data (Time Series)

MATH 370: Machine Learning


Tanujit Chakraborty
@ Sorbonne
ctanujit@[Link]
Learning from Time-Series Data
▪ The input is a sequence of (non-i.i.d.) examples 𝑦1 , 𝑦2 , … , 𝑦𝑡 .
▪ The problem may be supervised or unsupervised, e.g.,
▪ Forecasting: Predict 𝑦𝑡+1 using 𝑦1 , 𝑦2 , … , 𝑦𝑡
▪ Clustering the examples, dimensionality reduction, or anomaly detection
▪ The evolution of time-series data can be attributed to several factors of variation (e.g., trend, seasonality, cycles, and noise)

▪ Teasing apart these factors of variation is also an important problem.


Auto-regressive Models
▪ Auto-regressive (AR): Regress each example on 𝑝 previous lagged values - AR(𝑝) model

▪ Moving Average (MA): Regress each example on 𝑞 previous stochastic errors - MA(𝑞) model

▪ Auto-regressive Integrated Moving Average (ARIMA): Regress each example on 𝑝 previous lagged values
and 𝑞 previous stochastic errors of the differenced series 𝑦𝑡′ (used if the data is nonstationary and
differencing is applied). We call this an ARIMA(𝑝, 𝑑, 𝑞) model.
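For reference, these standard model equations (with white-noise errors $\varepsilon_t$) can be written as:

$$
\begin{aligned}
\text{AR}(p):\quad & y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t \\
\text{MA}(q):\quad & y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} \\
\text{ARIMA}(p,d,q):\quad & y_t' = c + \phi_1 y_{t-1}' + \cdots + \phi_p y_{t-p}' + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t,
\end{aligned}
$$

where $y_t' = (1-B)^d y_t$ is the $d$-times differenced series.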
State-Space Models
▪ Assume that each observation 𝑦𝑡 in the time-series is generated by a low-dimensional latent factor 𝑥𝑡
(one-hot or continuous)

▪ Basically, a generative latent factor model: 𝑦𝑡 = 𝑔(𝑥𝑡 ) and 𝑥𝑡 = 𝑓(𝑥𝑡−1 ), where 𝑔 and 𝑓 are (possibly
stochastic) mappings, i.e., they define the conditional distributions 𝑝(𝑦𝑡 |𝑥𝑡 ) and 𝑝(𝑥𝑡 |𝑥𝑡−1 ).

▪ Some popular SSMs: Hidden Markov Models (one-hot latent factor 𝑥𝑡 ), Kalman Filters (real-valued latent
factor 𝑥𝑡 )

▪ Note: Models like RNN/LSTM are also similar, except that these are not generative (but can be made
generative)
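As a concrete example, the linear-Gaussian SSM underlying the Kalman filter takes

$$
x_t = A x_{t-1} + w_t, \quad w_t \sim \mathcal{N}(0, Q), \qquad
y_t = C x_t + v_t, \quad v_t \sim \mathcal{N}(0, R),
$$

so that $f$ and $g$ correspond to the conditional distributions $\mathcal{N}(A x_{t-1}, Q)$ and $\mathcal{N}(C x_t, R)$.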
Long Memory and ARFIMA Process
▪ Definition (Long Memory). Let $\{X_t\}_{t \in \mathbb{Z}}$ be a weakly stationary univariate process with auto-covariance
function $\gamma_X(k)$ and spectral density function
$f_X(\lambda) = (2\pi)^{-1} \sum_{k=-\infty}^{\infty} \gamma_X(k)\exp(-ik\lambda)$ for $\lambda \in [-\pi, \pi]$.
Then, $X_t$ has long memory if $\sum_{k=-\infty}^{\infty} |\gamma_X(k)| = \infty$, and short memory otherwise.
Equivalently, $X_t$ has long memory if $f_X(\lambda) \to \infty$ as $|\lambda| \to 0$.
• The most popular long-memory model for level data $y_t$ is the ARFIMA(p, d, q) model introduced by
Granger and Joyeux (1980) and Hosking (1981). Specifically, an ARFIMA(p, d, q) process $y_t$ is defined by
$(1 - B)^d y_t = x_t$,

• where $B$ is the backshift operator, $x_t$ is an ARMA(p, q) process that captures short-range
dependence, and $d$ is a fractional differencing parameter.

• Typically, $d$ is chosen such that $-1/2 < d < 1/2$ to ensure that $y_t$ is stationary and invertible.
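Since $(1-B)^d$ has the binomial expansion $\sum_{k \ge 0} \binom{d}{k}(-B)^k$, fractional differencing can be applied with a truncated set of weights. A minimal NumPy sketch (the truncation length and the toy random-walk data are illustrative choices, not part of the slide):

```python
import numpy as np

def frac_diff_weights(d, n_weights):
    """Coefficients of (1 - B)^d from its binomial expansion."""
    w = np.empty(n_weights)
    w[0] = 1.0
    for k in range(1, n_weights):
        w[k] = w[k - 1] * (k - 1 - d) / k   # ratio of consecutive binomial terms
    return w

def frac_diff(y, d, n_weights=100):
    """Truncated fractional differencing (1 - B)^d applied to series y."""
    w = frac_diff_weights(d, n_weights)
    # the convolution applies the weights to lagged values of y (zero pre-sample)
    return np.convolve(y, w, mode="full")[: len(y)]

# Example: fractionally difference a random walk with d = 0.4
rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(500))
y_d = frac_diff(y, d=0.4)
```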
Fractional Differencing

Figure: Fractional Differencing applied to DAX index

[Link]
Recurrent Connections in Deep Neural Networks
▪ Feedforward nets such as MLP assume independent observations
[Figure: a feedforward network applied to each example independently — 𝒚𝑛 depends only on 𝒉𝑛 , and 𝒉𝑛 depends only on 𝒙𝑛 .]

▪ Feedforward neural networks are not ideal when the inputs [𝒙1 , 𝒙2 , … , 𝒙𝑁 ] and/or outputs
[𝒚1 , 𝒚2 , … , 𝒚𝑁 ] represent sequential data (e.g., a sequence of words, or a video as a sequence of frames)

▪ A recurrent structure can be helpful if each input and/or output is a sequence


[Figure: the same network with recurrent connections, unrolled over 𝑁 steps, and its compact recurrent form. Each step of the input is given in the form of an embedding (e.g., word2vec if the input is a sequence of words); a single input is of length 𝑁; the corresponding output is assumed to have the same length as the input.]
RNNs
▪ RNNs are used when each input or output or both are sequences of tokens

[Figure: an unrolled RNN with an encoder part (inputs 𝒙1 , … , 𝒙𝑇 ) and a decoder part (outputs 𝒚1 , … , 𝒚𝑇 ), plus its compact recurrent form. If the input is a word sequence, then each 𝒙𝑡 represents the corresponding word’s embedding (either a pre-computed word embedding like word2vec or a learned word embedding).]
▪ Hidden state 𝒉𝑡 is supposed to remember everything up to time 𝑡 − 1. However, in practice, RNNs have
difficulties remembering the distant past
▪ Variants such as LSTM, GRU, etc., mitigate this issue to some extent

▪ Slow processing is another major issue (e.g., can’t compute 𝒉𝑡 before computing 𝒉𝑡−1 )
Recurrent Neural Networks
▪ A basic RNN’s architecture (assuming input and output sequence have same lengths)

[Figure: a basic RNN unrolled over 𝑇 steps, and its compact recurrent form. Inputs 𝒙1 , … , 𝒙𝑇 are given in the form of embeddings (e.g., word embeddings if 𝒙1 is a word).]

▪ RNN has three sets of weights: 𝑾, 𝑼, 𝒗
▪ 𝑾 and 𝑼 model how 𝒉𝑡 at step 𝑡 is computed: 𝒉𝑡 = 𝑔(𝑾𝒙𝑡 + 𝑼𝒉𝑡−1 ), where 𝑔 is some activation function like ReLU
▪ 𝒗 models the hidden layer to output mapping, e.g., 𝒚𝑡 = 𝑜(𝒗𝒉𝑡 ), where 𝑜 depends on the nature of 𝑦𝑡 (if it is categorical, then 𝑜 can be softmax)
▪ Important: Same 𝑾, 𝑼, 𝒗 are used at all steps of the sequence (weight sharing)
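To make the shared-weight recursion concrete, here is a minimal NumPy sketch of the forward pass (the dimensions and the tanh activation are illustrative choices):

```python
import numpy as np

def rnn_forward(X, W, U, v, h0):
    """Vanilla RNN: h_t = tanh(W x_t + U h_{t-1}), y_t = v h_t.

    X: (T, d_in) input embeddings; W: (d_h, d_in); U: (d_h, d_h);
    v: (d_out, d_h); h0: (d_h,). The same W, U, v are reused at every step.
    """
    h, hs, ys = h0, [], []
    for x_t in X:                      # one iteration per sequence position
        h = np.tanh(W @ x_t + U @ h)   # hidden state summarizes the past
        hs.append(h)
        ys.append(v @ h)               # linear readout (add softmax etc. as needed)
    return np.array(hs), np.array(ys)

# Tiny usage example with random weights
rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 5, 3, 4, 2
hs, ys = rnn_forward(rng.standard_normal((T, d_in)),
                     rng.standard_normal((d_h, d_in)),
                     rng.standard_normal((d_h, d_h)),
                     rng.standard_normal((d_out, d_h)),
                     np.zeros(d_h))
```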
Recurrent Neural Nets (RNN)
▪ A more “micro” view of RNN (the transition matrix U connects the hidden states across
observations, propagating information along the sequence)

Pic source: [Link]


RNN in Action
Workflow of RNN:
• The gif above reflects the magic of recurrent networks.
• It depicts 4 timesteps. The first is exclusively influenced by the input data.
• The second one is a mixture of the first and second inputs. This continues on.
• You should recognize that, in some way, network 4 is "full".
• Presumably, timestep 5 would have to choose which memories to keep and which ones to overwrite.
• This is very real. It's the notion of memory "capacity".
• As you might expect, bigger layers can hold more memories for a longer period of time.
Pic source: [Link]
Training RNN
▪ Trained using Backpropagation Through Time (forward propagate from step 1 to the end, and then backward
propagate from the end back to step 1)
▪ Think of the time-dimension as another hidden layer and then it is just like standard backpropagation for
feedforward neural nets

• They learn by fully propagating forward from 1 to 4 (through an entire sequence of arbitrary length), and then backpropagating all the derivatives from 4 back to 1.
• You can also pretend that it's just a funny-shaped normal neural network, except that we're re-using the same weights (synapses 0, 1, and h) in their respective places.
• Other than that, it's normal backpropagation.

Black: Prediction, Yellow: Error, Orange: Gradients
Pic source: [Link]
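As a sketch of what "through time" means in code, the loop below unrolls an RNN cell over a whole sequence and lets automatic differentiation carry the gradients from the last step back to the first (a hypothetical PyTorch setup; the layer sizes and squared-error loss are illustrative):

```python
import torch
import torch.nn as nn

d_in, d_h, T = 3, 8, 20
cell = nn.RNNCell(d_in, d_h)     # h_t = tanh(W x_t + U h_{t-1} + b)
readout = nn.Linear(d_h, 1)      # hidden state -> scalar prediction
opt = torch.optim.SGD(list(cell.parameters()) + list(readout.parameters()), lr=0.01)

x = torch.randn(T, 1, d_in)      # toy sequence: (time, batch, features)
y = torch.randn(T, 1, 1)         # toy targets

h = torch.zeros(1, d_h)
loss = 0.0
for t in range(T):               # forward: unroll through the entire sequence
    h = cell(x[t], h)            # the same weights are reused at every step
    loss = loss + (readout(h) - y[t]).pow(2).mean()

opt.zero_grad()
loss.backward()                  # BPTT: derivatives flow from step T back to step 1
opt.step()
```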


RNN Applications
▪ In many problems, each input, each output, or both may be in the form of sequences

[Figure: common RNN input/output configurations. The green cells are some feature representation (e.g., a hidden layer of a deep neural network) of the inputs.]

▪ Different inputs or outputs need not have the same length
▪ Some examples of prediction tasks in such problems
▪ Image captioning: Input is image (not a sequence), output is the caption (word sequence)
▪ Document classification: Input is a word sequence, output is a categorical label
▪ Machine translation: Input is a word sequence, output is a word sequence (in different language)
▪ Stock price prediction: Input is a sequence of stock prices, output is its predicted price tomorrow
▪ No input – just output (e.g., generation of random but plausible-looking text)
Recurrent Neural Networks: Some Examples
▪ Consider generating a sequence 𝑦1 , 𝑦2 , … , 𝑦𝑇 given an input 𝑥

[Figure: image captioning. An image is fed in; hidden states at each step of the sequence produce the words of the generated caption. The predicted 𝑦𝑡−1 is also fed into 𝒉𝑡 ; at test time, we can only feed the predicted 𝑦𝑡−1 . During training, if the true 𝑦𝑡−1 is fed, we call it “teacher forcing”.]

▪ Predicting the sentiment of a movie review

[Figure: sentiment prediction. Each node denotes an embedding of a word in the review; the final hidden state is supposed to contain the information about the entire review and is used to predict the sentiment. Isn’t this too much to expect?? ☺ Indeed; this can be an issue with RNNs.]
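A minimal sketch of teacher forcing in a decoding loop (hypothetical PyTorch code; the vocabulary size, GRUCell decoder, and <START> index are illustrative assumptions):

```python
import torch
import torch.nn as nn

vocab, d_emb, d_h = 100, 16, 32
embed = nn.Embedding(vocab, d_emb)
cell = nn.GRUCell(d_emb, d_h)
out = nn.Linear(d_h, vocab)

h = torch.zeros(1, d_h)                    # e.g., initialized from an image encoder
true_caption = torch.randint(vocab, (5, 1))

prev = torch.zeros(1, dtype=torch.long)    # assume index 0 is the <START> token
loss = 0.0
for t in range(5):
    h = cell(embed(prev), h)               # the previous token is fed into h_t
    logits = out(h)
    loss = loss + nn.functional.cross_entropy(logits, true_caption[t])
    prev = true_caption[t]                 # teacher forcing: feed the TRUE y_{t-1}
    # at test time: prev = logits.argmax(-1), since only predictions are available
```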
Recurrent Neural Networks: Some Examples
▪ Parts of speech tagging (or “aligned” translation; input and output have the same length)

[Figure: each node denotes an embedding of a word in a sentence; the output at each step is the part-of-speech tag for that word.]

▪ “Unaligned” translation (input and output can have different lengths)
▪ Such problems usually require a sequence-encoder / sequence-decoder architecture: encode the input
sequence (embeddings of tokens) into a single embedding vector 𝑐, and then decode this embedding one
output token at a time

▪ In the unaligned case, generation stops when an “end” token (e.g., <END>) is generated
on the output side
Recurrent Neural Networks: Some Examples
▪ Unconditional generation (no input; only an output sequence is generated, given an RNN
that was trained on some training data containing several sequences)

[Figure: a “seed” token 𝑠0 (e.g., <START>) is fed at the first step; the hidden states 𝒉1 , 𝒉2 , 𝒉3 , … produce the outputs 𝒚1 , 𝒚2 , 𝒚3 , …]

▪ Each generated word/token is fed to the next step’s hidden state

▪ Generation stops when an “end” token (e.g., <END>) is generated
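Putting these two bullets together, an unconditional sampling loop might look like this sketch (hypothetical PyTorch code; the <START>/<END> indices, GRUCell, and length cap are illustrative assumptions):

```python
import torch
import torch.nn as nn

vocab, d_emb, d_h = 100, 16, 32
START, END = 0, 1                           # assumed special-token indices
embed = nn.Embedding(vocab, d_emb)
cell = nn.GRUCell(d_emb, d_h)
out = nn.Linear(d_h, vocab)

h = torch.zeros(1, d_h)
token = torch.tensor([START])               # the "seed" token
generated = []
for _ in range(50):                         # cap the length as a safety net
    h = cell(embed(token), h)               # feed the last token into the next step
    probs = out(h).softmax(-1)              # distribution over the vocabulary
    token = torch.multinomial(probs, 1)[0]  # sample the next token
    if token.item() == END:                 # stop once <END> is generated
        break
    generated.append(token.item())
```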


For RNNs, the Distant Past is Hard to Remember
▪ The hidden layer nodes ℎ𝑡 are supposed to summarize the past up to time 𝑡 − 1

[Figure: the unrolled RNN again (weights 𝑾, 𝑼, 𝒗 shared across steps), and its compact form; 𝒉𝑡 must carry information from all earlier steps through repeated applications of 𝑼.]

▪ In theory, they should. In practice, they can’t. Some reasons:
▪ Vanishing gradients along the sequence too (due to repeated multiplications) – past knowledge gets “diluted”
▪ Hidden nodes also have limited capacity because of their finite dimensionality

▪ Various extensions of RNNs have been proposed to address forgetting
▪ Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997)
GRU and LSTM
▪ Essentially an RNN, except that the hidden states are computed differently
▪ Recall that RNN computes the hidden states as
𝒉𝑡 = 𝑡𝑎𝑛ℎ(𝑾𝒙𝑡 + 𝑼𝒉𝑡−1 )
▪ For RNN: the state update is multiplicative, which leads to weak memory and gradient issues
▪ GRU and LSTM contain specialized units and “memory” which modulate what/how much information
from the past to retain/forget
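For concreteness, the standard GRU updates (update gate $z_t$, reset gate $r_t$; bias terms omitted) are:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) \\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1})) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$

The gate $z_t$ decides how much of the old state to keep, and $r_t$ decides how much of it to use when proposing the new candidate state.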

Pic source: [Link]


Capturing Long-Range Dependencies
▪ Idea: Augment the hidden states with gates (with parameters to be learned)
▪ These gates can help us remember and forget information “selectively”

Pic source: [Link]

• The hidden states have 3 types of gates: Input (bottom), Forget (left), Output (top)
• An open gate is denoted by 'o', a closed gate by '-'
LSTM
▪ In contrast, LSTM maintains a “context” 𝑪𝑡 and computes hidden states using gated updates, as shown below
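The standard LSTM updates, with forget, input, and output gates $f_t$, $i_t$, $o_t$ (bias terms omitted), are:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1}), \qquad
i_t = \sigma(W_i x_t + U_i h_{t-1}), \qquad
o_t = \sigma(W_o x_t + U_o h_{t-1}) \\
\tilde{C}_t &= \tanh(W_C x_t + U_C h_{t-1}) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$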

• Note: ⨀ represents the elementwise vector product. Also, the state updates are now additive, not
multiplicative. Training is again via backpropagation through time.

• Many variants of LSTM exist, e.g., using 𝑪𝑡 in local computations, Gated Recurrent Units (GRU),
etc. These are mostly minor variations of the basic LSTM above.
Do LSTM really have long memory? (ICML’2020)

Figure: Autocorrelation plots of the traffic and DJI datasets (to visualize the long memory in the data)

Ref: [Link]
Memory RNN and Bidirectional RNN
▪ RNNs, GRUs, and LSTMs only remember the information from the previous tokens
▪ Memory RNN and Bidirectional RNN can remember information from the past and
future tokens

[Figure: a bidirectional RNN. Forward-direction embeddings and reverse-direction embeddings of the input tokens are combined into embeddings that take information from both directions.]

Ref: [Link]
Ref: [Link]
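As a quick illustration, a bidirectional recurrent layer in a framework like PyTorch runs the sequence in both directions and concatenates the two hidden states (the sizes here are illustrative):

```python
import torch
import torch.nn as nn

d_in, d_h, T = 16, 32, 10
birnn = nn.LSTM(d_in, d_h, bidirectional=True, batch_first=True)

x = torch.randn(1, T, d_in)   # (batch, time, features)
out, _ = birnn(x)             # out: (1, T, 2 * d_h)
# out[:, t, :d_h] is the forward-direction embedding at step t (sees x_1..x_t)
# out[:, t, d_h:] is the reverse-direction embedding at step t (sees x_t..x_T)
```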
Exercise: RNN

Pic source: [Link]


Exercise: LSTM

Initialize: Set the previous hidden state h0 to [1, 1] and the memory cells C0 to [0.3, -0.5]

Pic source: [Link]


Any question?

Readings for you:


▪ Deep Learning book
▪ Forecasting (FPP) Book using Python
▪ AI by Hand by Tom Yeh
▪ Special thanks to Piyush Rai and Jay Alammar – I adapted some of their slides available online.
