Hybrid BiLSTM for Electricity Forecasting
Research Article
Electricity Load and Price Forecasting Using a Hybrid Method
Based Bidirectional Long Short-Term Memory with Attention
Mechanism Model
Received 24 October 2022; Revised 6 December 2022; Accepted 15 December 2022; Published 3 February 2023
Copyright © 2023 William Gomez et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
The accuracy of forecasting short- or medium-term electricity loads and prices is critical in the energy market. The amplitude and duration of abnormally high prices and load spikes can be detrimental to retailers and production systems. Therefore, predicting these spikes to manage risk effectively is critical. In this paper, a novel hybrid method that combines the ensemble empirical mode decomposition (EEMD) algorithm and a bidirectional long short-term memory with attention mechanism (BiLSTM-AM) model is proposed to predict electricity loads and prices. A simple approach is proposed to determine the number of intrinsic mode functions (IMFs) into which EEMD decomposes the raw data, avoiding overdecomposition, irrelevant components, and high computational cost. Each selected mode is then modeled with BiLSTM-AM to obtain a predicted sequence. These sequences are summed and then reverse-normalized to obtain the final predicted value. The proposed method is validated using two datasets (PJM and Australian Energy Market Operator) with different time intervals to demonstrate the generality and robustness of the forecasts, especially in temporal valley or peak forecasting. The results show that the proposed method outperforms other methods in prediction accuracy and spike-capturing ability, with EEMD reducing the mean absolute percentage error (MAPE) by 53%, 54%, and 60%, respectively. In the three forecast periods, the average MAPE and R² are 0.097 and 0.92, respectively. Furthermore, we use the Kolmogorov-Smirnov predictive accuracy (KSPA) test and the model confidence set (MCS) test to validate the superiority of the proposed model. The results demonstrate its suitability, reliability, and performance in short- and medium-term forecasting.
performance of a single technique-based predictive model is insufficient due to inherent limitations. Energy prices are difficult to predict due to rapid peaks and troughs caused by changes in supply and demand arising from intraday system constraints [2].

Therefore, a generalized forecasting model will help in grid monitoring, optimal dispatch, and energy production and conversion control. Few works have attempted to model electricity load and price with the same model to examine its generality, flexibility, and reliability. Datasets with different time ranges were also tested to assess the model's ability to handle data of different variational frequencies and to better capture valleys and peaks. We applied a simple technique to determine the number of IMFs so as to avoid over-decomposition and irrelevant components and to reduce computation time.

High demand for electric vehicles and microgrids will lead to increased energy distribution and frequent market surges, so electricity load and price forecasting is becoming more challenging. Much of the recent literature looks at models that can improve accuracy and capture abnormal price situations.

Over the years, machine learning (ML) models have made great contributions to high-accuracy forecasting of electricity load and price [3]. The nonlinear learning capabilities and fault tolerance of these models make them excellent decision-making tools for rapidly changing energy markets [4]. Lu et al. [5] combined three machine learning methods: least absolute shrinkage and selection operator (LASSO) for feature extraction, principal component analysis (PCA) for dimensionality reduction, and Bayesian ridge regression (BRR) for load prediction. Khan et al. [6] applied a hybrid model combining a multilayer perceptron, support vector regression (SVR), and CatBoost. Bhinge et al. [7] developed a nonparametric model based on a Gaussian process regression (GPR) approach to predict energy consumption with a relative error of less than 6% on test data. A gradient boosting regression trees (GBRT) method was developed for 15-minute industrial electric load forecasting in the time domain [8]. Zhao et al. [9] used a kernel extreme learning machine (KELM) model to improve predictive performance. Tschora et al. [10] applied several machine learning models, together with explainable AI (XAI), to study the ability of different machine learning techniques to accurately predict electricity prices.

Recently, deep learning models have received more attention due to their ability to handle the nonlinear and chaotic properties of energy time-series data. Several methods, including long short-term memory (LSTM) [11], hybrid methods based on autoencoders and variational mode decomposition (VMD) [12], reinforcement learning [13], support vector machines (SVM) [14], multiobjective optimization [15], and combinations of convolution, gated recurrent unit (GRU), and LSTM [16], have been applied to short-term prediction.

Modeling data with attention mechanisms has achieved great success in many areas of deep learning [17], speech recognition [18], and energy time-series data [19]. Recurrent neural networks with attention mechanisms are more accurate than traditional neural networks. Combining multiple models to form these methods shows that they perform better than a single model. Decomposition methods include frequency-scale resolution using wavelet transform analysis [20] and empirical mode decomposition (EMD) [21] for nonlinear and nonstationary data. The task of combining models through decomposition algorithms such as EMD lies in finding the hyperparameters and the recommended number of intrinsic mode functions (IMFs). This process involves an optimization analysis of the trade-off between accuracy and computation time. Determining spikes and peaks in the energy market is necessary for a wide range of decisions, such as risk management, transmission congestion, and outage monitoring, so we apply an ensemble empirical mode decomposition (EEMD) method.

Highly unpredictable price spikes can occur due to the many complex factors affecting intraday grid conditions. We present a novel method based on ensemble empirical mode decomposition and bidirectional long short-term memory with attention mechanism (BiLSTM-AM) for the electricity load and price forecasting problem. BiLSTM-AM can capture nonlinear and temporal trends in energy prices and demand. The BiLSTM-AM model is used to predict normal economic situations and to record price and load spikes above a predetermined threshold. The proposed model is validated using two electricity load datasets and one electricity price dataset. The contributions of this paper are summarized as follows:

(1) A Bayesian optimization (BO) algorithm based on random forest regressors is used to tune the hyperparameters of the proposed hybrid EEMD and BiLSTM-AM model

(2) The proposed model can be used to monitor electricity load and price

(3) A simple and effective method is provided to determine the number of IMFs, avoiding over-decomposition, irrelevant components, and high calculation costs

The rest of this article is organized as follows. The literature related to energy market forecasting models is reviewed in Section 2. Section 3 describes the proposed model. Section 4 provides the analysis results. Finally, we draw conclusions and outline further research directions.

2. Related Work

Price and load spikes frequently occur in the energy market and can significantly impact economic activity. Therefore, predicting normal price and load fluctuations, as well as spike situations, helps gauge the risks to the economy associated with changes in energy markets. In energy markets, capturing spikes is vital as they are a core component of market attention. The BiLSTM model, coupled with an attention mechanism, allows us to model load and price spikes and provide accurate predictions. A number of methods in the recent literature have been used to address price or load spikes. Singhal and Swarup [22] applied an artificial neural network (ANN) to capture normal, small, and large price spikes to predict market-clearing prices
ijer, 2023, 1, Downloaded from [Link] by Bangladesh Hinari NPL, Wiley Online Library on [27/03/2025]. See the Terms and Conditions ([Link] on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
International Journal of Energy Research 3
(MCPs) in day-ahead energy markets. Clements et al. [23] investigated the regional relationship of load spikes in Australia. Christensen et al. [24] used an autoregressive conditional hazard model to handle spikes by treating prices as discrete-time point processes.

Several methods, such as the similarity-day approach [13], SVM [14], and ANN and fuzzy logic methods [25], have advanced load forecasting by helping overcome the limitations of traditional forecasting models. Kiartzis et al. [26] proposed an expert-system-based peak prediction model. Polson et al. [2] presented a combination of deep learning and extreme value theory (DL-EVT) to deal with sharp peaks, troughs, and sudden price changes in energy markets due to fluctuations in demand and supply caused by intraday constraints; still, this approach fails to capture the ups and downs in the levels. Radovanovic et al. [1] developed a holistic approach to recovering many energy market structures and forecasting node prices based solely on publicly available data, particularly historical prices, system loads, and the grid-wide mix of generation types. Huang et al. [21] presented an adaptive signal time-frequency processing method called EMD, which is particularly suitable for analyzing and processing nonlinear and nonstationary signals. Wang et al. [29] used the time-frequency spectrum obtained from EEMD to overcome the mode mixing problem and more accurately reflect time-series conditions. Wu et al. [30] presented an improved EEMD with LSTM for crude oil price forecasting. Zheng et al. [31] proposed empirical mode decomposition and long short-term memory (EMD-LSTM) with the XGBoost algorithm for short-term load prediction. Regarding hybrid methods, Meng et al. [19] proposed a load prediction method based on EMD and long short-term memory with an attention mechanism. Huang [32] proposed a hybrid deep neural network model for short-term electricity price prediction. Bedi and Toshniwal [33] adopted a deep learning model based on EEMD to predict crude oil prices. Liu et al. [34] used EMD with ANN for wind speed prediction. Table 1 summarizes some previous related work on energy market forecasting.

2.1. LSTM and BiLSTM Models. The availability of data and computing power has made deep learning an integral part of time series forecasting. In machine learning, an LSTM model is a special type of recurrent neural network (RNN) developed by Hochreiter and Schmidhuber [35]. It quickly gained popularity, particularly for solving time series prediction problems in many research areas. These networks were introduced to eliminate the long-term dependency and vanishing gradient problems. In regular RNNs, the problem mainly occurs when previous information is connected with new information. As a result, LSTM units are used instead of hidden layers to better process timing-related inputs. LSTM units consist of a forget gate (f_t), an input gate (i_t), a cell state (C̃_t), a memory unit (C_t), an output gate (O_t), and a hidden layer (h_t):

f_t = σ(W_fh h_{t−1} + W_fx x_t + b_f),  (1)

i_t = i1_t ∗ i2_t,  (2)

where i1_t = σ(W_ih h_{t−1} + W_ix x_t + b_i) and i2_t = tanh(W_gh h_{t−1} + W_gx x_t + b_g).

C̃_t = tanh(W_θh h_{t−1} + W_θx x_t + b_θ),  (3)

C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t,  (4)

O_t = σ(W_oh h_{t−1} + W_ox x_t + b_o),  (5)

h_t = tanh(C_t) ∗ O_t.  (6)

These equations contain an offset term (b), a weight coefficient matrix (W), a hyperbolic tangent
activation function (tanh), and a sigmoid activation function (σ).

The concept of BiLSTM can be defined as building a neural network that carries information in both directions, backward (from future to past) and forward (from past to future) [36]. Compared to a regular LSTM, where input can only flow in one direction, BiLSTMs let input flow in both directions [37]. The bidirectional recurrent neural network (BRNN) concept is simple to grasp: backward LSTMs are calculated similarly to forward LSTMs, except that the time direction is reversed. The forward layer, the backward layer, and the hidden layer's final output are

h⃗ = f(w⃗_1 x_t + w⃗_2 h_{t−1}),
h⃖ = f(w⃖_1 x_t + w⃖_2 h_{t−1}),  (7)
y_t = g(w_{o1} ∗ h⃗ + w_{o2} ∗ h⃖ + b_o),

where x_t is the input and y_t is the final output of the LSTM. It is the concatenation of the two hidden states h⃗ and h⃖ that determines the final output y_t of the BiLSTM.

In this study, the BiLSTM-AM model is used to capture spike situations in the energy market, the periodicity in the data, and past and future effects on current data.

2.2. Decomposition Algorithms. EMD is an adaptive time-space analytic approach for nonstationary and nonlinear series processing. EMD executes operations that partition a series into IMFs without leaving the time domain. Huang et al. [21] proposed the EMD method, which divides the complex original signal into a sequence of IMFs with varying amplitudes and a residual. In this case, the series is decomposed into a finite number of IMFs. Since numerous economic and social factors cause changes in energy market price and demand, the IMFs show various characteristics of the original signal at different time scales. The term IMF refers to a monocomponent function with one instantaneous frequency that must satisfy the following two conditions:

(i) The numbers of local extreme points and zero-crossing points must be equal, or differ by at most one, over the whole period

(ii) The envelope defined by the upper and lower envelopes has a zero mean value at every point in the time series

The EMD decomposition steps are as follows:

(1) All maxima and minima in the time series x(t) are calculated using cubic spline interpolation

(2) Compute the local mean of the upper and lower envelopes, m(t) = (U(t) + L(t))/2

(3) Subtract m(t) from the original time series x(t) to obtain the intermediate signal h(t) = x(t) − m(t)

(4) Steps (1) to (3) are repeated on h(t) until its mean approaches zero and the ith IMF is computed, denoted C_i(t), satisfying the two conditions (i) and (ii) mentioned above; then h(t) = C_i(t) = IMF_1. If not, consider h(t) as the new signal and perform (1) to (3) again; if h_k(t) obtained after k repetitions meets the definition of an IMF, extract it as the first IMF: h_k(t) = C_1(t) = IMF_1

(5) A new signal R(t) = x(t) − IMF_1 is obtained by subtracting IMF_1 from the original signal x(t); steps (1) to (4) above are then repeated on R(t) to extract the next IMF

(6) Repeat steps (1)-(5) using R(t) as the input series until the termination condition is met, i.e., R(t) becomes monotonic; then the EMD decomposition of the original signal ends

Using the number of IMFs = n, the original signal x(t) is reconstructed as follows:

x(t) = Σ_{i=1}^{n} IMF_i(t) + R(t),  (8)

where R(t) is the last residue.

To overcome the mode mixing problem of the EMD method, Wu and Huang [38] suggested a noise-assisted EMD algorithm called ensemble empirical mode decomposition (EEMD). EEMD is an improvement on the traditional EMD algorithm: it introduces some noise into the process so that the IMFs produced should be more accurate. Unlike the standard EMD method, this method allows better scale separation. With EEMD, white noise is added to the original signal before applying the EMD algorithm. The noise is evenly distributed at the extreme-point intervals throughout the band, reducing mode mixing effects.

The EEMD algorithm [39] involves the following steps:

(1) Add white noise ε_i(t) with a standard normal distribution N(0, 1) to the original signal x(t): Y_i(t) = x(t) + ε_i(t), for i = 1, 2, …, N, where N is the number of ensembles

(2) The EMD algorithm decomposes each transformed signal Y_i(t) into a set of IMFs and a residual:

Y^(i)(t) = Σ_{m=1}^{M−1} IMF_m^(i)(t) + r_M^(i)(t),  (9)

where M − 1 is the total number of IMFs produced, IMF_m^(i) is the mth IMF, and r_M^(i) is the residual of the ith trial

(3) Steps (1) and (2) are repeated for N trials, with a different white noise series ε_i(t) added to the original signal in each trial

(4) The final IMFs of the EEMD algorithm (IMF_m^ave) are computed by averaging the mth IMF over the N trials: IMF_m^ave(t) = (1/N) Σ_{i=1}^{N} IMF_m^i(t); therefore,
[Figure 1: Architecture of the proposed BiLSTM-AM model, showing the forward LSTM layer and the backward LSTM layer (together forming the BiLSTM layer) and the attention layer.]

IMF_1^ave(t) = (1/N) Σ_{i=1}^{N} IMF_1^i(t), with residual r_1(t) = x(t) − IMF_1.

The results produced by EEMD depend on the number of ensembles N and the added noise amplitude β, which should satisfy the following relation [18]:

α = β / √N,  (10)

where α is the final standard deviation of the residual.

3. Proposed Method

We propose a novel deep learning framework based on integrated ensemble empirical mode decomposition and bidirectional long short-term memory with attention for forecasting energy market data. The purpose of combining the attention mechanism with BiLSTM is to capture the influential part of the input sequence during the prediction process. With the introduction of the attention mechanism, BiLSTM becomes more accurate by assigning different probability weights to inputs to highlight the more important factors. The normalization term in the RNN keeps all summed inputs of the rescaling layer constant, making the hidden-to-hidden dynamics more stable [17, 18].

The many-to-one attention mechanism is calculated as follows:

Attention weight: α_fb = exp(Score(h_f, h_b)) / Σ_{b′=1}^{n} exp(Score(h_f, h_b′)),
Attention vector: π_t = σ(C_t, h_t) = v tanh(W_1 C_t + W_2 h_t + b),  (11)

Context vector: C = Σ_{i=1}^{n} α_fb h_i,  (12)

Score(h_f, h_b) = { h_f^T W h_b  (Luong's multiplicative style);  v^T tanh(W_1 h_f + W_2 h_b)  (Bahdanau's additive style) }.  (13)
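As a concrete illustration, the attention weights of Eqs. (11)-(13) are a softmax over alignment scores between a query state and each encoder hidden state, and the context vector is the weighted sum of those states. The sketch below is a minimal plain-Python version of the Luong (multiplicative) variant with the weight matrix W taken as the identity for brevity; the function and variable names are illustrative, not taken from the paper's code.

```python
import math

def luong_score(h_f, h_b):
    # Multiplicative score with W = identity: Score(h_f, h_b) = h_f^T h_b
    return sum(a * b for a, b in zip(h_f, h_b))

def attention(h_f, encoder_states):
    # Eq. (11): softmax of the scores gives the attention weights
    scores = [luong_score(h_f, h_b) for h_b in encoder_states]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Eq. (12): context vector = weighted sum of encoder hidden states
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context

# Three toy 2-dimensional hidden states; the third aligns best with the query
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention([1.0, 1.0], states)
```

Because the weights are a softmax, they always sum to one, and the state most similar to the query receives the largest weight.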
[Figure 2: Flowchart of the proposed method — the normalized data are decomposed by EEMD into IMFs and a residual; an LSTM-based iteration compares RMSE_n and RMSE_{n+1} to select the number of IMFs; Bayesian optimization tunes the BiLSTM-AM model, which is trained with a sliding window approach to output the predicted load/price values.]

The hidden layer state value of the final output is calculated as H_t′ = G(C, h_t, x_t).

As shown in Figure 1, the proposed BiLSTM-AM model consists of an input vector, a forward LSTM hidden layer, a backward LSTM hidden layer, an attention layer, a fully connected layer, and an output layer. The input sequence data is passed to the forward LSTM hidden layer and the backward LSTM hidden layer, which combine to output a processed vector. The attention layer computes a weight vector based on the LSTM model and then combines the weight vector with the shallow output to compute the predictions of the fully connected layer. The structural architecture of the proposed BiLSTM-AM is shown in Figure 1.

We transform the data to the range [0, 1] using the min-max normalization method. Due to the noisy nature of time series data, to reduce the noise impact of the collected data for better forecasting, we normalize and then decompose using the EEMD method prior to modeling. The normalization formula is

X_n = (X − X_min) / (X_max − X_min),  (14)

where X_n is the normalized data, X_min is the overall sample minimum value, and X_max is the overall sample maximum value. After making predictions using EEMD-BiLSTM-AM, the predicted values obtained from the sum of the predicted IMF values are restored to the original scale using the following equation:

Y_t = X_t′ (X_max − X_min) + X_min,  (15)
[Figure residue: time-series plots of load (MWh) and price ($/MWh); only the axis labels (Time (hourly), Load (MWh), Price ($/MWh)) and panel markers (a)/(b) are recoverable.]

where X_t′ is the forecast value from the proposed model and Y_t is the final reconstructed prediction value.

Figure 2 provides an illustration of the proposed method, which consists of four parts:

(1) The data is split into training and test sets, and the training data is preprocessed using Eq. (14). Then, we apply EEMD to decompose the data

(2) Since the number of mode components is difficult to determine, much research has been done on how to choose the number of components into which to decompose the original data. A simple approach based on model evaluation is proposed to determine the number of mode components. The iteration process applies the LSTM model to compare the root mean square errors (RMSE) of the mode components MC_n and MC_{n+1} by Eq. (16). When the RMSE_{n+1} of MC_{n+1} is greater than the RMSE_n of MC_n, the iteration process is stopped and the optimal number of IMFs is obtained:

RMSE = √((1/N) Σ_{t=1}^{N} (y_t − f_t)²),  (16)

where N is the total number of predicted values, y_t represents the actual value, and f_t is the predicted value. Algorithm 1 gives the optimal IMF number selection process

(3) We model each sequence using the BiLSTM-AM model. The predicted subseries are summed, and we use Equation (15) for reverse normalization to obtain the predicted value

(4) The trained BiLSTM-AM model with the sliding window approach is used to obtain the final predicted value for the test data
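Step (2) above amounts to increasing the number of IMFs until the LSTM's RMSE stops improving. A minimal plain-Python sketch of that stopping rule follows; the `rmse_for` evaluator is a stand-in for training an LSTM on the candidate decomposition and computing Eq. (16), not the paper's actual code.

```python
def select_num_imfs(rmse_for, max_imfs=10):
    """Increase n until RMSE_{n+1} > RMSE_n, then keep the previous n."""
    best_n = 1
    best_rmse = rmse_for(1)
    for n in range(2, max_imfs + 1):
        rmse = rmse_for(n)
        if rmse > best_rmse:       # error got worse: stop the iteration
            break
        best_n, best_rmse = n, rmse
    return best_n

# Illustration with the PJM-load 2-day RMSEs reported in Section 5:
# 323.44, 290.51, 254.01, 267.87 for 1-4 IMFs, so the optimum is 3.
reported = {1: 323.44, 2: 290.51, 3: 254.01, 4: 267.87}
optimal = select_num_imfs(lambda n: reported[n], max_imfs=4)
```

This mirrors the paper's example: the RMSE first decreases with each added IMF and rises again at 4 IMFs, so the iteration stops and returns 3.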
Table 4: Optimal hyperparameters of the proposed model for NSW 2-day prediction.

Parameters    | IMF1     | IMF2  | IMF3    | IMF4        | Residual
Num_layers    | 2        | 1     | 1       | 3           | 1
Neurons       | [128,32] | 256   | 430     | [200,64,16] | 300
Dropout       | 0.01     | 0     | 0.1     | 0           | 0.04
Optimizer     | Adam     | Adam  | Adam    | Adam        | Adam
Learning_rate | 0.0001   | 0.001 | 0.00201 | 0.000742    | 0.005

4. Datasets

We use two datasets of hourly and half-hourly electricity loads and prices for model validation. The hourly dataset is from PJM [39]. The price of electricity in the real-time (RT) market reflects the actual cost of supplying electricity to a location, given the actual constraints of the grid. PJM prices its hourly RT energy market using locational marginal pricing (LMP). Wholesale electricity prices are based on LMP, loads, generation patterns, and physical constraints on the transmission system, so they reflect the value of electricity in different regions. The raw day-ahead (DA) and RT price data span January 1, 2020 to June 30, 2021. Figure 3(a) shows a plot of the load data in megawatt-hours (MWh), and Figure 3(b) shows the RT price diagram.

The second dataset is from the Australian Energy Market Operator (AEMO) [40] and consists of aggregated half-hourly demand and price data. Australia's AEMO manages electricity and gas networks in five states. We use the load data from New South Wales (NSW) for our predictive analysis. The data range from January 2020 to August 2021. Figure 4(a) shows the load plot for the NSW half-hourly data, and Figure 4(b) compares the time-series relationship between price and load for the first ten days, from January 1 to January 10, 2021. Both the NSW and PJM interconnection load diagrams depict relatively homogeneous and periodic characteristics, while the price graphs show nonhomogeneous and rarely periodic characteristics.

Table 2 shows the summary statistics of the three datasets. The skewness values of all variables are greater than 0, indicating a right-skewed distribution.

5. Analysis Results

Using our proposed method in Figure 2, the optimal numbers of IMFs for the PJM load under the 2-, 10-, and 30-day forecast periods are 3, 4, and 4, respectively (see Table 3). For the 2-day forecast of the PJM load, the RMSEs for 1, 2, 3, and 4 IMFs are 323.44, 290.51, 254.01, and 267.87, respectively, showing that 3 IMFs gives the smallest RMSE for the PJM load. Therefore,
the optimal number of IMFs for the PJM load under 2-day prediction using the LSTM model is 3. The optimal numbers of IMFs for the PJM price under the 2-, 10-, and 30-day forecast periods are 5, 3, and 3, respectively.

For a fair comparison, all models are hyperparameter-tuned to achieve their best predicted values. The hyperparameters of the proposed model and the other models used for comparison were tuned using a Bayesian optimization (BO) algorithm with a random forest regressor as a surrogate model. Table 4 describes the optimal hyperparameters for the two-day predictive analysis; its N + 1 columns mean that the number of IMFs is N, plus 1 for the residual.

For the predictive analysis, the data is first normalized by Equation (14) and then split into training and test data. A multistep prediction approach with a sliding window is used for the prediction. For example, for a 10-day forecast we take the PJM price from Jan. 1, 2020 to Jun. 20, 2021 as training data and Jun. 21, 2021 to Jun. 30, 2021 as test data; and the NSW load from Jan. 1, 2020 at 0:30 am to Aug. 20, 2021 at 7:00 am as training data and Aug. 20, 2021 at 7:30 am to Aug. 30, 2021 at 23:30 as test data. The same segmentation procedure was applied for the other prediction periods, with the length of the prediction period as the test set and the rest as the training set.

All DL models use 10% of the training set as a validation set to evaluate the model's fitness while tuning its hyperparameters based on the training data set. The $100/MWh threshold is generally considered to mark an extreme price event in Australia, so a spike is defined as P_t ≥ $100/MWh [23].

The performance of our proposed model and the benchmark models is evaluated using the RMSE, mean absolute percentage error (MAPE), R², mean absolute error (MAE), and mean square error (MSE), where MAPE, R², MAE, and MSE are defined as

MAPE = (1/N) Σ_{t=1}^{N} |(y_t − f_t)/y_t| × 100%,

R² = 1 − Σ_{t=1}^{N} (y_t − f_t)² / Σ_{t=1}^{N} (y_t − ȳ)²,  (17)

MAE = (1/N) Σ_{t=1}^{N} |y_t − f_t|,

MSE = (1/N) Σ_{t=1}^{N} (y_t − f_t)²,

where N is the total number of predicted values, y_t represents the actual electricity price or load, f_t is the predicted value, ȳ is the mean of all actual values, and R² is the
[Figure 5: NSW half-hourly load (MWh) forecasts, panels (a)-(c): actual values versus the ARIMA, GBRT, SVR, ELM, BP, LSTM, BiLSTM, BiLSTM-AM, EMD-BiLSTM-AM, and EEMD-BiLSTM-AM predictions; x-axis: time (half-hourly). Plot data not recoverable.]
Table 7: Peak points and time difference analysis of 2-day prediction for NSW load.

Models          | Actual peak load | Predicted peak | Load difference | Actual peak time | Predicted peak time | Time difference
SVR             | 10,151.67 | 10,529.31 | -377.64 | 8/29/21 19:00 | 8/29/21 19:30 | 30 min
ARIMA           | 10,151.67 | 10,447.81 | -296.14 | 8/29/21 19:00 | 8/29/21 20:00 | 60 min
GBRT            | 10,151.67 | 10,190.85 | -39.18  | 8/29/21 19:00 | 8/29/21 19:00 | —
ELM             | 10,151.67 | 10,448.04 | -296.37 | 8/29/21 19:00 | 8/29/21 19:30 | 30 min
BP              | 10,151.67 | 10,501.46 | -349.79 | 8/29/21 19:00 | 8/29/21 19:00 | —
LSTM            | 10,151.67 | 10,393.57 | -241.90 | 8/29/21 19:00 | 8/29/21 19:00 | —
BiLSTM          | 10,151.67 | 10,370.99 | -219.32 | 8/29/21 19:00 | 8/29/21 19:00 | —
BiLSTM-AM       | 10,151.67 | 10,341.87 | -190.20 | 8/29/21 19:00 | 8/29/21 19:00 | —
EMD-BiLSTM-AM   | 10,151.67 | 10,071.37 | 80.30   | 8/29/21 19:00 | 8/29/21 19:00 | —
EEMD-BiLSTM-AM  | 10,151.67 | 10,149.69 | 1.98    | 8/29/21 19:00 | 8/29/21 19:00 | —
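The quantities in Table 7 reduce to locating each series' maximum and differencing the amplitudes and timestamps. A minimal plain-Python sketch follows; the function name and the toy series are illustrative, not the paper's code or data (only the two peak values are taken from the table).

```python
def peak_analysis(actual, predicted, interval_minutes=30):
    """Compare peak amplitude and peak time between two equally sampled series."""
    ai = max(range(len(actual)), key=lambda i: actual[i])        # index of actual peak
    pi = max(range(len(predicted)), key=lambda i: predicted[i])  # index of predicted peak
    load_diff = actual[ai] - predicted[pi]       # positive -> the model underestimates the peak
    time_diff = abs(pi - ai) * interval_minutes  # minutes between the two peak times
    return load_diff, time_diff

# Toy half-hourly series whose peaks match the EEMD-BiLSTM-AM row of Table 7:
actual = [9800.0, 10151.67, 9900.0]
predicted = [9700.0, 10149.69, 9850.0]
diff, tdiff = peak_analysis(actual, predicted)
```

With both peaks at the same index, the time difference is zero and the load difference is the 1.98 MWh underestimate reported for the proposed model.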
coefficient of determination. Different evaluation measures tend to produce different assessments of predictive performance. Therefore, we also apply the model confidence set (MCS) test [41] and the Kolmogorov-Smirnov predictive accuracy (KSPA) test [42] to compare the predictive ability of each model.

To validate the proposed model and EMD-BiLSTM-AM, the predicted subsequences are summed using an ensemble summation strategy, and all predicted values are renormalized to the original data scale. The spike threshold for the PJM price was set to $100 per MWh, the NSW load peak was set to 11,000 MWh, and the PJM load peak was set to 39,000 MWh, as shown in Figures 3 and 4.

We compared the proposed model with nine other models in terms of prediction accuracy and ability to capture peaks. The comparison models include four machine learning models (support vector regression (SVR), extreme learning machine (ELM), gradient boosting regression tree (GBRT), and back-propagation neural network (BPNN)), three deep learning models (LSTM, BiLSTM, and BiLSTM-AM), one classical autoregressive integrated moving average (ARIMA) model, and a hybrid model (EMD-BiLSTM-AM). The training lengths for the three forecast periods (2, 10, and 30 days) for the PJM prices are 13,074 hours, 12,882 hours, and 12,402 hours, respectively.
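The MAPE, R², MAE, and MSE metrics defined in Eq. (17) can be written down directly. A minimal plain-Python helper is sketched below (an illustrative implementation, not the paper's code):

```python
def forecast_metrics(y, f):
    """MAPE (%), R^2, MAE, and MSE for actual values y and forecasts f (Eq. (17))."""
    n = len(y)
    mape = 100.0 / n * sum(abs((yt - ft) / yt) for yt, ft in zip(y, f))
    mae = sum(abs(yt - ft) for yt, ft in zip(y, f)) / n
    mse = sum((yt - ft) ** 2 for yt, ft in zip(y, f)) / n
    y_mean = sum(y) / n
    ss_tot = sum((yt - y_mean) ** 2 for yt in y)
    r2 = 1.0 - n * mse / ss_tot   # residual sum of squares equals n * MSE
    return mape, r2, mae, mse

# Toy example: forecasts off by -10, +10, and 0 units
mape, r2, mae, mse = forecast_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

Note that MAPE is undefined when any actual value y_t is zero, which is why spike analyses on prices typically work with thresholded, strictly positive series.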
[Figure 6: 10-day PJM price ($/MWh) forecasts with the $100/MWh spike threshold marked; panels (a) and (b) show actual values versus the ARIMA, GBRT, SVR, ELM, BP, LSTM, BiLSTM, BiLSTM-AM, EMD-BiLSTM-AM, and EEMD-BiLSTM-AM predictions. Plot data not recoverable.]
Table 8: The results of different models for PJM price under different forecast periods.
Python is used to implement all analyses of the model, with TensorFlow installed in the Spyder 3.7.3 environment. Bayesian optimization (BO) based on random forest regression is used to tune the hyperparameters of all models except ELM, which uses the default parameters of the scikit-learn neural network package, and ARIMA, which is fitted automatically. The proposed model and the other DL models used for comparison apply the ReLU activation function, the mean square error (MSE) loss function, the Adam optimizer, a lookback of 6, 100 epochs, and a batch size of 4 for the load datasets and 3 for the price dataset. All experiments are performed on a desktop with an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz, an NVIDIA GeForce GTX 1660 Ti GPU, Windows x64, and 16.00 GiB of memory.
Tables 5 and 6 show the prediction accuracy of the eight models under different forecast periods in terms of MAPE, RMSE, and R2 for the PJM and NSW loads. The proposed model has a MAPE below 1% in all three forecast periods and the lowest RMSE for every forecast period. On the R2 metric, the proposed model achieves 99.6% to 99.8% accuracy across the three forecast periods. The results show that the proposed model achieves the best results on the benchmarks for all prediction periods, followed by the other deep learning models. All models can capture peaks in the load data, while some cannot capture price spikes. Regardless of the time interval, EEMD-BiLSTM-AM accurately predicts the actual electricity load while better capturing peaks. Therefore, the proposed model can predict the electrical load at different time intervals.
On the MAPE metric for the NSW load data, applying the EEMD method to BiLSTM-AM improves the prediction results by 53%, 54%, and 60% for the 2-day, 10-day, and 30-day forecasts, respectively, compared to using BiLSTM. Figure 5 shows the 2-day test-set load forecast for NSW, with good accuracy in spike and peak capture for the proposed model. The machine learning models and the classical statistical ARIMA model perform poorly in valley-peak prediction compared to the hybrid models, and SVR and ELM also miss the peak-time predictions. This suggests that hybrid deep learning models combined with decomposition methods can help improve predictive performance, especially on high-frequency data, as shown in Figure 5 for the 30-minute-interval NSW data.
Table 7 shows each model's peak time and amplitude analysis for Figure 5(b). The table shows that the proposed model has the amplitude closest to the actual peak and captures the exact time of the peak. All other models except SVR, ARIMA, and ELM captured the exact time index of the peak, but they failed to capture the amplitude. The proposed model underestimates the load peak by only 1.98 MW, which demonstrates its power in capturing peaks and spikes.
As shown in Figure 6, the spike threshold level is set at $100/MWh, and the 10-day forecast from June 21-30, 2021 shows a one-time spike on June 29, 2021. ELM, EMD-BiLSTM-AM, SVR, and the proposed model can capture the spike; LSTM also captures it, but not its amplitude, while the remaining models fail to capture it. Compared with the other models in Table 8, the proposed model outperforms the comparative models in all forecast periods and better captures the peak time and amplitude, as shown in Figure 6. Although the EMD-BiLSTM-AM model captures the same peak level as the proposed model, it still has lower prediction accuracy.
On average, the MAPE and R2 of the proposed model over the three test periods are 0.097 and 0.920, respectively. EEMD-BiLSTM-AM outperforms the other models on all performance metrics used. Table 8 shows that EEMD-BiLSTM-AM improves the MAPE of the prediction results by 50%, 17%, and 37% compared to EMD with BiLSTM-AM for the 2-day, 10-day, and 30-day predictions, respectively.
From the above analysis, we conclude that the proposed method can predict data with different fluctuation frequencies, time intervals, and ranges, i.e., both load and price data. Figures 5 and 6 compare the prediction accuracy of the nine benchmark models and our proposed model on different data types and time horizons; EEMD-BiLSTM-AM shows higher forecasting performance and a smaller peak-to-valley amplitude error.
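The MAPE, RMSE, and R2 values reported in Tables 5-8 follow the standard definitions of these metrics, which can be sketched as below. This is a generic NumPy illustration with made-up example values, not the paper's code.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative load values ($/MWh or MW, depending on the series)
actual = [100.0, 110.0, 90.0, 105.0]
predicted = [101.0, 108.0, 91.0, 104.0]
```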
[Figure 7: Absolute error distribution plots (frequency histograms per model) for the 2-day NSW forecast data.]
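Panels like those in Figure 7 are frequency histograms of each model's absolute forecast errors. A minimal sketch of how one such panel is computed follows; the error values and bin settings are illustrative assumptions.

```python
import numpy as np

# Illustrative actual and predicted load values for one model
actual = np.array([100.0, 110.0, 90.0, 105.0, 95.0, 102.0])
predicted = np.array([101.0, 107.0, 91.0, 104.0, 99.0, 102.0])

abs_errors = np.abs(actual - predicted)  # per-step absolute error
# Bin the errors into a frequency histogram (4 bins over [0, 4])
counts, bin_edges = np.histogram(abs_errors, bins=4, range=(0.0, 4.0))
```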
Table 9 provides the computation times in Python, including hyperparameter tuning, model training, and prediction. Hybrid deep learning models take longer to compute than deep learning and machine learning models. However, the hybrid deep learning models outperform the other models in terms of prediction performance and valley-peak capture.
The MCS test is used to select the best model with a given level of confidence [42]. The KSPA test is a nonparametric test based on the Kolmogorov-Smirnov (KS) test
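The two-sample KS statistic underlying the KSPA test is the maximum vertical distance between the empirical CDFs of two models' forecast errors. The following is a generic NumPy sketch of that statistic, with illustrative error values, not the paper's implementation.

```python
import numpy as np

def ks_statistic(errors_a, errors_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical CDFs of the error samples."""
    a, b = np.sort(errors_a), np.sort(errors_b)
    # Evaluate both ECDFs at every observed error value
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Illustrative case: model A's errors are uniformly smaller than model B's,
# so the ECDFs separate completely and the statistic reaches 1.0
err_a = np.array([0.5, 1.0, 1.5, 2.0])
err_b = np.array([2.5, 3.0, 3.5, 4.0])
```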
[Figure 8: Empirical cumulative distribution functions Fn(x) of the forecast errors.]
Table 10: P-values of MCS test using different models for NSW load and PJM price.
[43]. This test was designed to determine whether the forecast error distributions are statistically significantly different and to distinguish between more predictive and less predictive models.
We provide statistical test results for the 2-day NSW load and 30-day PJM price forecast data. The KSPA error distributions and empirical cumulative distribution functions are shown in Figures 7 and 8, respectively. The two-sided p value of the KSPA test for all compared models was less than 0.01 for the 2-day NSW load data. This confirms the statistical difference between the proposed and competing models and shows that the proposed model outperforms the others. One-sided KSPA tests are also performed to show the low errors of the proposed model and its maximum prediction accuracy. The calculated statistical errors of the predictions show that the proposed model outperforms the comparison models. The random error deviation is best explained by the proposed EEMD-BiLSTM-AM model, which shows the smallest error and the best prediction performance; our proposed model captures random bias with minimum error. The p value results of the MCS test are shown in Table 10. We set a threshold of 0.25 for the p value to indicate that a model survived under the loss function, which is underlined; a larger p value means a more accurate model. According to the above analysis, the EEMD-BiLSTM-AM model has the best prediction ability.
6. Conclusions
Energy market participants have always been nervous about the economic consequences of soaring electricity prices. Many forecasting models in the energy market focus on predicting normal price and load movements without dealing with extreme load or price situations. In this paper, we combine EEMD and BiLSTM-AM to predict electricity loads and prices while capturing spikes and peak points. The EEMD algorithm decomposes the data to reduce the effect of noise. We then apply the BiLSTM-AM model to obtain the occurrence and normal fluctuation values of price and load peaks from the decomposed data. Using three datasets with different time horizons and different volatility patterns, we examine the predictive performance, efficiency, and consistency of the proposed model for short- to medium-term forecasting. The proposed method is compared with nine benchmark methods, including four machine learning models (SVR, ELM, GBRT, and BPNN), three deep learning models (LSTM, BiLSTM, and BiLSTM-AM), a classical ARIMA model, and a hybrid method (EMD-BiLSTM-AM). The results show that the proposed method outperforms the other models in terms of prediction accuracy and spike-capture ability, with EEMD reducing the mean absolute percentage error (MAPE) by 53%, 54%, and 60%, respectively. Over the three forecast periods, the average MAPE and R2 are 0.097 and 0.92, respectively. Furthermore, the analytical results of the MCS and KSPA tests show that the EEMD-BiLSTM-AM model has the best predictive ability.
The analysis results show that the price data exhibit large extreme volatility and uncertainty in peak behavior. Their heterogeneity and multiple seasonalities make them more challenging to model than loads.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Disclosure
The manuscript has not been previously published and is not currently submitted for review to any other journal. The manuscript is approved by all authors and tacitly or explicitly by the responsible authorities where the work was carried out. Submission also implies that, if accepted, it will not be published elsewhere in the same form, in English or in any other language, without the written consent of the publisher.
Conflicts of Interest
The author(s) declare(s) that they have no conflicts of interest.
Acknowledgments
This work was supported by the Ministry of Science and Technology, Taiwan [Grant MOST-109-2221-E011-098-MY3].
References
[1] A. Radovanovic, T. Nesti, and B. Chen, "A holistic approach to forecasting wholesale energy market prices," IEEE Transactions on Power Apparatus and Systems, vol. 34, no. 6, pp. 4317–4328, 2019.
[2] M. Polson and V. Sokolov, "Deep learning for energy markets," Applied Stochastic Models in Business and Industry, vol. 36, no. 1, pp. 195–209, 2020.
[3] H. S. Hippert, C. E. Pedreira, and R. C. Souza, "Neural networks for short-term load forecasting: a review and evaluation," IEEE Transactions on Power Apparatus and Systems, vol. 16, no. 1, pp. 44–55, 2001.
[4] H. Wang, Z. Lei, X. Zhang, B. Zhou, and J. Peng, "A review of deep learning for renewable energy forecasting," Energy Conversion and Management, vol. 198, article 111799, 2019.
[5] D. Lu, D. Zhao, and Z. Li, "Short-term nodal load forecasting based on machine learning techniques," International Transactions on Electrical Energy Systems, vol. 31, no. 9, p. 31, 2021.
[6] P. W. Khan, Y.-C. Byun, S.-J. Lee, D.-H. Kang, J.-Y. Kang, and H.-S. Park, "Machine learning-based approach to predict energy consumption of renewable and nonrenewable power sources," Energies, vol. 13, no. 18, p. 4870, 2020.
[7] R. Bhinge, J. Park, K. H. Law, D. A. Dornfeld, M. Helu, and S. Rachuri, "Towards a generalized energy prediction model for machine tools," Journal of Manufacturing Science and Engineering, vol. 139, no. 4, p. 139, 2017.
[8] J. Walther, D. Spanier, N. Panten, and E. Abele, "Very short-term load forecasting on factory level - a machine learning approach," Procedia CIRP, vol. 80, pp. 705–710, 2019.
[9] X. Zhao, J. Wang, T. Zhang, D. Cui, G. Li, and M. Zhou, "A novel short-term load forecasting approach based on kernel extreme learning machine: a provincial case in China," IET Renewable Power Generation, vol. 16, no. 12, pp. 2658–2666, 2022.
[10] L. Tschora, E. Pierre, M. Plantevit, and C. Robardet, "Electricity price forecasting on the day-ahead market using machine learning," Applied Energy, vol. 313, article 118752, 2022.
[11] G. Sun, C. Jiang, X. Wang, and X. Yang, "Short-term building load forecast based on a data-mining feature selection and LSTM-RNN method," IEEJ Transactions on Electrical and Electronic Engineering, vol. 15, no. 7, pp. 1002–1010, 2020.
[12] J. Bedi and D. Toshniwal, "Energy load time-series forecast using decomposition and autoencoder integrated memory network," Applied Soft Computing, vol. 93, article 106390, 2020.
[13] R. J. Park, K. B. Song, and B. S. Kwon, "Short-term load forecasting algorithm using a similar day selection method based on reinforcement learning," Energies, vol. 13, no. 10, p. 2640, 2020.
[14] M. Mohandes, "Support vector machines for short-term electrical load forecasting," International Journal of Energy Research, vol. 26, no. 4, pp. 335–345, 2002.
[15] L. Xiao, W. Shao, C. Wang, K. Zhang, and H. Lu, "Research and application of a hybrid model based on multi-objective optimization for electrical load forecasting," Applied Energy, vol. 180, pp. 213–233, 2016.
[16] H. Li, H. Liu, H. Ji, S. Zhang, and P. Li, "Ultra-short-term load demand forecast model framework based on deep learning," Energies, vol. 13, no. 18, p. 4900, 2020.
[17] J. Du, "Power load forecasting using BiLSTM-attention," Environmental Sciences, vol. 440, no. 3, article 032115, 2020.
[18] G. Hinton, L. Deng, D. Yu et al., "Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[19] Z. Meng, Y. Xie, and J. Sun, "Short-term load forecasting using neural attention model based on EMD," Electrical Engineering, vol. 104, pp. 1857–1866, 2022.
[20] I. Daubechies, J. Lu, and H. T. Wu, "Synchrosqueezed wavelet transforms: an empirical mode decomposition-like tool," Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 243–261, 2011.
[21] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, 1998.
[22] D. Singhal and K. S. Swarup, "Electricity price forecasting using artificial neural networks," International Journal of Electrical Power & Energy Systems, vol. 33, no. 3, pp. 550–555, 2011.
[23] A. E. Clements, R. Herrera, and A. S. Hurn, "Modelling interregional links in electricity price spikes," Energy Economics, vol. 51, pp. 383–393, 2015.
[24] T. M. Christensen, A. S. Hurn, and K. A. Lindsay, "Forecasting spikes in electricity prices," International Journal of Forecasting, vol. 28, no. 2, pp. 400–411, 2012.
[25] A. Badri, Z. Ameli, and A. M. Birjandi, "Application of artificial neural networks and fuzzy logic methods for short term load forecasting," Energy Procedia, vol. 14, pp. 1883–1888, 2012.
[26] S. J. Kiartzis, A. G. Bakirtzis, J. B. Theocharis, and G. Tsagas, "A fuzzy expert system for peak load forecasting: application to the Greek power system," in 2000 10th Mediterranean Electrotechnical Conference (MeleCon 2000, Cat. No.00CH37099), pp. 1097–1100, Lemesos, Cyprus, 2000.
[27] J. P. S. Catalão, S. J. P. S. Mariano, V. M. F. Mendes, and L. A. F. M. Ferreira, "Short-term electricity prices forecasting in a competitive market: a neural network approach," Electric Power Systems Research, vol. 77, no. 10, pp. 1297–1304, 2007.
[28] Y. Y. Hong and C. Y. Hsiao, "Locational marginal price forecasting in deregulated electric markets using a recurrent neural network," in 2001 IEEE Power Engineering Society Winter Meeting, Conference Proceedings (Cat. No.01CH37194), pp. 539–544, Columbus, OH, USA, 2001.
[29] T. Wang, M. Zhang, Q. Yu, and H. Zhang, "Comparing the applications of EMD and EEMD on time-frequency analysis of seismic signal," Journal of Applied Geophysics, vol. 83, pp. 29–34, 2012.
[30] Y. X. Wu, Q. B. Wu, and J. Q. Zhu, "Improved EEMD-based crude oil price forecasting using LSTM networks," Physica A: Statistical Mechanics and its Applications, vol. 516, pp. 114–124, 2019.
[31] H. Zheng, J. Yuan, and L. Chen, "Short-term load forecasting using EMD-LSTM neural networks with a xgboost algorithm