
TEASER: Early and Accurate Time Series Classification

Patrick Schäfer, Ulf Leser
Humboldt University of Berlin, Germany
arXiv:1908.03405v2 [[Link]] 16 Aug 2019

ABSTRACT

Early time series classification (eTSC) is the problem of classifying a time series after as few measurements as possible with the highest possible accuracy. The most critical issue of any eTSC method is to decide when enough data of a time series has been seen to take a decision: Waiting for more data points usually makes the classification problem easier but delays the time at which a classification is made; in contrast, earlier classification has to cope with less input data, often leading to inferior accuracy. The state-of-the-art eTSC methods compute a fixed optimal decision time, assuming that every time series has the same defined start time (like turning on a machine). However, in many real-life applications measurements start at arbitrary times (like measuring the heartbeats of a patient), implying that the best time for taking a decision varies heavily between time series. We present TEASER, a novel algorithm that models eTSC as a two-tier classification problem: In the first tier, a classifier periodically assesses the incoming time series to compute class probabilities. However, these class probabilities are only used as output label if a second-tier classifier decides that the predicted label is reliable enough, which can happen after a different number of measurements. In an evaluation using 45 benchmark datasets, TEASER is two to three times earlier at predictions than its competitors while reaching the same or an even higher classification accuracy. We further show TEASER's superior performance using real-life use cases, namely energy monitoring and gait detection.

Figure 1: Traces of microwaves taken from [11]. The operational state of the microwave starts between 5% and 50% of the whole trace length. To have at least one event (typically a high burst of energy consumption) for each microwave, the threshold has to be set to that of the latest seen operational state (after seeing more than 46.2%).
usually leading to lower accuracy. In contrast, the higher the desired
KEYWORDS classification accuracy, the more data points have to be inspected
Time series; early classification; accurate; framework and the later eTSC will be able to take a decision. Thus, a critical
issue in any eTSC method is the determination of the point in time at
which an incoming TS can be classified. State-of-the-art methods in
1 INTRODUCTION eTSC [19, 34, 35] assume that all time series being classified have a
A time series (TS) is a collection of values sequentially ordered in defined start time. Consequently, these methods assume that charac-
time. One strong force behind their rising importance is the increas- teristic patterns appear roughly at the same offset in all TS, and try
ing use of sensors for automatic and high resolution monitoring in to learn the fixed fraction of the TS that is needed to make high accu-
domains like smart homes [15], starlight observations [24], machine racy predictions, i.e., when the accuracy of classification most likely
surveillance [20], or smart grids [14, 32]. Time series classification is close to the accuracy on the full TS. However, in many real life
(TSC) is the problem of assigning one of a set of predefined classes to a time
series, like recognizing the electronic device producing a certain tem- start their observations at arbitrary points in time of an essentially
poral pattern of energy consumption [9, 11] or classifying a signal indefinite time series. Intuitively, existing methods expect to see a
of earth motions as either an earthquake or a bypassing lorry [23]. TS from the point in time when the observed system starts working,
Conventional TSC works on time series of a given, fixed length and while in many applications, this system has already been working
assumes access to the entire input at classification time. In contrast, for an unknown period of time when measurements start. In such
early time series classification (eTSC), which we study in this work, settings, it is suboptimal to wait for a fixed number of measurements;
tries to solve the TSC problem after seeing as few measurements as instead, the algorithm should wait for the characteristic patterns,
possible [34]. This need arises when the classification decision is which may occur early (in which case an early classification is pos-
time-critical, for instance to prevent damage (the earlier a warning sible) or later (in which case the eTSC algorithm has to wait longer).
system can predict an earthquake from seismic data [23], the more As an example, Figure 1 illustrates traces for the operational state of
time there is for preparation), to speed-up diagnosis (the earlier an microwaves [11]. Observations started while the microwaves were
abnormal heart-beat is detected, the more time there is for prevention already under power; the concrete operational state, characterized
of fatal attacks [13]), or to protect markets and systems (the earlier by high bursts of energy consumption, happened after 5% to 50%
of the whole trace (amounting to 1 hour). Current eTSC methods trained on this data would always wait for 30 minutes, because this was the last time they had seen the important event in the training data. But actually most TS could be classified safely much earlier; instead of assuming a fixed classification time, an algorithm should adapt its decision time to the individual time series.

In this paper we present TEASER, a Two-tier Early and Accurate Series classifiER, that is robust regarding the start time of a TS's recording. It models eTSC as a two-tier classification problem (see Figure 3). In the first tier, a slave classifier periodically assesses the input series and computes class probabilities. In the second tier, a master classifier takes the series of class probabilities of the slave as input and computes a binary decision on whether to report these as final result or continue with the observation. As such, TEASER does not presume a fixed starting time of the recordings, nor does it rely on a fixed decision time for predictions, but takes its decisions whenever it is confident of its prediction. On a popular benchmark of 45 datasets [36], TEASER is two to three times as early while reaching a competitive, and for some datasets even higher, level of accuracy when compared to the state of the art [18, 19, 22, 34, 35]. Overall, TEASER achieves the highest average accuracy, lowest average rank, and highest number of wins among all competitors. We furthermore evaluate TEASER's performance on the basis of real use-cases, namely device classification of energy load monitoring traces, and classification of walking motions into normal and abnormal.

The rest of the paper is organized as follows: In Section 2 we formally describe the background of eTSC. Section 3 introduces TEASER and its building blocks. Section 4 presents evaluation results including benchmark data and real use cases. Section 5 discusses related work and Section 6 presents the conclusion.

2 BACKGROUND: TIME SERIES AND ETSC

In this section, we formally introduce time series (TS) and early time series classification (eTSC). We also describe the typical learning framework used in eTSC.

Table 1: Symbols and Notations.

  sc_i / mc_i   a slave / master classifier at the i'th snapshot
  S             the number of master/slave classifiers
  w             a user-defined interval length
  s_i           the snapshot length, with s_i = i · w
  n             the time series length
  N             the number of samples
  k             the number of classes
  p(c_i)        class probability by the slave classifier
  Y             all class labels, Y = {c_1, ..., c_k}
  c_i           the i'th class label in Y

Definition 2.1. A time series T is a sequence of n ∈ N real values, T = (t_1, ..., t_n), t_i ∈ R. The values are also called data points. A dataset D is a collection of time series.

We assume that all TS of a dataset have the same sampling frequency, i.e., every i'th data point was measured at the same temporal distance from the first point. In accordance with all previous approaches [19, 34, 35], we will measure earliness in the number of data points and from now on disregard the actual time of data points. A central assumption of eTSC is that TS data arrives incrementally. If a classifier is to classify a TS after s data points, it has access to these s data points only. This is called a snapshot.

Definition 2.2. A snapshot T(s) = (t_1, ..., t_s) of a time series T, s ≤ n, is the prefix of T available for classification after seeing s data points.

In principle, an eTSC system could try to classify a time series after every new data point that was measured. However, it is more practical and efficient to call the eTSC only after the arrival of a fixed number of new data points [19, 34, 35]. We call this number the interval length w. Typical values are 5, 10, 20, ...

eTSC is commonly approached as a supervised learning problem [18, 19, 22, 34, 35]. Thus, we assume the existence of a set D_train of training TS, where each one is assigned to one of a predefined set of class labels Y = {c_1, ..., c_k}. The eTSC system learns a model from D_train that can separate the different classes. Its performance is estimated by applying this model to all instances of a test set D_test.

The quality of an eTSC system can be measured by different indicators. The accuracy of an eTSC is calculated as the percentage of correct predictions of the test instances, where higher is better:

  accuracy = (number of correct predictions) / |D_test|

The earliness of an eTSC is defined as the mean fraction of data points s seen before a label is assigned, where lower is better:

  earliness = ( Σ_{T ∈ D_test} s / len(T) ) / |D_test|

We can now formally define the problem of eTSC.

Definition 2.3. Early time series classification (eTSC) is the problem of assigning all time series T ∈ D_test a label from Y as early and as accurately as possible.

eTSC thus has two optimization goals that are contradictory in nature, as later classification typically allows for more accurate predictions and vice versa. Accordingly, eTSC methods can be evaluated in different ways, such as comparing accuracies at a fixed-length snapshot (keeping earliness constant), comparing the earliness at which a fixed accuracy is reached (keeping accuracy constant), or by combining these two measures. A popular choice for the latter is the harmonic mean of earliness and accuracy:

  HM = (2 · (1 − earliness) · accuracy) / ((1 − earliness) + accuracy)

An HM of 1 is equal to an earliness of 0% and an accuracy of 100%. Figure 2 illustrates the problem of eTSC on a load monitoring task differentiating a digital receiver from a microwave. All traces have an underlying oscillating pattern.
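The three quality measures above can be computed in a few lines. The following is our illustration, not the authors' code; `predictions` is assumed to hold one `(is_correct, s, length)` triple per test series:

```python
# Evaluation metrics for eTSC (sketch): accuracy, earliness, and their
# harmonic mean HM as defined in Section 2.

def accuracy(predictions):
    # fraction of correctly labeled test series; higher is better
    return sum(1 for ok, _, _ in predictions if ok) / len(predictions)

def earliness(predictions):
    # mean fraction of each series seen before a label was assigned;
    # lower is better
    return sum(s / n for _, s, n in predictions) / len(predictions)

def harmonic_mean(acc, early):
    # HM = 1 only for earliness 0% and accuracy 100%
    return 2 * (1 - early) * acc / ((1 - early) + acc)
```

For example, a method that labels one series correctly after 20% of its length and one incorrectly after 50% has accuracy 0.5, earliness 0.35, and HM ≈ 0.565.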
In total there are three important patterns (a), (b), and (c), which differ among the appliances. The characteristic part of a receiver trace is an energy burst with two plateaus (a), which can appear at different offsets. If an eTSC classifies a trace too early (Figure 2, second from bottom), the signal is easily confused with that of microwaves based on the similarity to the (c) pattern. However, if an eTSC always waits until the offset at which all training traces of microwaves can be correctly classified, the first receiver trace will be classified much later than possible (eventually after seeing the full trace). To achieve optimal earliness at high accuracy, an eTSC system must determine its decision times individually for each TS it analyses.

Figure 2: eTSC on a trace of a digital receiver. The figure shows one trace of a digital receiver, and traces of microwaves on the top. These have three characteristic patterns (a) to (c). In the bottom part, eTSC is performed on a snapshot of the time series of a digital receiver. In its first snapshot it is easily confused with pattern (c) of a microwave. However, the trace later contains pattern (a), which is characteristic for a receiver.

3 EARLY & ACCURATE TS CLASSIFICATION: TEASER

TEASER addresses the problem of finding optimal and individual decision times by following a two-tier approach. Intuitively, it trains a pair of classifiers for each snapshot s: A slave classifier computes class probabilities which are passed on to a master classifier that decides whether these probabilities are high enough that a safe classification can be emitted. TEASER monitors these predictions and predicts a class c if it occurred v times in a row; the minimum length v of a series of predictions is an important parameter of TEASER. Intuitively, the slave classifiers give their best prediction based on the data they have seen, whereas the master classifiers decide if these results can be trusted, and the final filter suppresses spurious predictions.

Formally, let w be the user-defined interval length and let n_max be the length of the longest time series in the training set D_train. We then extract snapshots T(s_i) = T[1..i · w], i.e., time series snapshots of lengths s_i = i · w. A TEASER model consists of a set of S = ⌈n_max/w⌉ pairs of slave/master classifiers, trained on the snapshots of the TS in D_train (see below for details). When confronted with a new time series, TEASER waits for the next w data points to arrive and then calls the appropriate slave classifier, which outputs probabilities for all classes. Next, TEASER passes these probabilities to the slave's paired master classifier, which either returns a class label or NIL, meaning that no decision could be derived. If the answer is a class label c and this answer was also given for the last v − 1 snapshots, TEASER returns c as result; otherwise, it keeps waiting.

Before going into the details of TEASER's components, consider the example shown in Figure 3. The first slave classifier sc_1 falsely labels this trace of a digital receiver as a microwave (by computing a higher probability for the latter class than for the former) after seeing the first w data points. However, the master classifier mc_1 decides that this prediction is unsafe and TEASER continues to wait. After i − 1 further intervals, the i'th pair of slave and master classifiers sc_i and mc_i is called. Because the TS contained characteristic patterns in the i'th interval, the slave now computes a high probability for the digital receiver class, and the master decides that this prediction is safe. TEASER counts the number of consecutive predictions for this class and, if a threshold is passed, outputs the predicted class.

Clearly, the interval length w and the threshold v are two important yet opposing parameters of TEASER. A smaller w results in more frequent predictions, due to smaller prediction intervals. However, a classification decision usually only changes after seeing a sufficient number of novel data points; thus, a too small value for w leads to series of very similar class probabilities at the slave classifiers, which may trick the master classifier. This can be compensated by increasing v. In contrast, a large value for w leads to fewer predictions, where each one has seen more new data and thus is probably more reliable. For such settings, v may be reduced without harming earliness or accuracy. We analyze the influence of w on accuracy and earliness in Section 4.3. In all experiments v is treated as a hyper-parameter that is learned by performing a grid-search and maximizing HM on the training dataset.

3.1 Slave Classifier

Each slave classifier sc_i, with i ≤ S, is a full-fledged time series classifier of its own, trained to predict classes after seeing a fixed snapshot length. Given a snapshot T(s_i) of length s_i = i · w, the slave classifier sc_i computes class probabilities P(s_i) = (p(c_1(s_i)), ..., p(c_k(s_i))) for this time series for each of the predefined classes and determines the class c(s_i) with highest probability.
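TEASER's overall prediction procedure can be sketched as follows. This is our illustration, not the authors' implementation; `slaves[i].predict_proba` and `masters[i].accept` are hypothetical interfaces, where the slave returns a dict of class probabilities and the master returns a class label or `None` (NIL):

```python
# Sketch of TEASER's prediction loop: call slave/master pairs on
# growing snapshots and output a label once it has been consistently
# accepted v times in a row.

def teaser_predict(series, slaves, masters, w, v):
    consecutive, last_label = 0, None
    for i in range(1, len(slaves) + 1):
        snapshot = series[: i * w]                 # first i*w data points
        if len(snapshot) < i * w:                  # stream exhausted
            break
        probs = slaves[i - 1].predict_proba(snapshot)
        label = masters[i - 1].accept(probs)       # class label or None
        if label is not None and label == last_label:
            consecutive += 1                       # same safe label again
        else:
            consecutive = 1 if label is not None else 0
        last_label = label
        if consecutive >= v:                       # v consistent predictions
            return label, i * w
    return None, len(series)                       # no safe decision taken
```

The second return value reports how many data points were seen, which is exactly what the earliness measure of Section 2 aggregates.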
Furthermore, the slave computes the difference Δd_i between the highest

  m_i^1 = arg max_{j ∈ [1..k]} p(c_j(s_i))

and second highest

  m_i^2 = arg max_{j ∈ [1..k], j ≠ m_i^1} p(c_j(s_i))

class probabilities:

  Δd_i = p(c_{m_i^1}) − p(c_{m_i^2})

In TEASER, the most probable class label c(s_i), the vector of class probabilities P(s_i), and the difference Δd(s_i) are passed as features to the paired i'th master classifier mc_i, which then has to decide whether the prediction is reliable or not (see Figure 4).

Figure 3: TEASER is given a snapshot of an energy consumption time series. After seeing the first s measurements, the first slave classifier sc_1 performs a prediction which the master classifier mc_1 rejects due to low class probabilities. After observing the i'th interval, which includes a characteristic energy burst, the slave classifier sc_i (correctly) predicts RECEIVER, and the master classifier mc_i eventually accepts this prediction. When the prediction of RECEIVER has been consistently derived v times, it is output as final prediction.

Figure 4: TEASER trains pairs of slave and master classifiers. The i'th slave classifier is trained on the time series truncated after time stamp s_i. The master classifier is trained on the class probabilities and delta of the correctly predicted time series.

Figure 5: The master computes a hyper-sphere around the correctly predicted samples. A novel sample is accepted or rejected depending on whether its probabilities fall inside or outside the orange hypersphere.

3.2 Master Classifier

A master classifier mc_i, with i ≤ S, in TEASER learns whether the results of its paired slave classifier should be trusted or not. We model this task as a classification problem in its own right, where the i'th master classifier uses the results of the i'th slave classifier as features for learning its model (see Section 3.3 for the details on training). However, training this classifier is tricky. To learn accurate decisions, it needs to be trained with a sufficient number of correct and false predictions. However, the more accurate a slave classifier is, the fewer mis-classifications are produced, and the worse the expected performance of the paired master classifier gets. Figure 6 illustrates this problem.
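The master's feature vector can be derived from the slave's probability vector as follows. This is a sketch under our naming (`probs` maps class labels to probabilities), not the authors' code:

```python
# Compute the master's features from Section 3.1/3.2: the predicted
# label c(s_i), the probability vector P(s_i), and the difference
# delta = p(c_m1) - p(c_m2) between the two highest probabilities.

def master_features(probs):
    # rank classes by probability, best first
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (best_label, p1), (_, p2) = ranked[0], ranked[1]
    delta = p1 - p2          # large delta suggests a confident slave
    return best_label, list(probs.values()), delta
```

A large `delta` means the slave clearly prefers one class, which is precisely the kind of signal the master uses to judge reliability.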
A typical slave's train accuracy rises quickly with an increasing number of data points. In this example, the slave classifiers start with an accuracy of around 70% and reach 100% quickly on the train data. Once the train accuracy reaches 100%, there are no negative samples left for the master classifier to train its decision boundary.

Figure 6: The accuracy of the slave classifier reaches 100% after seeing 13 time stamps on the train data, resulting in one-class classification.

To overcome this issue, we use a so-called one-class classifier as master classifier [16]. One-class classification refers to the problem of classifying positive samples in the absence of negative samples. It is closely related to, but not identical to, outlier/anomaly detection [4]. In TEASER, we use a one-class Support Vector Machine (oc-SVM) [31], which does not determine a separating hyperplane between positive and negative samples but instead computes a hypersphere around the positively labeled samples with minimal dilation that has maximal distance to the origin. At classification time, all samples that fall outside this hypersphere are considered negative samples; in our case, this implies that the master learns a model of positive samples and regards all results not fitting this model as negative. The major challenge is to determine a hypersphere that is neither too large, which leads to false positives and lower accuracy, nor too small, which leads to dismissals and delayed predictions.

In Figure 5 a trace is either labeled as microwave or receiver, and the master classifier learns that its paired slave is very precise at predicting receiver traces but produces many false predictions for microwave traces. Thus, only receiver predictions with class probability p('receiver') ≥ 0.6, and microwave predictions with p('microwave') ≥ 0.95, are accepted. As can be seen in this example, using a one-class SVM leads to very flexible decision boundaries.

3.3 Training Slave and Master Classifiers

Consider a labeled set of time series D_train with class labels Y = {c_1, ..., c_k} and an interval length w. As before, n_max is the length of the longest training instance. Then, the i'th pair of slave/master classifiers is trained as follows:

(1) First, we truncate the train dataset D_train to the prefix length determined by i (snapshot s_i): D_train(s_i) = {T(s_i) | T ∈ D_train}. In Figure 4 (bottom) the four TS are truncated.

(2) Next, these truncated snapshots are z-normalized. This is a critical step for training to remove any bias resulting from values that will only be available in the future. I.e., if a time series is first z-normalized, like all UCR time series, and then a truncated snapshot is generated, this snapshot may not make use of the absolute values resulting from the z-normalization of the whole series (as opposed to [18]).

(3) The hyper-parameters of the slave classifier are trained on D_train(s_i) using 10-fold cross-validation. Using the derived hyper-parameters, we build the final slave classifier sc_i producing its 3-tuple output (c(s_i), P(s_i), Δd(s_i)) for each T ∈ D_train (Figure 4, centre).

(4) To train the paired master classifier, we first remove all instances which were incorrectly classified by the slave. Assume that there were N' ≤ N correct predictions. We then train a one-class SVM on the N' training samples, where each sample is represented by the 3-tuple (c(s_i), P(s_i), Δd(s_i)) produced by the slave classifier.

(5) Finally, we perform a grid-search over values v ∈ {1...5} to find the threshold v which yields the highest harmonic mean HM of earliness and accuracy on D_train.

In accordance with prior works [18, 22, 34, 35], we consider the interval length w to be a user-specified parameter. However, we will also investigate the impact of varying w in Section 4.3.

Algorithm 1 Training phase of TEASER using S time stamps and a labeled train dataset.
  Input Training set: X_data
  Input Labels: Y_labels
  Input Time Stamps: {1, 2, ..., S}
  Returns: Slaves, Masters, v
  (1)  initialize array of slaves
  (2)  initialize array of masters
  (3)  for t in {1, 2, ..., S} do
         // z-normalized snapshots
  (4)    X_snapshot_normed = z-norm(truncateAfter(X_data, t));
  (5)    c_probs, c_labels = slaves[t].fit(X_snapshot_normed);
         // keep only the positive (correctly classified) samples
  (6)    c_pos_probs = filter_correct(c_labels, Y_labels, c_probs);
         // train a one-class classifier
  (7)    masters[t].fit(c_pos_probs);
         // maximize HM to find the best v
  (8)    v = grid_search(X_data, Y_labels, slaves[t], masters[t]);
  (9)  end for
  (10) return (slaves, masters, v);

The pseudo-code for training TEASER is given in Algorithm 1. The aim of the training is to obtain S pairs of slave/master classifiers and the threshold v for consecutive predictions. First, for all z-normalized snapshots (line 4), the slaves are trained and the predicted labels and class probabilities are kept (line 5). Prior to training the master, incorrectly classified instances are removed (line 6). The feature vectors of correctly labeled samples are passed on to train the master.
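Steps (2) and (5) can be illustrated with a short sketch (ours, not the authors' implementation). The key point of step (2) is that the snapshot is truncated first and z-normalized afterwards, so no information from future values leaks in; for step (5), `evaluate` is a hypothetical callable returning `(accuracy, earliness)` on the training data for a given v:

```python
import statistics

def znorm_snapshot(series, s):
    # Step (2): truncate first, then normalize the snapshot itself,
    # so the normalization uses no values beyond position s.
    snapshot = series[:s]
    mean = statistics.fmean(snapshot)
    std = statistics.pstdev(snapshot)
    if std == 0:                       # constant snapshot: map to zeros
        return [0.0] * len(snapshot)
    return [(x - mean) / std for x in snapshot]

def best_v(evaluate, candidates=range(1, 6)):
    # Step (5): grid-search v in {1..5}, maximizing the harmonic mean
    # HM of (1 - earliness) and accuracy on the training data.
    def hm(acc, early):
        return 2 * (1 - early) * acc / ((1 - early) + acc)
    return max(candidates, key=lambda v: hm(*evaluate(v)))
```

Larger v makes TEASER more conservative (later but safer decisions), which is why it is tuned jointly against both accuracy and earliness via HM.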
the master (one-class SVM) classifier (line 7). Finally, an optimal In our first experiment we tested the influence of different slave
value for v is determined using grid-search. and master classifiers. We compared the three different slave TS
classifiers: DTW, BOSS, WEASEL. As a master classifier we have
4 EXPERIMENTAL EVALUATION used one-class SVM (ocSVM), SVM with a RBF kernel (RBF-
SVM) and linear regression (Regression). We performed these
We first evaluate TEASER using the 45 datasets from the UCR
experiments using default hyper-parameters to ease comparisons.
archive that also have been used in prior works on eTSC [19, 22, 34,
We compare performances in terms of HM to the other competitors
35]. Each UCR dataset provides a train and test split set which we
ECTS, RelClass, EDSC and ECDIRE.
use unchanged. Note that most of these datasets were preprocessed to
We first fixed the master classifier to oc-SVM and compared
create approximately aligned patterns of equal length and scale [27].
all three different slave classifiers (DTW+ocSVM, BOSS+ocSVM,
Such an alignment is advantageous for methods that make use of a
WEASEL+ocSVM) in Figure 7. Out of these, TEASER us-
fixed decision time but also requires additional effort and introduces
ing WEASEL (WEASEL+ocSVM) has the best (lowest) rank.
new parameters that must be determined, steps that are not required
Next, we fixed the slave classifier to WEASEL and compared
with TEASER. We also evaluate on additional real-life datasets
the three different master classifiers (ocSVM, RBF-SVM, Regres-
where no such alignment was performed.
sion). Again, TEASER using ocSVM performed best. The most
We compared our approach to the state-of-the-art methods,
significant improvement over the state of the art was archived by
ECTS [34], RelClass [22], EDSC [35], and ECDIRE [19]. On
TEASER+WEASEL+ocSVM, which justifies our design decision
the UCR datasets, we use published numbers on accuracy and earli-
to model early classification as a one-class classification problem.
ness of these methods to compare to TEASER’s performance. As
Based on these results we use WEASEL [30] as a slave classi-
in these papers, we use w = nmax /20 as default interval length.
fier and ocSVM for all remaining experiments and refer to it as
For ECDIRE, ECTS, and RelCLASS, the respective authors also
TEASER. A nice aspect of WEASEL is that it is comparably fast,
released their code, which we use to compute their performance on
highly accurate, and works with variable length time series. As
our additional two use-cases. We were not able to obtain runnable
a hyper-parameter we learn the best word length between 4 and 6
code of EDSC, but note that EDSC was the least accurate eTSC on
for WEASEL on each dataset using 10-fold cross-validation on the
the UCR data. All experiments ran on a server running LINUX with
train data. ocSVM parameters for the remaining experiments were
2xIntel Xeon E5-2630v3 and 64GB RAM, using JAVA JDK x64 1.8.
determined as follows: nu-value was fixed to 0.05, i.e. 5% of the
TEASER is a two-tier model using a slave and a master classifier. As a first tier, TEASER requires a TSC which produces class probabilities as output. Thus, we performed our experiments using three different time series classifiers: WEASEL [30], BOSS [28], and 1-NN Dynamic Time Warping (DTW). As a second tier, we benchmarked three master classifiers: a one-class SVM using LIBSVM [2], linear regression using liblinear [6], and an SVM with an RBF kernel [2]. For the one-class SVM, samples may be dismissed; the kernel was fixed to RBF, and the optimal gamma value was obtained by grid-search within {1 . . . 100} on the train dataset.

For each experiment, we report the evaluation metrics accuracy, earliness, their harmonic mean HM, and Pareto optimality. The Pareto optimality criterion counts a method as better than a competitor whenever it obtains better results in at least one metric without being worse in any other metric. All performance metrics were computed using only results on the test split. To support reproducibility, we provide the TEASER source code and the raw measurement sheets [?].

4.1 Choice of Slave and Master classifiers

Figure 7: Average Harmonic Mean (HM) over earliness and accuracy for all 45 TS datasets (lower rank is better). The critical difference diagram ranks the TEASER variants WEASEL+ocSVM, WEASEL+RBF-SVM, WEASEL+Regression, BOSS+ocSVM, and DTW+ocSVM against ECDIRE, ECTS, RelClass, and EDSC.

4.2 Performance on the UCR Datasets

eTSC is about predicting accurately and early. Figure 8 shows two critical difference diagrams (as introduced in [5]) for earliness and accuracy over the average ranks of the different eTSC methods. The best classifiers are shown to the right of a diagram and have the lowest (best) average ranks. Groups of classifiers that are not significantly different in their rankings are connected by a bar. The critical difference (CD) length represents statistically significant differences using a Wilcoxon signed-rank test. With a rank of 1.44 (earliness) and 2.38 (accuracy), TEASER is significantly earlier than all other methods and overall is among the most accurate approaches.

On our webpage we published all raw measurements [?] for each of the 45 datasets. TEASER is the most accurate method on 22 datasets, followed by ECDIRE and RelClass, which are best on 12 and 10 datasets, respectively. TEASER also has the highest average accuracy of 75%, followed by RelClass (74%), ECDIRE (72.6%), and ECTS (71%). EDSC is clearly inferior to all other methods in terms of accuracy, with 62%. TEASER provides the earliest predictions in 32 cases, followed by ECDIRE with 7 cases and the remaining competitors with 2 cases each. On average, TEASER takes its decision after seeing 23% of the test time series, whereas the second and third earliest methods, i.e., EDSC and ECDIRE, have to wait for 49% and 50%, respectively. It is also noteworthy that the second most accurate method, RelClass, provides the overall latest predictions, at 71%. Note that all competitors have been designed for the highest possible accuracy, whereas TEASER was optimized for the harmonic mean of earliness and accuracy.
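The two-tier interplay of slave and master described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: `DummySlave`, `DummyMaster`, and the fixed confidence threshold are stand-ins, and TEASER's real masters are trained classifiers whose acceptance additionally considers the consistency of consecutive predictions.

```python
class DummySlave:
    """Stand-in slave classifier: returns fixed class probabilities."""
    def __init__(self, probs):
        self.probs = probs
    def predict_proba(self, prefix):
        return self.probs

class DummyMaster:
    """Stand-in master: accepts once the top probability clears a threshold."""
    def __init__(self, threshold=0.8):
        self.threshold = threshold
    def accept(self, probs):
        return max(probs) >= self.threshold

def classify_early(ts, slaves, masters, w, min_len):
    """Slide over prefixes of length min_len, min_len + w, ...; return the
    first slave prediction the master deems safe, together with the earliness
    and the number of master predictions that were needed."""
    n_master_calls = 0
    for i, end in enumerate(range(min_len, len(ts) + 1, w)):
        probs = slaves[min(i, len(slaves) - 1)].predict_proba(ts[:end])
        n_master_calls += 1
        if masters[min(i, len(masters) - 1)].accept(probs):
            label = max(range(len(probs)), key=probs.__getitem__)
            return label, end / len(ts), n_master_calls
    # fall back to the full series if no prediction was ever accepted
    probs = slaves[-1].predict_proba(ts)
    return max(range(len(probs)), key=probs.__getitem__), 1.0, n_master_calls

ts = list(range(100))
slaves = [DummySlave([0.6, 0.4]), DummySlave([0.9, 0.1]), DummySlave([0.95, 0.05])]
masters = [DummyMaster()] * 3
print(classify_early(ts, slaves, masters, w=20, min_len=20))  # (0, 0.4, 2)
```

In this toy run the first snapshot (20% of the series) is rejected by the master, while the second (40%) is accepted, so two master predictions were needed, mirroring how the "number of master predictions" is counted as a runtime measure below.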
Figure 8: Average ranks over earliness (left) and accuracy (right) for 45 TS datasets (lower rank is better). (a) Average ranks over earliness for early TS classifiers; (b) average ranks over accuracy for early TS classifiers. Both critical difference diagrams compare TEASER, ECDIRE, RelClass, ECTS, and EDSC.
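The average ranks that underlie such critical difference diagrams can be computed with a short, plain-Python sketch (rank 1 is best; tied scores receive the average of their ranks; the significance test itself, e.g. the Wilcoxon signed-rank test, is omitted here):

```python
def average_ranks(scores):
    """scores[d][m]: score of method m on dataset d (higher is better).
    Returns the mean rank per method (1 = best), averaging tied ranks."""
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda m: -row[m])
        ranks = [0.0] * n_methods
        i = 0
        while i < n_methods:
            j = i
            # extend over a run of tied scores
            while j + 1 < n_methods and row[order[j + 1]] == row[order[i]]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # average rank within the tie group
            for k in range(i, j + 1):
                ranks[order[k]] = tied_rank
            i = j + 1
        for m in range(n_methods):
            totals[m] += ranks[m]
    return [t / len(scores) for t in totals]

# three datasets, two methods: method 0 wins twice and ties once
print([round(r, 2) for r in average_ranks([[0.9, 0.8], [0.7, 0.6], [0.5, 0.5]])])  # [1.17, 1.83]
```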

Figure 9: Harmonic mean (HM) of accuracy and earliness (higher is better) for TEASER vs. the four eTSC classifiers (ECTS, EDSC, RelClass and ECDIRE) on each of the 45 datasets, ordered by type (image outline, motion sensors, sensor readings, synthetic). Red dots indicate where TEASER has a higher HM than the other classifiers. In total there are 36 wins for TEASER.
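The harmonic mean plotted above and the Pareto-optimality criterion can be sketched in a few lines of Python. The formula HM = 2·acc·(1−earl)/(acc+(1−earl)) is inferred from the reported numbers (e.g., Table 4: ECDIRE on PLAID has 63.0% accuracy at 21.0% earliness, giving HM ≈ 70.1%), so treat it as a reconstruction rather than the authors' verbatim definition:

```python
def harmonic_mean(accuracy: float, earliness: float) -> float:
    """HM of accuracy and (1 - earliness); earliness is lower-is-better,
    so it enters the harmonic mean as (1 - earliness)."""
    e = 1.0 - earliness
    return 2.0 * e * accuracy / (e + accuracy)

def dominates(a, b) -> bool:
    """Pareto criterion: an (accuracy, earliness) pair a beats b if it is
    better in at least one metric and worse in none."""
    acc_a, earl_a = a
    acc_b, earl_b = b
    better = acc_a > acc_b or earl_a < earl_b
    not_worse = acc_a >= acc_b and earl_a <= earl_b
    return better and not_worse

# ECDIRE on PLAID (Table 4): 63.0% accuracy, 21.0% earliness
print(round(100 * harmonic_mean(0.63, 0.21), 1))  # 70.1
# TEASER (83.0%, 19.0%) Pareto-dominates ECDIRE (73.0%, 44.4%) on ACS-F1
print(dominates((0.83, 0.19), (0.73, 0.444)))  # True
```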

Recall that TEASER nevertheless is also the most accurate eTSC method on average, even though it was optimized for the harmonic mean rather than for accuracy alone. It is thus not surprising that TEASER beats all competitors in terms of HM in 36 of the 45 cases. Figure 9 visualizes the HM value achieved by TEASER (black line) vs. the four other eTSC methods. This graphic sorts the datasets according to a predefined grouping of the benchmark data into four types, namely synthetic, motion sensors, sensor readings, and image outlines. TEASER has the best average HM value in all four of these groups; only in the group of synthetic datasets does EDSC come close, with a difference of just 3 percentage points (pp). In all other groups TEASER improves the HM by at least 20 pp compared to its best-performing competitor. On some of the UCR datasets classifiers excel in one metric (accuracy or earliness) but are beaten in the other. To determine cases where a method is clearly better than a given competitor, we also computed the number of datasets where a method is Pareto optimal over this competitor. Results are shown in Table 2. TEASER is dominated in only two cases by another method, whereas it dominates in 19 to 30 of the 45 cases.

          ECDIRE    RelClass   ECTS      EDSC
TEASER    19/25/1   22/23/0    26/18/1   30/15/0
Table 2: Summary of domination counts (wins/ties/losses) using earliness and accuracy (Pareto optimality).

In the context of eTSC, the most runtime-critical aspect is the prediction phase, in which we wish to provide an answer as soon as possible, before new data points arrive. As all competitors were implemented using different languages, it would not be entirely fair to compare wall-clock times of the implementations. Thus, we count the number of master predictions that are needed on average for TEASER to accept a master's prediction. TEASER requires 3.6 predictions on average (median 3.0) to accept a prediction, after seeing 23% of the TS on average. Thus, regardless of the used master classifier, a roughly 360% faster infrastructure would be needed on average for TEASER, in comparison to making a single prediction at a fixed threshold (like the ECTS framework with an earliness of 70%).

4.3 Impact of Interval Length

To make results comparable to those of previous publications, all experiments described so far used a fixed value for the interval length w, derived from breaking the time series into S = 20 intervals. Figure 10 shows boxplot diagrams for earliness (left) and accuracy (right) when varying the value of w.
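The relationship between the number of intervals S and the snapshot lengths at which TEASER attempts a prediction can be sketched as follows (a minimal sketch; the floor-division rounding of w = n/S is an assumption, not stated in the text):

```python
def snapshot_lengths(n: int, S: int = 20):
    """Break a series of length n into S intervals of width w (assumed
    w = n // S) and return w plus the prefix lengths at which a
    prediction is attempted."""
    w = max(1, n // S)
    return w, list(range(w, n + 1, w))

# S = 20 intervals over a series of length 100 -> a decision every 5% of the TS
w, offsets = snapshot_lengths(100)
print(w, offsets[:4], offsets[-1])  # 5 [5, 10, 15, 20] 100
```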
Figure 10: Average earliness (left; lower is better) and accuracy (right; higher is better) for TEASER on the 45 TS datasets. (a) Boxplot of earliness for varying interval length w (10%, 7%, 5%, 4%, 3%) over all 45 datasets; (b) boxplot of accuracy for the same values of w, together with the upper bound at w = 100 (classifying on the full TS).

Predictions are made after seeing multiples from 10% of a dataset down to multiples of 3%; thus, in the latter case TEASER outputs a prediction after seeing 3%, 6%, 9%, etc. of the entire time series. Interestingly, accuracy decreases whereas earliness improves with decreasing w, meaning that TEASER tends to make earlier predictions, thus seeing less data, with shorter interval lengths. Changing w thus influences the trade-off between earliness and accuracy: if early (accurate) predictions are needed, one should choose a low (high) w value. We further plot the upper bound of TEASER, that is, the accuracy at w = 100, which is equal to always using the full TS for classification. The difference between w = 100 and w = 10 is surprisingly small, at 5 pp. Overall, TEASER gets to 95% of the optimum using on average 40% of the time series.

4.4 Three Real-Life Datasets

The UCR datasets used so far have all been preprocessed to make their analysis easier and, in particular, to achieve roughly the same offsets for the most characteristic patterns. This setting is very favorable for methods that expect equal offsets, which is true for all eTSC methods discussed here except TEASER; it is all the more reassuring that even under such non-favorable settings TEASER generally outperforms its competitors. In the following we describe experiments performed on three additional datasets, namely ACS-F1 [11], PLAID [9], and CMU [3]. As can be seen from Table 3, these datasets have interesting characteristics which are quite distinct from those of the UCR data, as all UCR datasets have a fixed length and were preprocessed for approximate alignment. The former two use-cases were generated in the context of appliance load monitoring and capture the power consumption of common household appliances over time, whereas the latter records the z-axis accelerometer values of either the right or the left toe of four persons while walking, in order to discriminate normal from abnormal walking styles.

                  Samples N       TS length n            Classes
                  Train   Test    Min    Max    Avg     Total
ACS-F1            100     100     101    1344   325     10
PLAID             537     537     1460   1460   1460    11
Walking Motions   40      228     277    616    448     2
Table 3: Use-cases ACS-F1, PLAID and Walking Motions.

ACS-F1 monitored about 100 home appliances divided into 10 appliance types (mobile phones, coffee machines, personal computers, fridges and freezers, Hi-Fi systems, lamps, laptops, microwave ovens, printers, and televisions) over two sessions of one hour each. The time series are very long and have no defined start points. No preprocessing was applied. We expect all eTSC methods to require only a fraction of the overall TS, and we expect TEASER to outperform the other methods in terms of earliness.

PLAID monitored 537 home appliances divided into 11 types (air conditioner, compact fluorescent lamp, fridge, hairdryer, laptop, microwave, washing machine, bulb, vacuum, fan, and heater). For each device, there are two concatenated time series, where the first was taken at start-up of the device and the second during steady-state operation. The resulting TS were preprocessed to create approximately aligned patterns of equal length and scale. We expect eTSC methods to require a larger fraction of the data, and the advantage of TEASER to be less pronounced due to the alignment.

CMU recorded time series taken from four walking persons, with some short walks that last only three seconds and some longer walks that last up to 52 seconds. Each walk is composed of multiple gait cycles. The difficulties in this dataset result from variable-length gait cycles, gait styles, and paces, due to different subjects performing different activities including stops and turns. No preprocessing was applied. Here, we expect TEASER to strongly outperform the other eTSC methods due to the higher heterogeneity of the measurements and the lack of defined start times.

We fixed w to 5% of the maximal time series length of the dataset for each experiment. Table 4 shows the results of all methods on these three datasets. TEASER requires 19% (ACS-F1) and 64% (PLAID) of the length of the sessions to make reliable predictions, with accuracies of 83% and 91.6%, respectively. As expected, a smaller fraction of the TS is necessary for ACS-F1 than for PLAID. All competitors are considerably less accurate than TEASER, with a difference of
10 to 20 percentage points (pp) on ACS-F1 and 29 to 34 pp on PLAID. In terms of earliness, TEASER is the earliest method on the ACS-F1 dataset but the slowest on the PLAID dataset; although its accuracy on PLAID is far better than that of the other methods, it is only third best in terms of HM value. As ECDIRE has an earliness of 21% on the PLAID dataset, we performed an additional experiment in which we forced TEASER to always output a prediction after seeing at most 20% of the data, which is roughly equal to the earliness of ECDIRE. In this case TEASER achieves an accuracy of 78.2%, which is still higher than that of all competitors. Recall that TEASER and its competitors have different optimization goals: HM vs. accuracy. Still, if we set the earliness of TEASER to that of its earliest competitor, TEASER obtains a higher accuracy.

                  ECDIRE                ECTS                  RelClass              TEASER
                  Acc.   Earl.  HM     Acc.   Earl.  HM     Acc.   Earl.  HM     Acc.   Earl.  HM
ACS-F1            73.0%  44.4%  63.2%  55.0%  78.5%  31.0%  54.0%  59.0%  46.6%  83.0%  19.0%  82.0%
PLAID             63.0%  21.0%  70.1%  57.7%  47.9%  54.7%  58.7%  58.5%  48.6%  91.6%  64.0%  51.7%
Walking Motions   50.0%  68.4%  38.7%  76.3%  83.7%  26.9%  66.7%  64.1%  46.7%  93.0%  34.0%  77.2%
Table 4: Accuracy and harmonic mean (HM), where higher is better, and earliness, where lower is better, on three real-world use cases. TEASER has the highest accuracy on all datasets, and the best earliness on all but the PLAID dataset.

The advantages of TEASER become even more visible on the difficult CMU dataset. Here, TEASER is 15 to 40 pp more accurate while requiring 35 to 54 pp fewer data points than its competitors. The reasons become visible when inspecting some examples of this dataset (see Figure 11). A normal walking motion consists of roughly three repeated similar patterns. TEASER is able to detect normal walking motions after seeing 34% of the walking patterns on average, which mostly equals one of the three gait cycles. Abnormal walking motions take much longer to classify due to the absence of a gait cycle. Also, one of the normal walking motions (third from top) requires a longer inspection time of two gait cycles, as its first gait cycle seems to start with an abnormal spike.

Figure 11: Earliness of predictions on the walking motion dataset. Orange (top): abnormal walking motions. Green (bottom, dashed): normal walking motions. In bold color: the fraction of the TS needed for classification. In light color: the whole series.

5 RELATED WORK

The techniques used for time series classification (TSC) can be broadly categorized into two classes: whole series-based methods and feature-based methods [17]. Whole series-based methods make use of a point-wise comparison of entire TS, like 1-NN Dynamic Time Warping (DTW) [25]. In contrast, feature-based classifiers rely on comparing features generated from substructures of TS. Approaches can be grouped as either using shapelets or bag-of-patterns (BOP). Shapelets are defined as TS subsequences that are maximally representative of a class [12, 37]. The BOP model [17, 28–30] breaks up a TS into a bag of substructures, represents these substructures as discrete features, and finally builds a histogram of feature counts as the basis for classification. The recent Word ExtrAction for time SEries cLassification (WEASEL) [30] also conceptually builds on the BOP approach and is one of the fastest and most accurate classifiers. In [33] deep learning networks are applied to TSC. Their best performing fully convolutional network (FCN) does not perform significantly differently from the state of the art. [7] presents an overview of deep learning approaches.

Early classification of time series (eTSC) [26] is important when data becomes available over time and decisions need to be taken as early as possible. It addresses two conflicting goals: maximizing accuracy typically reduces earliness, and vice versa. Early Classification on Time Series (ECTS) [34] is one of the first papers to introduce the problem. The authors adopt a 1-nearest neighbor (1-NN) approach and introduce the concept of minimum prediction length (MPL) in combination with clustering. Time series with the same 1-NN are clustered. The optimal prefix length for each cluster is obtained by analyzing the stability of the 1-NN decision for increasing time stamps. Only those clusters with stable and accurate offsets are kept. To give a prediction for an unlabeled TS, the 1-NN is searched among clusters. Reliable Early Classification (RelClass) [22] presents a method based on quadratic discriminant analysis (QDA). A reliability score is defined as the probability that the predicted class for the truncated and the whole time series will be the same. At each time stamp, RelClass then checks if the reliability
is higher than a user-defined threshold. Early Classification of Time Series based on Discriminating Classes Over Time (ECDIRE) [19] trains classifiers at certain time stamps, i.e., at percentages of the full time series length. It learns a safe time stamp, i.e., the fraction of the time series after which a prediction is considered safe. Furthermore, a reliability threshold is learned using the difference between the two highest class probabilities. Only predictions passing this threshold after the safe time stamp are chosen. The idea of EDSC [35] is to learn shapelets that appear early in the time series and that discriminate between classes as early as possible. [18] approaches early classification as an optimization problem. The authors combine a set of probabilistic classifiers with a stopping rule that is optimized using a cost function on earliness and accuracy. Their best performing model SR1-CF1 is significantly earlier than the state of the art, but its accuracy falls behind. However, the code has a logical design flaw, which renders the results hard to compare to, but apparently results in good scores on the UCR datasets. Their algorithm uses z-normalized time series, which are then truncated to build a training set. Thereby, a truncated subsequence makes use of information about values that will only be available in the future for normalization; i.e., their absolute values are a result of z-normalization with data that has not arrived yet. In TEASER the truncated train series are z-normalized first, thus removing any bias from values that have not been seen. We decided to omit SR1-CF1 from our evaluation due to this normalization issue. A problem related to eTSC is the classification of streaming time series [8, 21]. In these works, the task is to assign class labels to time windows of a potentially infinite data stream, which is similar to event detection in streams [1]. The data enclosed in a time window is considered to be an instance for a classifier. Due to the windowing, multiple class labels can be assigned to a data stream. In contrast, eTSC aims at assigning a label to an entire TS as soon as possible.

6 CONCLUSION

We presented TEASER, a novel method for early classification of time series. TEASER's decision about the safety (accuracy) of a prediction is treated as a classification problem, in which master classifiers continuously analyze the output of probabilistic slave classifiers to decide if their predictions should be trusted or not. By means of this technique, TEASER adapts to the characteristics of classes irrespective of the moments at which they occur in a TS. In an extensive experimental evaluation using altogether 48 datasets, TEASER outperforms all other methods, often by a large margin and often both in terms of earliness and accuracy.

REFERENCES
[1] C. C. Aggarwal and K. Subbian. Event detection in social streams. In Proceedings of the 2012 SIAM International Conference on Data Mining, pages 624–635. SIAM, 2012.
[2] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.
[3] CMU Graphics Lab Motion Capture Database. [Link]
[4] M. Cuturi and A. Doucet. Autoregressive kernels for time series. arXiv preprint arXiv:1101.0673, 2011.
[5] J. Demšar. Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 7:1–30, 2006.
[6] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9:1871–1874, 2008.
[7] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller. Deep learning for time series classification: a review. arXiv preprint arXiv:1809.04356, 2018.
[8] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy. Mining data streams: a review. ACM SIGMOD Record, 34(2):18–26, 2005.
[9] J. Gao, S. Giri, E. C. Kara, and M. Bergés. PLAID: a public dataset of high-resolution electrical appliance measurements for load identification research: demo abstract. In Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings, pages 198–199. ACM, 2014.
[10] M. F. Ghalwash, V. Radosavljevic, and Z. Obradovic. Utilizing temporal patterns for estimating uncertainty in interpretable early decision making. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 402–411. ACM, 2014.
[11] C. Gisler, A. Ridi, D. Zufferey, O. A. Khaled, and J. Hennebert. Appliance consumption signature database and recognition test protocols. In International Workshop on Systems, Signal Processing and their Applications (WoSSPA), pages 336–341. IEEE, 2013.
[12] J. Grabocka, N. Schilling, M. Wistuba, and L. Schmidt-Thieme. Learning time-series shapelets. In Proceedings of the 2014 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 392–401. ACM, 2014.
[13] M. P. Griffin and J. R. Moorman. Toward the early diagnosis of neonatal sepsis and sepsis-like illness using novel heart rate analysis. Pediatrics, 107(1):97–104, 2001.
[14] B. F. Hobbs, S. Jitprapaikulsarn, S. Konda, V. Chankong, K. A. Loparo, and D. J. Maratukulam. Analysis of the value for unit commitment of improved load forecasts. IEEE Transactions on Power Systems, 14(4):1342–1348, 1999.
[15] Z. Jerzak and H. Ziekow. The DEBS 2014 Grand Challenge. In Proceedings of the 2014 ACM International Conference on Distributed Event-based Systems, pages 266–269. ACM, 2014.
[16] S. S. Khan and M. G. Madden. A survey of recent trends in one class classification. In Irish Conference on Artificial Intelligence and Cognitive Science, pages 188–197. Springer, 2009.
[17] J. Lin, R. Khade, and Y. Li. Rotation-invariant similarity in time series using bag-of-patterns representation. Journal of Intelligent Information Systems, 39(2):287–315, 2012.
[18] U. Mori, A. Mendiburu, S. Dasgupta, and J. A. Lozano. Early classification of time series by simultaneously optimizing the accuracy and earliness. IEEE Transactions on Neural Networks and Learning Systems, 2017.
[19] U. Mori, A. Mendiburu, E. Keogh, and J. A. Lozano. Reliable early classification of time series based on discriminating the classes over time. Data Mining and Knowledge Discovery, 31(1):233–263, 2017.
[20] C. Mutschler, H. Ziekow, and Z. Jerzak. The DEBS 2013 Grand Challenge. In Proceedings of the 2013 ACM International Conference on Distributed Event-based Systems, pages 289–294. ACM, 2013.
[21] H.-L. Nguyen, Y.-K. Woon, and W.-K. Ng. A survey on data stream clustering and classification. Knowledge and Information Systems, 45(3):535–569, 2015.
[22] N. Parrish, H. S. Anderson, M. R. Gupta, and D. Y. Hsiao. Classifying with confidence from incomplete information. The Journal of Machine Learning Research, 14(1):3561–3589, 2013.
[23] T. Perol, M. Gharbi, and M. Denolle. Convolutional neural network for earthquake detection and location. Science Advances, 4(2):e1700578, 2018.
[24] P. Protopapas, J. Giammarco, L. Faccioli, M. Struble, R. Dave, and C. Alcock. Finding outlier light curves in catalogues of periodic variable stars. Monthly Notices of the Royal Astronomical Society, 369(2):677–696, 2006.
[25] T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria, and E. Keogh. Searching and mining trillions of time series subsequences under dynamic time warping. In Proceedings of the 2012 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 262–270. ACM, 2012.
[26] T. Santos and R. Kern. A literature survey of early time series classification and deep learning. In SAMI@iKNOW, 2016.
[27] P. Schäfer. Towards time series classification without human preprocessing. In Machine Learning and Data Mining in Pattern Recognition, pages 228–242. Springer, 2014.
[28] P. Schäfer. The BOSS is concerned with time series classification in the presence of noise. Data Mining and Knowledge Discovery, 29(6):1505–1530, 2015.
[29] P. Schäfer and M. Högqvist. SFA: a symbolic Fourier approximation and index for similarity search in high dimensional datasets. In Proceedings of the 2012 International Conference on Extending Database Technology, pages 516–527. ACM, 2012.
[30] P. Schäfer and U. Leser. Fast and accurate time series classification with WEASEL. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management, pages 637–646. ACM, 2017.
[31] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001.
[32] The Value of Wind Power Forecasting. [Link], 2016.
[33] Z. Wang, W. Yan, and T. Oates. Time series classification from scratch with deep neural networks: A strong baseline. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 1578–1585. IEEE, 2017.
[34] Z. Xing, J. Pei, and P. S. Yu. Early classification on time series. Knowledge and Information Systems, 31(1):105–127, 2012.
[35] Z. Xing, J. Pei, P. S. Yu, and K. Wang. Extracting interpretable features for early classification on time series. In Proceedings of the 2011 SIAM International Conference on Data Mining, pages 247–258. SIAM, 2011.
[36] Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, and G. Batista. The UCR Time Series Classification Archive. [Link], 2015.
[37] L. Ye and E. J. Keogh. Time series shapelets: a new primitive for data mining. In Proceedings of the 2009 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.
