CONTRIBUTED RESEARCH ARTICLE 207

imputeTS: Time Series Missing Value Imputation in R
by Steffen Moritz and Thomas Bartz-Beielstein

Abstract The imputeTS package specializes in univariate time series imputation. It offers multiple state-of-the-art imputation algorithm implementations along with plotting functions for time series missing data statistics. While imputation in general is a well-known problem and widely covered by R packages, finding packages able to fill missing values in univariate time series is more complicated. This is because most imputation algorithms rely on inter-attribute correlations, while univariate time series imputation instead needs to employ time dependencies. This paper provides an introduction to the imputeTS package and its provided algorithms and tools. Furthermore, it gives a short overview of univariate time series imputation in R.

Introduction
In almost every domain, from industry (Billinton et al., 1996) and biology (Bar-Joseph et al., 2003) to finance (Taylor, 2007) and social science (Gottman, 1981), time series data are measured. While the recorded datasets themselves may differ, one common problem is missing values. Many analysis methods require missing values to be replaced with reasonable values up-front. In statistics, this process of replacing missing values is called imputation.

Time series imputation is a special sub-field of imputation research. The most popular techniques, such as Multiple Imputation (Rubin, 1987), Expectation-Maximization (Dempster et al., 1977), Nearest Neighbor (Vacek and Ashikaga, 1980) and Hot Deck (Ford, 1983), rely on inter-attribute correlations to estimate values for the missing data. Since univariate time series do not possess more than one attribute, these algorithms cannot be applied directly. Effective univariate time series imputation algorithms instead need to employ inter-time correlations.

On CRAN there are several packages that address the imputation of multivariate data. Among the most popular and mature are Amelia (Honaker et al., 2011), mice (van Buuren and Groothuis-Oudshoorn, 2011), VIM (Kowarik and Templ, 2016) and missMDA (Josse and Husson, 2016). However, since these packages are designed only for multivariate data imputation, they do not work for univariate time series.

At the moment, imputeTS (Moritz, 2017a) is the only package on CRAN that is solely dedicated to univariate time series imputation and includes multiple algorithms. Nevertheless, some other packages include imputation functions in addition to their core functionality. The most noteworthy are zoo (Zeileis and Grothendieck, 2005) and forecast (Hyndman, 2017), which both also offer some advanced time series imputation functions. The packages spacetime (Pebesma, 2012), timeSeries (Rmetrics Core Team et al., 2015) and xts (Ryan and Ulrich, 2014) should also be mentioned, since they contain some very simple but quick time series imputation methods. For a broader overview of available time series imputation packages in R, see also Moritz et al. (2015). That technical report evaluates the performance of several univariate imputation functions in R on different time series.

This paper is structured as follows: Section Overview imputeTS package gives an overview of all features and functions included in the imputeTS package. This is followed by Usage examples of the different functions provided. The paper ends with a Conclusions section.

Overview imputeTS package

The imputeTS package is available on CRAN and is an easy-to-use package that offers several utilities for ’univariate, equi-spaced, numeric time series’.
Univariate means that there is just one attribute observed over time. This leads to a sequence of single observations $o_1, o_2, o_3, \dots, o_n$ at successive points $t_1, t_2, t_3, \dots, t_n$ in time. Equi-spaced means that the time increments between successive data points are equal: $|t_1 - t_2| = |t_2 - t_3| = \dots = |t_{n-1} - t_n|$. Numeric means that the observations are measurable quantities that can be described as a number.
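The equi-spacing condition can be checked mechanically. The following minimal base-R sketch is only an illustration: the helper name is_equispaced and the tolerance are our own choices and not part of imputeTS. It compares every absolute time increment with the first one. A small floating-point tolerance is allowed because time stamps such as those returned by time() are stored as doubles.

```r
# Hypothetical helper (not part of imputeTS): checks whether time stamps
# t_1, ..., t_n are equi-spaced, i.e. |t_1 - t_2| = ... = |t_{n-1} - t_n|.
is_equispaced <- function(t, tol = 1e-8) {
  if (length(t) < 3) return(TRUE)          # at most one increment: trivially equal
  d <- abs(diff(as.numeric(t)))            # absolute increments |t_i - t_{i+1}|
  all(abs(d - d[1]) <= tol)                # every increment equals the first one
}

# The built-in monthly AirPassengers series is equi-spaced by construction
is_equispaced(time(AirPassengers))
# Irregular time stamps are not
is_equispaced(c(1, 2, 4, 5))
```

An R ts object is regular by definition, because its tsp attribute stores start, end and frequency. The check therefore matters mainly for plain numeric vectors built from raw time stamps.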
In the first part of this section, a general overview of all available functions and datasets is given.

The R Journal Vol. 9/1, June 2017 ISSN 2073-4859

This is followed by more detailed overviews of the three areas covered by the package: ’Plots & Statistics’, ’Imputation’ and ’Datasets’. Information about how to apply these functions and tools can be found later in section Usage examples.

General overview

As can be seen in Table 1, besides several imputation algorithm implementations the package also includes plotting functions and datasets. The imputation algorithms can be divided into two groups. The first consists of rather simple but fast approaches, like mean imputation. The second consists of more advanced algorithms that need more computation time, like Kalman smoothing on a structural model.

| Simple Imputation | Imputation | Plots & Statistics | Datasets |
|---|---|---|---|
| [Link] | [Link] | [Link] | tsAirgap |
| [Link] | [Link] | [Link] | tsAirgapComplete |
| [Link] | [Link] | [Link] | tsHeating |
| [Link] | [Link] | [Link] | tsHeatingComplete |
| [Link] | [Link] | statsNA | tsNH4 |
| | | | tsNH4Complete |

Table 1: General Overview imputeTS package

As a whole, the package aims to support the user through the complete process of replacing missing values in time series. This process starts with analyzing the distribution of the missing values using the statsNA function and the plotting functions [Link], [Link] and [Link]. In the next step, the actual imputation can take place with one of the several algorithm options. Finally, the imputation results can be visualized with the [Link] function. Additionally, the package contains three datasets, each in a version with and without missing values, that can be used to test imputation algorithms.

Plots & statistics functions

Table 2 gives an overview of the available plotting and statistics functions. To get a good impression of what the plots look like, see section Usage examples.

| Function | Description |
|---|---|
| [Link] | Visualize Distribution of Missing Values |
| [Link] | Visualize Distribution of Missing Values (Barplot) |
| [Link] | Visualize Distribution of NA gap sizes |
| [Link] | Visualize Imputed Values |
| statsNA | Print Statistics about the Missing Data |

Table 2: Overview Plots & Statistics

The statsNA function calculates several missing data statistics of the input data. These include the overall percentage of missing values, the absolute number of missing values, the number of missing values in different sections of the data, the longest series of consecutive NAs and the occurrence of consecutive NAs. The [Link] function visualizes the distribution of NAs in a time series. It uses a standard time series plot in which areas with missing data are colored red, so the user can see at first sight where in the series most of the missing values are located. The [Link] function provides the same insights, but is designed for very large time series. This is necessary for time series with 1,000 or more observations, where it is not possible to plot each observation as a single point. The [Link] function provides information about consecutive NAs by showing the most common NA gap sizes in the time series. The [Link] function is intended for visual inspection of the results after applying an imputation algorithm. For this purpose, newly imputed observations are shown in a different color than the rest of the series.

Imputation functions

An overview of all available imputation algorithms can be found in Table 3. Although these functions are easy to apply, some examples are given later in section Usage examples. More detailed information about the theoretical background of the algorithms can be found in the imputeTS manual (Moritz, 2017b).

| Function | Option | Description |
|---|---|---|
| [Link] | linear | Imputation by Linear Interpolation |
| | spline | Imputation by Spline Interpolation |
| | stine | Imputation by Stineman Interpolation |
| [Link] | StructTS | Imputation by Structural Model & Kalman Smoothing |
| | [Link] | Imputation by ARIMA State Space Representation & Kalman Smoothing |
| [Link] | locf | Imputation by Last Observation Carried Forward |
| | nocb | Imputation by Next Observation Carried Backward |
| [Link] | simple | Missing Value Imputation by Simple Moving Average |
| | linear | Missing Value Imputation by Linear Weighted Moving Average |
| | exponential | Missing Value Imputation by Exponential Weighted Moving Average |
| [Link] | mean | Missing Value Imputation by Mean Value |
| | median | Missing Value Imputation by Median Value |
| | mode | Missing Value Imputation by Mode Value |
| [Link] | | Missing Value Imputation by Random Sample |
| [Link] | | Replace Missing Values by a Defined Value |
| [Link] | | Seasonally Decomposed Missing Value Imputation |
| [Link] | | Seasonally Splitted Missing Value Imputation |
| [Link] | | Remove Missing Values |

Table 3: Overview Imputation Algorithms
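The following minimal base-R sketch makes the three moving-average options in Table 3 concrete. It is an illustration under stated assumptions, not the imputeTS implementation. Our own choices include the helper name ma_impute, the default window of k = 4 on each side and the rule of widening the window when it holds no observed values. The weight formulas are also assumptions: 1 for simple, 1/(1 + d) for linear and 1/2^d for exponential, where d is the distance to the missing position.

```r
# Hypothetical sketch of moving-average imputation (not the imputeTS code).
# Each NA is replaced by a weighted mean of the observed values within k
# positions on either side, weighted by their distance d to the NA.
ma_impute <- function(x, k = 4, weighting = c("simple", "linear", "exponential")) {
  weighting <- match.arg(weighting)
  v <- as.numeric(x)
  if (all(is.na(v))) stop("need at least one observed value")
  n <- length(v)
  out <- v
  for (i in which(is.na(v))) {
    w_k <- k
    repeat {
      idx <- max(1, i - w_k):min(n, i + w_k)
      idx <- idx[!is.na(v[idx])]          # only originally observed neighbours
      if (length(idx) > 0) break
      w_k <- w_k + 1                      # widen the window if it holds no data
    }
    d <- abs(idx - i)                     # distance of each neighbour to the NA
    w <- switch(weighting,
                simple      = rep(1, length(d)),
                linear      = 1 / (1 + d),
                exponential = 1 / 2^d)
    out[i] <- sum(w * v[idx]) / sum(w)    # weighted mean of the neighbours
  }
  if (is.ts(x)) out <- ts(out, start = start(x), frequency = frequency(x))
  out
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, NA, NA, 11, 12)
ma_impute(x, k = 2, weighting = "simple")
ma_impute(x, k = 2, weighting = "exponential")
```

The neighbours are always taken from the original series, so values imputed earlier in the loop do not feed into later estimates. Widening the window guarantees an estimate even inside long gaps. The price is that the estimate then averages over more distant observations.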

For convenience, similar algorithms are available under one function name as parameter options. For example, linear, spline and Stineman interpolation are all included in the [Link] function. The [Link], [Link], [Link] and [Link] functions are all simple and fast. In comparison, [Link], [Link], [Link], [Link] and [Link] are more advanced algorithms that need more computation time. The [Link] function is a special case, since it only deletes all missing values. Thus, it is not really an imputation function. It should be handled with care, since removing observations may corrupt the time information of the series. The [Link] and [Link] functions are exceptions as well. They perform a seasonal split or decomposition as a preprocessing step. For the imputation itself, any of the other imputation algorithms can be used, and the choice is set as an option. Looking at all available imputation methods, no single overall best method can be pointed out. Imputation performance always depends heavily on the characteristics of the input time series. Even imputation with mean values can sometimes be an appropriate method. For time series with strong seasonality, [Link] and [Link] / [Link] usually perform best. In general, for most time series one algorithm out of [Link], [Link] and [Link] will yield the best results. Meanwhile, [Link], [Link] and [Link] will be at the lower end accuracy-wise for the majority of input time series.
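The decomposition-based preprocessing described above can be sketched in a few lines of base R. This is a minimal illustration under our own assumptions, not the package's code, and the helper names interp_linear and seadec_impute are hypothetical. The sketch proceeds in four steps:

1. A temporary linear interpolation makes the series complete, so that stl() with s.window = "periodic" can estimate the seasonal component.
2. The seasonal component is subtracted from the original series.
3. The remaining NAs are filled by linear interpolation.
4. The seasonal component is added back.

```r
# Linear interpolation of NAs via approx(); rule = 2 carries the first/last
# observed value outward for leading or trailing NAs.
interp_linear <- function(v) {
  v <- as.numeric(v)
  i <- seq_along(v)
  ok <- !is.na(v)
  if (all(ok)) return(v)
  v[!ok] <- approx(i[ok], v[ok], xout = i[!ok], rule = 2)$y
  v
}

# Hypothetical sketch of seasonally decomposed imputation (not the imputeTS code)
seadec_impute <- function(x) {
  stopifnot(is.ts(x), frequency(x) > 1)
  miss <- is.na(as.numeric(x))
  if (!any(miss)) return(x)
  # 1. temporary complete series, needed because stl() does not accept NAs
  temp <- ts(interp_linear(x), start = start(x), frequency = frequency(x))
  # 2. estimate the seasonal component
  seasonal <- as.numeric(stl(temp, s.window = "periodic")$time.series[, "seasonal"])
  # 3. remove seasonality from the original series (NAs stay NA) and impute
  deseasonalized <- interp_linear(as.numeric(x) - seasonal)
  # 4. add the seasonal component back, filling only the missing positions
  out <- x
  out[miss] <- deseasonalized[miss] + seasonal[miss]
  out
}

x <- AirPassengers
x[c(10, 50, 51, 100)] <- NA
seadec_impute(x)
```

Linear interpolation is used here only as the stand-in for "one of the other imputation algorithms". Any function that fills NAs in a numeric vector could take its place in step 3.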

Datasets

As can be seen in Table 4, all three datasets are available in a version with missing data and in a complete version. The provided time series are intended as benchmark datasets for univariate time series imputation. They enable users to quickly compare and test imputation algorithms. Without these datasets, testing time series imputation algorithms would require manually deleting certain observations. The benchmark data simplifies this: imputation algorithms can be applied directly to the dataset versions with missing values. The results can then be compared to the complete dataset versions. Since the time series are fixed, researchers can use them to compare their algorithms against each other.

RMSE or MAPE values reached on these datasets are easily understandable results to quote and compare against. Nevertheless, comparing algorithms on these fixed datasets can only be a first indicator of how well algorithms perform in general. Especially for the very short tsAirgap series (with just 13 NA values), random lucky guesses can considerably influence the results. A complete benchmark would include different missing data percentages, different datasets and different random seeds for the missing data simulation.
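RMSE and MAPE are computed only over the positions that were actually missing. The base-R sketch below illustrates such a benchmark under our own assumptions. As ground truth it uses the built-in AirPassengers series, on which tsAirgapComplete is based according to the tsAirgap description. For each combination of seed and missing-data percentage, the sketch deletes observations at random and imputes them with linear interpolation via approx(), a stand-in for any imputation function. It then reports both error measures. The helper names rmse, mape and simulate_benchmark are hypothetical.

```r
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))
mape <- function(truth, estimate) 100 * mean(abs((truth - estimate) / truth))

# Linear interpolation used here as a stand-in imputation method
interp_linear <- function(v) {
  i <- seq_along(v)
  ok <- !is.na(v)
  if (all(ok)) return(v)
  v[!ok] <- approx(i[ok], v[ok], xout = i[!ok], rule = 2)$y
  v
}

# Hypothetical benchmark loop over missing-data rates and random seeds
simulate_benchmark <- function(complete, rates = c(0.1, 0.2), seeds = 1:3,
                               impute = interp_linear) {
  complete <- as.numeric(complete)
  res <- expand.grid(rate = rates, seed = seeds)
  res$rmse <- res$mape <- NA_real_
  for (r in seq_len(nrow(res))) {
    set.seed(res$seed[r])
    n_miss <- round(res$rate[r] * length(complete))
    miss <- sample(seq_along(complete), n_miss)   # positions to delete
    gappy <- complete
    gappy[miss] <- NA
    est <- impute(gappy)
    # errors are measured only where values were deleted
    res$rmse[r] <- rmse(complete[miss], est[miss])
    res$mape[r] <- mape(complete[miss], est[miss])
  }
  res
}

simulate_benchmark(AirPassengers)
```

MAPE divides by the true values, so it is only meaningful for series without zeros, such as passenger counts. RMSE has no such restriction.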

Overall, tsAirgap is a relatively small time series, tsNH4 a medium-sized one and tsHeating a large one. tsHeating and tsNH4 are both sensor data, while tsAirgap is count data.

| Dataset | Description |
|---|---|
| tsAirgap | Time series of monthly airline passengers (with NAs) |
| tsAirgapComplete | Time series of monthly airline passengers (complete) |
| tsHeating | Time series of a heating system’s supply temperature (with NAs) |
| tsHeatingComplete | Time series of a heating system’s supply temperature (complete) |
| tsNH4 | Time series of NH4 concentration in a waste-water system (with NAs) |
| tsNH4Complete | Time series of NH4 concentration in a waste-water system (complete) |

Table 4: Overview Datasets

tsAirgap
The tsAirgap time series has 144 rows and the incomplete version includes 14 NA values. It represents the monthly totals of international airline passengers from 1949 to 1960. The time series originates from Box et al. (2015) and is a commonly used example in the time series analysis literature. The series was originally known as ’AirPassengers’ or ’airpass’. This version is renamed to ’tsAirgap’ in order to differentiate it from the complete series (gap signifies that NAs were introduced). Its characteristics (strong trend, strong seasonal behavior) make the tsAirgap series a great example for time series imputation.
As already mentioned, two versions are provided so that the series can be used to compare imputation algorithm results. One series without missing values (tsAirgapComplete) can be used as ground truth. The other series, with NAs (tsAirgap), is the one to which the imputation algorithms can be applied. The missing data for tsNH4 and tsHeating were each introduced according to patterns observed in very similar time series from the same source. In contrast, the missing observations in tsAirgap were created based on general missing data patterns.

tsNH4
The tsNH4 time series has 4552 rows and the incomplete version includes 883 NA values. It represents the NH4 concentration in a waste-water system measured from 30.11.2010 - 16:10 to 01.01.2011 - 6:40 in 10-minute steps. The time series is derived from the dataset of the Genetic and Evolutionary Computation Conference (GECCO) Industrial Challenge 2014¹.
As already mentioned, two versions are provided so that the series can be used to compare imputation algorithm results. One series without missing values (tsNH4Complete) can be used as ground truth. The other series, with NAs (tsNH4), is the one to which the imputation algorithms can be applied. The pattern of NA occurrence was derived from the same series / sensors, but from an earlier time interval. Thus, it is a very realistic missing data pattern. Beware: since the time series has a lot of observations, some of the more complex algorithms like [Link] will need some time to finish.

tsHeating
The tsHeating time series has 606837 rows and the incomplete version includes 57391 NA values. It represents a heating system’s supply temperature measured from 18.11.2013 - [Link] to 13.01.2015 - [Link] in 1-minute steps. The time series originates from the GECCO Industrial Challenge 2015². This was a challenge about ’Recovering missing information in heating system operating data’. The goal was to impute missing values in heating system sensor data as accurately as possible.
As already mentioned, two versions are provided so that the series can be used to compare imputation algorithm results. One series without missing values (tsHeatingComplete) can be used as ground truth. The other series, with NAs (tsHeating), is the one to which the imputation algorithms can be applied. The NAs were inserted according to patterns found in similar time series from other heating systems. Beware: since it is a very large time series, some of the more complex algorithms like [Link] may need up to several days to complete on standard hardware.

¹ [Link]
² [Link]

Usage examples
To start working with the imputeTS package, install either the stable version from CRAN or the development version from GitHub ([Link]). The stable version from CRAN is recommended.

Imputation algorithms

All imputation algorithms are used the same way. The input has to be either a numeric time series or a numeric vector. The output is a version of the input data in which all missing values are replaced by imputed values. Here is a small example to show how to use the imputation algorithms (all imputation functions start with na.’algorithm name’).

For this we first need to create an example input series with missing data.

# Create a short example time series with missing values

x <- ts(c(1, 2, 3, 4, 5, 6, 7, 8, NA, NA, 11, 12))

We can apply different imputation algorithms to this time series. We start with [Link], which substitutes the NAs with mean values.

# Impute the missing values with [Link]

[Link](x)

[1] 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 5.9 5.9 11.0 12.0

Most of the functions also have additional options that provide further algorithms of the same category. The example below shows that [Link] can also be called with option="median", which substitutes the NAs with median values.

# Impute the missing values with [Link] using option median

[Link](x, option="median")

[1] 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 5.5 5.5 11.0 12.0

While [Link] and all other imputation functions are used the same way, the results they produce may differ. As can be seen below, linear interpolation gives more reasonable results for this series.

# Impute the missing values with [Link]

[Link](x)

[1] 1 2 3 4 5 6 7 8 9 10 11 12
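The numbers above can be cross-checked with base R alone. This only verifies the arithmetic and does not show how imputeTS computes the values internally. The ten observed values sum to 59, so the mean is 59/10 = 5.9. Their median is (5 + 6)/2 = 5.5. Linear interpolation between the value 8 at position 8 and the value 11 at position 11 gives 9 and 10.

```r
x <- ts(c(1, 2, 3, 4, 5, 6, 7, 8, NA, NA, 11, 12))
v <- as.numeric(x)
miss <- is.na(v)

# Mean imputation: mean of the observed values (59 / 10)
x_mean <- x
x_mean[miss] <- mean(v, na.rm = TRUE)

# Median imputation: median of the observed values ((5 + 6) / 2)
x_median <- x
x_median[miss] <- median(v, na.rm = TRUE)

# Linear interpolation between the neighbouring observations
idx <- seq_along(v)
x_lin <- x
x_lin[miss] <- approx(idx[!miss], v[!miss], xout = idx[miss])$y

x_mean[miss]    # 5.9 5.9
x_median[miss]  # 5.5 5.5
x_lin[miss]     # 9 10
```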

For time series that are longer and more complex than this example (with trend and seasonality), it is always a good idea to try [Link] and [Link], since these functions very often produce the best results. They are called in the same easy way as all other imputation functions.
Here is a usage example for the [Link] function applied to the tsAirgap time series (described in section Datasets). As can be seen in Figure 1, [Link] provides really good results for this series, which contains a strong seasonality and a strong trend.

# Impute the missing values with [Link]
# (tsAirgap is an example time series provided by the imputeTS package)
imp <- [Link](tsAirgap)

# Code for visualization
[Link](tsAirgap, imp, tsAirgapComplete)

[Figure: panel title "Visualization Imputed Values"; y-axis: Value; x-axis: Time (1950 to 1960); legend: imputed values, real values, known values]

Figure 1: Results of imputation with [Link] compared to real values
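Imputation by a structural model and Kalman smoothing can be sketched with the base R functions StructTS() and KalmanSmooth() from the stats package. This is a minimal illustration under our own assumptions, not the imputeTS implementation. The sketch fits a basic structural model (type = "BSM", with level, slope and seasonal components) and runs the Kalman smoother over the series with the NAs left in. For each missing position, it uses the smoothed state multiplied by the observation vector Z as the estimate. The helper name kalman_impute is hypothetical. The built-in AirPassengers series with a few artificially deleted values stands in for tsAirgap.

```r
# Hypothetical sketch (not the imputeTS code): imputation via a structural
# time series model and Kalman smoothing, using only the stats package.
kalman_impute <- function(x, type = "BSM") {
  stopifnot(is.ts(x))
  miss <- which(is.na(as.numeric(x)))
  if (length(miss) == 0) return(x)
  # Interior NAs are allowed, but StructTS() requires the first value to be observed
  fit <- StructTS(x, type = type)                 # maximum-likelihood fit
  mod <- fit$model0                               # state-space model at its initial state
  sm  <- KalmanSmooth(x, mod, nit = -1L)$smooth   # smoothed states, one row per time point
  est <- as.numeric(sm %*% mod$Z)                 # observation estimate: state times Z
  out <- x
  out[miss] <- est[miss]
  out
}

x <- AirPassengers
x[c(20, 21, 70, 120)] <- NA
imputed <- kalman_impute(x)
round(imputed[c(20, 21, 70, 120)])
```

The smoother uses observations on both sides of each gap. This is why it tends to handle trend and seasonality better than the simple methods shown earlier. Fitting the model takes noticeably longer, though.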

[Link]

This function visualizes the distribution of missing values within a time series. To this end, the time series is plotted, and wherever a value is NA the background is colored differently. This gives a good overview of where in the time series most of the missing values occur. An example usage of the function can be seen below (for the plot see Figure 2).

# Example Code '[Link]'

# (tsAirgap is an example time series provided by the imputeTS package)

# Visualize the missing values in this time series

[Link](tsAirgap)

[Figure: panel title "Distribution of NAs"; y-axis: Value; x-axis: Time (1950 to 1960)]

Figure 2: Example for [Link]

As can be seen in Figure 2, the background is colored red in areas with missing data. The plot is largely self-explanatory. The plotting function itself needs no further configuration parameters. Nevertheless, it allows passing through of plot parameters (via ...).
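The shading idea can be reproduced with base graphics. The sketch below is our own minimal illustration, not the package's implementation, and the helper name plot_na_bands is hypothetical. It draws the series as a line, so NAs leave gaps. It then uses rect() to overlay a semi-transparent red band on every missing position. Like the package function, it passes further graphical parameters on to plot() through `...`.

```r
# Hypothetical sketch (not the imputeTS code): line plot of a series with a
# red background band behind every missing observation.
plot_na_bands <- function(x, ...) {
  v <- as.numeric(x)
  idx <- seq_along(v)
  miss <- which(is.na(v))
  plot(idx, v, type = "l", xlab = "Time", ylab = "Value", ...)
  usr <- par("usr")                      # plot region limits: x1, x2, y1, y2
  band <- adjustcolor("red", alpha.f = 0.3)
  for (i in miss) {
    # one band, one index unit wide, spanning the full height of the plot
    rect(i - 0.5, usr[3], i + 0.5, usr[4], col = band, border = NA)
  }
  invisible(miss)                        # positions that were shaded
}

x <- AirPassengers
x[c(10, 11, 60, 100)] <- NA
plot_na_bands(x, main = "Distribution of NAs")
```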

[Link]

This function also visualizes the distribution of missing values within a time series, in this case as a barplot. This is especially useful if the time series would otherwise be too large to be plotted. Multiple observations are grouped together into time intervals, and each interval is represented as a bar showing its share of missing values. An example usage of the function can be seen below (for the plot see Figure 3).

# Example Code '[Link]'

# (tsHeating is an example time series provided by the imputeTS package)

# Visualize the missing values in this time series

[Link](tsHeating, breaks = 20)

[Figure: panel title "Distribution of NAs"; barplot, y-axis: Percentage (0.0 to 1.0); x-axis: Time (observation index 1 to 606837); legend: NAs, non-NAs]

Figure 3: Example for [Link]

As can be seen on the x-axis of Figure 3, the tsHeating series is a very large time series with over 600,000 observations. The missing values in the tsAirgap series (144 observations) can be visualized with [Link] as in Figure 2, but this would not work for tsHeating. There is just not enough space in the plotting area for 600,000 consecutive observations/points. The [Link] function solves this problem by grouping multiple observations together in intervals. The ’breaks’ parameter in the example specifies that 20 intervals should be used. This means every interval in Figure 3 represents approximately 30,000 observations. The first five intervals are completely green, which means there are no missing values present. In other words, from observation 1 up to observation 150,000 there are no missing values in the data. In the middle and at the end of the series there are several intervals, each with around 40% missing data. In these intervals, about 12,000 out of 30,000 observations are NA. All in all, the plot gives a good but rough overview of the NA distribution in very large time series.
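The per-interval shares shown in Figure 3 amount to a simple grouping computation. The base-R sketch below is our own illustration, not the package's code, and the helper name na_share_per_bin is hypothetical. cut() splits the observation indices into a given number (breaks) of equally wide intervals, and tapply() returns the share of NAs per interval. With 606,837 observations and breaks = 20, each interval holds roughly 606,837 / 20 ≈ 30,342 observations. A share of 0.4 therefore corresponds to about 12,000 NAs.

```r
# Hypothetical sketch (not the imputeTS code): share of missing values in each
# of `breaks` equally wide index intervals.
na_share_per_bin <- function(x, breaks = 20) {
  miss <- is.na(as.numeric(x))
  bin <- cut(seq_along(miss), breaks = breaks, labels = FALSE)
  tapply(miss, bin, mean)                # fraction of NAs per interval
}

# Example: 100 observations with NAs only at positions 51 to 75
x <- c(1:50, rep(NA, 25), 76:100)
shares <- na_share_per_bin(x, breaks = 4)
barplot(shares, ylim = c(0, 1), xlab = "Interval", ylab = "Share of NAs")
```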

[Link]

This plotting function can be used to visualize how often different NA gaps (NAs in a row) occur in a time series. The function shows this information as a ranking. The ranking can be ordered either by the total number of NAs each gap size accounts for (number of occurrences of the gap size * gap length) or just by the number of occurrences of each gap size. The results can be read like this: in time series x, 3 NAs in a row occur most often (20 occurrences), followed by 6 NAs in a row (5 occurrences) and 2 NAs in a row (3 occurrences). An example usage of the function can be seen below (for the plot see Figure 4).

# Example Code '[Link]'

# (tsNH4 is an example time series provided by the imputeTS package)

# Visualize the top gap sizes / NAs in a row

[Link](tsNH4)

[Figure: panel title "Occurrence of gap sizes (NAs in a row)"; y-axis: Number; x-axis: Ranking of the different gap sizes (27, 32, 4, 5, 3, 2, 1, 42, 91 and 157 NAs); legend: number of occurrences of gap size, total NAs for gap size]

Figure 4: Example for [Link]

The example plot (Figure 4) reads as follows: in the time series tsNH4, gap size 157 occurs just once, but accounts for the most NAs of all gap sizes (157 NAs). A gap size of 91 (91 NAs in a row) also occurs just once, but accounts for the second most NAs (91 NAs). A gap size of 42 occurs twice in the time series, which makes it third in overall NAs (84 NAs). A gap size of one (no other NA directly before or after the NA) occurs 68 times, which makes it fourth in overall NAs (68 NAs).
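The ranking in Figure 4 can be computed from run lengths. The base-R sketch below is our own illustration, not the package's code. rle() on the missingness indicator yields the length of every NA gap, and table() counts how often each gap size occurs. The total number of NAs per gap size is occurrences × gap size, for example 2 × 42 = 84 in tsNH4. The helper name gap_size_ranking and its by argument are hypothetical.

```r
# Hypothetical sketch (not the imputeTS code): rank NA gap sizes either by the
# total number of NAs they account for or by how often they occur.
gap_size_ranking <- function(x, by = c("total", "occurrences")) {
  by <- match.arg(by)
  runs <- rle(is.na(as.numeric(x)))
  gaps <- runs$lengths[runs$values]      # lengths of all runs of NAs
  if (length(gaps) == 0) {
    return(data.frame(gap_size = integer(0), occurrences = integer(0),
                      total_nas = integer(0)))
  }
  tab <- table(gaps)
  res <- data.frame(gap_size = as.integer(names(tab)),
                    occurrences = as.integer(tab))
  res$total_nas <- res$gap_size * res$occurrences
  key <- if (by == "total") res$total_nas else res$occurrences
  res <- res[order(key, decreasing = TRUE), ]
  rownames(res) <- NULL
  res
}

# Gaps of size 1 (three times) and size 3 (twice)
x <- c(1, NA, 3, NA, NA, NA, 7, NA, NA, NA, 11, NA, 13, NA, 15)
gap_size_ranking(x)                      # size 3 first: 2 * 3 = 6 NAs
gap_size_ranking(x, by = "occurrences")  # size 1 first: occurs 3 times
```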

[Link]

This plot can be used to visualize the imputed values of a time series. The imputed values (filled NA gaps) are shown in a different color than the other values. The function is used as below, and Figure 5 shows the output.

# Example Code '[Link]'
# (tsAirgap is an example time series provided by the imputeTS package)

# Step 1: Perform imputation for tsAirgap using [Link]
[Link] <- [Link](tsAirgap)

# Step 2: Visualize the imputed values in the time series
[Link](tsAirgap, [Link])

Visual inspection of Figure 5 indicates that the imputed values (red) do not fit the tsAirgap series very well. This is because [Link] was used to impute a series with a strong trend. The plotting function enables users to quickly detect such problems in the imputation results. If the ground truth is known for the imputed values, this information can also be added to the plot. The plotting function itself needs no further configuration parameters. Nevertheless, it allows passing through of plot parameters (via ...).

[Figure: panel title "Visualization Imputed Values"; y-axis: Value; x-axis: Time (1950 to 1960); legend: imputed values, known values]

Figure 5: Example for [Link]
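A comparable visual check can be drawn with base graphics. The sketch below is our own minimal illustration, not the package's implementation, and the helper name plot_imputed_points is hypothetical. It plots the imputed series as a line and marks the observed values in one color. The imputed positions, which were NA in the original, are marked in red. Known true values can optionally be added for comparison. Linear interpolation via approx() stands in for whichever imputation function was used.

```r
# Hypothetical sketch (not the imputeTS code): highlight imputed values in red.
plot_imputed_points <- function(x_with_na, x_imputed, x_true = NULL, ...) {
  v <- as.numeric(x_imputed)
  idx <- seq_along(v)
  is_imp <- is.na(as.numeric(x_with_na))          # positions that were filled
  plot(idx, v, type = "l", col = "grey60", xlab = "Time", ylab = "Value", ...)
  points(idx[!is_imp], v[!is_imp], pch = 20, col = "steelblue")  # known values
  points(idx[is_imp], v[is_imp], pch = 20, col = "red")          # imputed values
  if (!is.null(x_true)) {
    # ground truth at the imputed positions, drawn as open circles
    points(idx[is_imp], as.numeric(x_true)[is_imp], pch = 1, col = "darkgreen")
  }
  legend("topleft", legend = c("known values", "imputed values"),
         col = c("steelblue", "red"), pch = 20, bty = "n")
  invisible(which(is_imp))
}

# Example: delete four values from AirPassengers, fill them by linear interpolation
x <- AirPassengers
x[c(10, 11, 60, 100)] <- NA
idx <- seq_along(x)
ok <- !is.na(as.numeric(x))
x_imp <- x
x_imp[!ok] <- approx(idx[ok], x[ok], xout = idx[!ok])$y
plot_imputed_points(x, x_imp, AirPassengers)
```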

statsNA

The statsNA function prints summary statistics about the distribution of missing values in univariate time series. Here is a short explanation of the information it gives:
• Length of time series
Number of observations in the time series (including NAs)
• Number of Missing Values
Number of missing values in the time series
• Percentage of Missing Values
Percentage of missing values in the time series
• Stats for Bins
Number/percentage of missing values for the split into bins
• Longest NA gap
Longest series of consecutive missing values (NAs in a row) in the time series
• Most frequent gap size
Most frequently occurring length of a series of consecutive missing values in the time series
• Gap size accounting for most NAs
The gap size (length of a series of consecutive missing values) that accounts for the most missing values overall in the time series
• Overview NA series
Overview of how often each series of consecutive missing values occurs. Series occurring 0 times are skipped.
The function is used as below, and Figure 6 shows the output.

# Example Code 'statsNA'

# (tsNH4 is an example time series provided by the imputeTS package)

# Print stats about the missing data

statsNA(tsNH4)

Figure 6: Excerpt of statsNA output
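Several of the statistics listed above follow directly from the missingness indicator and its run lengths. The base-R sketch below is our own illustration and covers only a subset of what statsNA reports: length, number and percentage of NAs, longest gap and most frequent gap size. The helper name na_summary is hypothetical, and its output format is not that of statsNA.

```r
# Hypothetical sketch (not the imputeTS code): a few missing-data statistics.
na_summary <- function(x) {
  miss <- is.na(as.numeric(x))
  runs <- rle(miss)
  gaps <- runs$lengths[runs$values]      # lengths of all NA gaps
  list(
    length            = length(miss),
    n_missing         = sum(miss),
    pct_missing       = 100 * mean(miss),
    longest_gap       = if (length(gaps) > 0) max(gaps) else 0L,
    most_frequent_gap = if (length(gaps) > 0) {
      as.integer(names(which.max(table(gaps))))
    } else NA_integer_
  )
}

x <- c(1, NA, 3, NA, NA, NA, 7, NA, NA, NA, 11, NA, 13, NA, 15)
str(na_summary(x))
```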

Datasets

Using the datasets is self-explanatory: after the package is loaded, they are directly available and usable under their names. No call of data() is needed. For every dataset there is always a complete version (without NAs) and an incomplete version (containing NAs) available.

# Example Code to use tsAirgap dataset

library("imputeTS")
tsAirgap

Figure 7: Example tsAirgap time series

Conclusions
Missing data is a very common problem for all kinds of data. However, in the case of univariate time series, most standard algorithms and existing functions within R packages cannot be applied. This paper presented the imputeTS package, which provides a collection of algorithms and tools especially tailored to this task. Using example time series, we illustrated the ease of use and the advantages of the provided functions. Simple algorithms as well as more complicated ones can be applied in the same simple and user-friendly manner.

The functionality provided makes the imputeTS package a good choice for preprocessing time series ahead of further analysis steps that require the complete absence of missing values.

Future research and development plans for forthcoming versions of the package include adding further time series imputation algorithms to choose from.

Acknowledgment
Parts of this work have been developed in the project ’IMProvT: Intelligente Messverfahren zur Prozessoptimierung von Trinkwasserbereitstellung und -verteilung’ (Intelligent measurement methods for process optimization of drinking water supply and distribution; reference number: 03ET1387A). This work was kindly supported by the Federal Ministry of Economic Affairs and Energy of the Federal Republic of Germany.

Bibliography
Z. Bar-Joseph, G. K. Gerber, D. K. Gifford, T. S. Jaakkola, and I. Simon. Continuous representations
of time-series gene expression data. Journal of Computational Biology, 10(3-4):341–356, 2003. URL
[Link] [p207]

R. Billinton, H. Chen, and R. Ghajar. Time-series models for reliability evaluation of power systems
including wind energy. Microelectronics Reliability, 36(9):1253–1261, 1996. URL [Link]
1016/0026-2714(95)00154-9. [p207]

G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung. Time Series Analysis: Forecasting and Control.
John Wiley & Sons, 2015. [p210]

A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1–38, 1977. [p207]

B. L. Ford. An Overview of Hot-Deck Procedures. Incomplete data in sample surveys, 2(Part IV):185–207,
1983. [p207]

J. M. Gottman. Time-Series Analysis: A Comprehensive Introduction for Social Scientists, volume 400. Cambridge University Press, Cambridge, 1981. [p207]

J. Honaker, G. King, and M. Blackwell. Amelia II: A program for missing data. Journal of Statistical
Software, 45(7):1–47, 2011. URL [Link] [p207]

R. J. Hyndman. forecast: Forecasting Functions for Time Series and Linear Models, 2017. URL http://[Link]/robjhyndman/forecast. R package version 8.0. [p207]

J. Josse and F. Husson. missMDA: A package for handling missing values in multivariate data analysis.
Journal of Statistical Software, 70(1):1–31, 2016. URL [Link]
[p207]

A. Kowarik and M. Templ. Imputation with the R package VIM. Journal of Statistical Software, 74(7):
1–16, 2016. URL [Link] [p207]

S. Moritz. imputeTS: Time Series Missing Value Imputation, 2017a. URL [Link]
package=imputeTS. R package version 2.3. [p207]

S. Moritz. Package imputeTS, 2017b. URL [Link]
[Link]. R package version 2.3. [p209]

S. Moritz, A. Sardá, T. Bartz-Beielstein, M. Zaefferer, and J. Stork. Comparison of different methods for univariate time series imputation in R. ArXiv e-prints, 2015. [p207]

E. Pebesma. spacetime: Spatio-temporal data in R. Journal of Statistical Software, 51(7):1–30, 2012. URL
[Link] [p207]

Rmetrics Core Team, D. Wuertz, T. Setz, and Y. Chalabi. timeSeries: Rmetrics - Financial Time Series Objects, 2015. URL [Link] R package version 3022.101.2. [p207]

D. B. Rubin. Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons, 1987. URL https://[Link]/10.1002/9780470316696. [p207]

J. A. Ryan and J. M. Ulrich. xts: eXtensible Time Series, 2014. URL [Link]
package=xts. R package version 0.9-7. [p207]

S. J. Taylor. Modelling Financial Time Series (Second Edition). World Scientific Publishing, 2007. URL
[Link] [p207]

P. Vacek and T. Ashikaga. An examination of the nearest neighbor rule for imputing missing values.
Proc. Statist. Computing Sect., Amer. Statist. Ass, pages 326–331, 1980. [p207]

S. van Buuren and K. Groothuis-Oudshoorn. mice: Multivariate imputation by chained equations in
R. Journal of Statistical Software, 45(3):1–67, 2011. URL [Link]
[p207]

A. Zeileis and G. Grothendieck. zoo: S3 infrastructure for regular and irregular time series. Journal of
Statistical Software, 14(6):1–27, 2005. URL [Link] [p207]

Steffen Moritz
Cologne University of Applied Sciences
Cologne, Germany
steffen.moritz10@[Link]

Thomas Bartz-Beielstein
Cologne University of Applied Sciences
Cologne, Germany
[Link]@[Link]

