Collection of Time-Series Classification Datasets with Pretrained Deep Models and SHAP, LIME & Anchor Explanations
Description
A benchmark suite for reproducible time-series XAI research
A precomputed bundle of post-hoc explanations and black-box models for time-series classification
This dataset contains 83 univariate and 20 multivariate time series datasets from the TSC repository, each used for multiclass classification with a deep learning model. For each dataset, we provide:
• Precomputed train/test splits
• A trained TensorFlow model
• Post-hoc local explanations generated using three methods: SHAP, LIME, and Anchor
Impact of the dataset
These datasets provide a ready-to-use benchmark for Explainable AI in time series classification. Since models and explanation outputs are precomputed, researchers can immediately use them for evaluation, visualization, or developing new post-hoc XAI techniques. This reduces the overhead of retraining or re-explaining models, supports reproducibility, and enables systematic comparisons across explanation methods.
Comprehensive Coverage: 83 univariate and 20 multivariate UCR/UEA time-series datasets, each with a standardized 75/25 train–test split stored as pickled NumPy arrays.
Pretrained Models: Ready-to-use ConvLSTM1D TensorFlow models for every dataset, eliminating costly training and ensuring experimental consistency.
Precomputed Explanations: Post-hoc outputs for training and test sets from KernelSHAP, LIME, and Anchor, including attribution scores and rule sets with confidence and coverage.
Open Data and Code: All files are CC-BY-4.0, and a linked GitHub repository provides Jupyter notebooks for loading data, inspecting outputs, and applying XAI methods.
Repository content
- `train_test.zip` — contains files of the form `{univariate|multivariate}_{series_name}_train_and_test.zip`
Each includes: `trainX.pickle`, `trainy.pickle`, `testX.pickle`, `testy.pickle`
Format: `numpy.array`
- `models.zip` — trained models archived as `{univariate|multivariate}_{series_name}_model_tf.zip`
Each contains a TensorFlow SavedModel directory. Format: TensorFlow SavedModel
- `shap.zip` — SHAP values for each dataset in `{series_name}_shap_values.zip`
Files: `svtr.pickle` (train), `svts.pickle` (test). Format: `numpy.array`
- `lime.zip` — LIME values for each dataset in `{series_name}_lime_values.zip`
Files: `lvtr.pickle` (train), `lvts.pickle` (test). Format: `numpy.array`
- `anchor.zip` — rule-based Anchor explanations per dataset in `{series_name}_anchor_values.zip`
Files: `avtr.pickle` (train), `avts.pickle` (test). Format: `List[List[Dictionary]]`
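After extracting one of the train/test archives, the four pickled arrays can be read as sketched below (the directory path and helper name are illustrative, not part of the dataset):

```python
import pickle

def load_split(path):
    """Load the four pickled arrays of one extracted train/test archive."""
    parts = {}
    for name in ("trainX", "trainy", "testX", "testy"):
        with open(f"{path}/{name}.pickle", "rb") as handle:
            parts[name] = pickle.load(handle)
    return parts
```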
Technical info
Model Training
The time series classification model was trained using a deep neural architecture based on stacked ConvLSTM1D layers. Input data consisted of time series samples of shape (T, F), where T denotes the number of timesteps and F the number of features. The same model architecture was used for both univariate and multivariate input; in the univariate case, the number of features F = 1. Labels were one-hot encoded, producing output vectors of length equal to the number of classes. Prior to training, the data was normalized using StandardScaler, and any missing values were imputed with zeros.
To facilitate temporal feature learning, the input sequences were divided into smaller temporal blocks. Specifically, each sequence of length T was segmented into n_steps parts, where n_steps corresponds to the third smallest integer divisor of T (excluding 1 and 2). The segment length was then computed as n_length = T / n_steps, resulting in a reshaped input of shape (n_steps, n_length, F). This restructuring enables the model to capture both local patterns within segments and long-range dependencies across segments.
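The segmentation rule above can be sketched as follows (the function name is illustrative, and the error case for lengths with fewer than three divisors greater than 2 is an assumption):

```python
def n_steps_for(T):
    """Third smallest integer divisor of T, excluding 1 and 2."""
    divisors = [d for d in range(3, T + 1) if T % d == 0]
    if len(divisors) < 3:
        raise ValueError(f"{T} has fewer than three divisors greater than 2")
    return divisors[2]

# Example: T = 140 has divisors 4, 5, 7, ... above 2, so n_steps = 7,
# n_length = 140 // 7 = 20, and the reshaped input is (7, 20, F).
```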
The model architecture includes two ConvLSTM1D layers with 64 and 32 filters, respectively, each using a kernel size of 9 and ReLU activation. A dropout layer with a rate of 0.5 follows for regularization, and the output is flattened to create an intermediate embedding representation. This is followed by two fully connected layers: one with 100 ReLU units and another with softmax activation for classification. Class imbalance was addressed using computed class weights during training. The model was trained using the Adam optimizer and categorical cross-entropy loss for 25 epochs with a batch size of 64.
The architecture used across datasets is presented below; the number of trainable parameters depends on the input shape.
Layer (type)                 Output Shape
=================================================================
reshape (Reshape)            (None, n_steps, n_length, F)
conv_lstm1d (ConvLSTM1D)     (None, n_steps, n_length, 64)
conv_lstm1d_1 (ConvLSTM1D)   (None, n_steps, n_length, 32)
dropout (Dropout)            (None, n_steps, n_length, 32)
embedding (Flatten)          (None, n_steps × n_length × 32)
dense (Dense)                (None, 100)
dense_1 (Dense)              (None, n_classes)
=================================================================
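A sketch of this architecture in Keras, assuming the ConvLSTM1D layers use `padding="same"` and `return_sequences=True` (required to reproduce the output shapes shown above); this is illustrative, not the authors' exact training code:

```python
import tensorflow as tf

def build_model(n_steps, n_length, n_features, n_classes):
    """ConvLSTM1D classifier as described in the text (illustrative sketch)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_steps * n_length, n_features)),
        tf.keras.layers.Reshape((n_steps, n_length, n_features)),
        tf.keras.layers.ConvLSTM1D(64, kernel_size=9, padding="same",
                                   activation="relu", return_sequences=True),
        tf.keras.layers.ConvLSTM1D(32, kernel_size=9, padding="same",
                                   activation="relu", return_sequences=True),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Flatten(name="embedding"),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training then proceeds with class weights, the Adam optimizer, 25 epochs, and batch size 64, as described above.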
The dataset was split into training and testing sets using a standard random partitioning strategy with a 75:25 ratio. This means that 75% of the samples were used for model training, while the remaining 25% were held out for evaluation. Stratified sampling was applied to preserve class distribution across both sets. Labels were one-hot encoded to support categorical cross-entropy loss during training.
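A minimal sketch of such a split (scikit-learn's `train_test_split` and a NumPy one-hot encoding are assumed here; the authors' exact preprocessing code is in the linked repository):

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_and_encode(X, y, seed=42):
    """75/25 stratified split with one-hot encoded integer labels."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    n_classes = len(np.unique(y))
    return X_tr, X_te, np.eye(n_classes)[y_tr], np.eye(n_classes)[y_te]
```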
Explanations
The explanations were computed for both training and test subsets. All indices in the explanation files are aligned with the respective train/test instances.
Explanation coverage (percentage of datasets for which explanations are available):
- Univariate: Anchor (39.75%), LIME (100.00%), SHAP (100.00%), all three (39.75%)
- Multivariate: Anchor (100.00%), LIME (95.00%), SHAP (100.00%), all three (95.00%)
Anchor:
Each explanation is a list of rule sets grouped per instance, e.g.:
[
  [
    {
      'index': 0,
      'success': True,
      'prediction': '1',
      'rule': {
        'feature_1': ['>-0.74'],
        'feature_4': ['<=0.73'],
        'feature_11': ['>-0.74'],
        'feature_42': ['<=0.73'],
        'feature_110': ['>-0.74']
      },
      'confidence': 0.9565,
      'coverage': 0.7197
    }
  ]
]
Each entry corresponds to a sample index. The `rule` defines a conjunction of feature constraints satisfied by the sample. `confidence` measures the fraction of samples fulfilling the rule for which the model gives the same prediction, while `coverage` is the fraction of the train/test set satisfying the rule.
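Given the structure above, a loaded `avtr.pickle`/`avts.pickle` object can be traversed as follows (the helper is a hypothetical sketch, not part of the dataset):

```python
def successful_rules(anchor_values):
    """Collect (index, prediction, n_constraints, confidence, coverage)
    for every successful explanation in a List[List[dict]] anchor file."""
    rows = []
    for per_instance in anchor_values:
        for exp in per_instance:
            if exp.get("success"):
                rows.append((exp["index"], exp["prediction"], len(exp["rule"]),
                             exp["confidence"], exp["coverage"]))
    return rows
```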
Accompanying GitHub repository
Code used for generation:
- Training code: https://github.com/mozo64/uci-time-series-xai-benchmark/blob/main/notebooks/UCI-workflow-train-balckbox-model.ipynb
- SHAP & LIME: https://github.com/mozo64/metro3/blob/main/notebooks/UCI-workflow-benchmark-2.ipynb
- Anchor: https://github.com/mozo64/metro3/blob/main/notebooks/UCI-workflow-benchmark-4-anchor.ipynb
- Demonstration of how to read the data: https://github.com/mozo64/uci-time-series-xai-benchmark/blob/main/notebooks/UCI-workflow-how-to.ipynb
List of all Time series
Multivariate:
| Time Series | Model Accuracy | Anchor | LIME | SHAP |
| :------------------------ | :------------- | :----- | :--- | :--- |
| ArticularyWordRecognition | 96.53 | Yes | Yes | Yes |
| AtrialFibrillation | 37.50 | Yes | Yes | Yes |
| BasicMotions | 100.00 | Yes | Yes | Yes |
| Cricket | 93.33 | Yes | Yes | Yes |
| Epilepsy | 91.30 | Yes | Yes | Yes |
| ERing | 98.67 | Yes | Yes | Yes |
| EthanolConcentration | 22.14 | Yes | Yes | Yes |
| FaceDetection | 50.25 | Yes | No | Yes |
| FingerMovements | 57.69 | Yes | Yes | Yes |
| HandMovementDirection | 23.73 | Yes | Yes | Yes |
| Handwriting | 58.40 | Yes | Yes | Yes |
| Heartbeat | 72.82 | Yes | Yes | Yes |
| Libras | 63.33 | Yes | Yes | Yes |
| LSST | 22.16 | Yes | Yes | Yes |
| NATOPS | 86.67 | Yes | Yes | Yes |
| PenDigits | 99.24 | Yes | Yes | Yes |
| RacketSports | 82.89 | Yes | Yes | Yes |
| SelfRegulationSCP1 | 89.36 | Yes | Yes | Yes |
| SelfRegulationSCP2 | 45.26 | Yes | Yes | Yes |
| UWaveGestureLibrary | 94.55 | Yes | Yes | Yes |
Univariate:
| Time Series | Model Accuracy | Anchor | LIME | SHAP |
| :------------------------------- | :------------- | :----- | :--- | :--- |
| Adiac | 29.59 | No | Yes | Yes |
| Beef | 53.33 | No | Yes | Yes |
| BeetleFly | 80.00 | Yes | Yes | Yes |
| BirdChicken | 50.00 | No | Yes | Yes |
| BME | 86.67 | No | Yes | Yes |
| CBF | 100.00 | No | Yes | Yes |
| Chinatown | 98.90 | Yes | Yes | Yes |
| Coffee | 53.33 | No | Yes | Yes |
| Computers | 53.33 | No | Yes | Yes |
| CricketX | 61.54 | Yes | Yes | Yes |
| CricketY | 60.51 | Yes | Yes | Yes |
| CricketZ | 62.56 | Yes | Yes | Yes |
| Crop | 76.78 | No | Yes | Yes |
| DiatomSizeReduction | 100.00 | Yes | Yes | Yes |
| DistalPhalanxOutlineAgeGroup | 77.78 | Yes | Yes | Yes |
| DistalPhalanxOutlineCorrect | 78.08 | Yes | Yes | Yes |
| DistalPhalanxTW | 66.67 | No | Yes | Yes |
| DodgerLoopDay | 55.00 | Yes | Yes | Yes |
| DodgerLoopGame | 87.50 | Yes | Yes | Yes |
| DodgerLoopWeekend | 97.50 | No | Yes | Yes |
| Earthquakes | 68.10 | No | Yes | Yes |
| ECG200 | 86.00 | Yes | Yes | Yes |
| ECG5000 | 91.52 | Yes | Yes | Yes |
| ECGFiveDays | 100.00 | Yes | Yes | Yes |
| ElectricDevices | 85.55 | No | Yes | Yes |
| FaceFour | 96.43 | Yes | Yes | Yes |
| FiftyWords | 63.00 | No | Yes | Yes |
| FordA | 86.84 | Yes | Yes | Yes |
| FordB | 85.61 | No | Yes | Yes |
| FreezerRegularTrain | 99.87 | No | Yes | Yes |
| FreezerSmallTrain | 99.44 | No | Yes | Yes |
| Fungi | 0.00 | Yes | Yes | Yes |
| GunPoint | 86.00 | No | Yes | Yes |
| GunPointAgeSpan | 88.50 | Yes | Yes | Yes |
| GunPointMaleVersusFemale | 100.00 | No | Yes | Yes |
| GunPointOldVersusYoung | 100.00 | No | Yes | Yes |
| Herring | 56.25 | No | Yes | Yes |
| InsectWingbeatSound | 66.91 | No | Yes | Yes |
| ItalyPowerDemand | 96.35 | No | Yes | Yes |
| LargeKitchenAppliances | 64.89 | No | Yes | Yes |
| Lightning2 | 58.06 | No | Yes | Yes |
| Lightning7 | 66.67 | No | Yes | Yes |
| Meat | 63.33 | Yes | Yes | Yes |
| MedicalImages | 68.53 | Yes | Yes | Yes |
| MiddlePhalanxOutlineAgeGroup | 78.42 | Yes | Yes | Yes |
| MiddlePhalanxOutlineCorrect | 69.51 | No | Yes | Yes |
| MiddlePhalanxTW | 64.75 | No | Yes | Yes |
| MoteStrain | 95.60 | No | Yes | Yes |
| OliveOil | 13.33 | Yes | Yes | Yes |
| OSULeaf | 56.76 | No | Yes | Yes |
| PhalangesOutlinesCorrect | 67.07 | Yes | Yes | Yes |
| Plane | 96.23 | Yes | Yes | Yes |
| PowerCons | 100.00 | Yes | Yes | Yes |
| ProximalPhalanxOutlineAgeGroup | 74.34 | No | Yes | Yes |
| ProximalPhalanxOutlineCorrect | 73.09 | No | Yes | Yes |
| ProximalPhalanxTW | 48.68 | Yes | Yes | Yes |
| RefrigerationDevices | 43.62 | No | Yes | Yes |
| ScreenType | 40.96 | No | Yes | Yes |
| ShapeletSim | 50.00 | No | Yes | Yes |
| ShapesAll | 70.00 | No | Yes | Yes |
| SmallKitchenAppliances | 60.11 | No | Yes | Yes |
| SmoothSubspace | 96.00 | Yes | Yes | Yes |
| SonyAIBORobotSurface1 | 99.36 | No | Yes | Yes |
| SonyAIBORobotSurface2 | 97.96 | No | Yes | Yes |
| Strawberry | 77.64 | Yes | Yes | Yes |
| SwedishLeaf | 82.27 | Yes | Yes | Yes |
| Symbols | 95.69 | No | Yes | Yes |
| SyntheticControl | 89.33 | Yes | Yes | Yes |
| ToeSegmentation2 | 80.95 | No | Yes | Yes |
| Trace | 68.00 | Yes | Yes | Yes |
| TwoLeadECG | 98.97 | No | Yes | Yes |
| TwoPatterns | 99.92 | No | Yes | Yes |
| UMD | 93.33 | Yes | Yes | Yes |
| UWaveGestureLibraryAll | 95.54 | No | Yes | Yes |
| UWaveGestureLibraryX | 81.61 | No | Yes | Yes |
| UWaveGestureLibraryY | 72.59 | No | Yes | Yes |
| UWaveGestureLibraryZ | 75.98 | No | Yes | Yes |
| Wafer | 99.83 | Yes | Yes | Yes |
| Wine | 60.71 | No | Yes | Yes |
| WordSynonyms | 67.84 | No | Yes | Yes |
| Worms | 50.77 | No | Yes | Yes |
| WormsTwoClass | 50.77 | Yes | Yes | Yes |
| Yoga | 91.39 | No | Yes | Yes |
Files (3.1 GB total)
| MD5 checksum | Size |
|---|---|
| feca5e739c85b9c5a582c1fc83aad65d | 1.3 MB |
| 97c4bf4952e29af5045cb57e4292fff9 | 402.8 MB |
| 1a79cbd4cba223a1e06e25044dd60883 | 1.4 GB |
| 6b5560b1c17f88d72efb5e4fe6c03b7f | 407.2 MB |
| 20573856720811582ac78becc7845a70 | 977.4 MB |