INSTANCE is a dataset of seismic waveform data and associated metadata suited to machine-learning analysis. It includes:
- 54,008 earthquakes for a total of 1,159,249 3-channel waveforms;
- 132,330 3-channel noise waveforms;
- 114 precomputed observable quantities providing information on station, trace, source, path and quality;
- 19 networks;
- 620 seismic stations.
Earthquakes (a) and stations (b) in INSTANCE. Symbol sizes are proportional to earthquake magnitude and to the number of arrival phases recorded by each station, respectively.
Events with magnitude in the range [2-4]
Events selected from HN channel
Noise selected from HH channel

INSTANCE - The Italian Seismic Dataset For Machine Learning. Alberto Michelini, Spina Cianetti, Sonja Gaviano, Carlo Giunchi, Dario Jozinović & Valentino Lauciani. Seismic Waveforms And Associated Metadata, published 2021 by Istituto Nazionale di Geofisica e Vulcanologia (INGV). https://doi.org/10.13127/instance
To get the full INSTANCE dataset you have to download:
- Events metadata (csv, 238 MB) - doi:10.13127/instance/eventsmetadata.1
- Events data in counts as single hdf5 file (39 GB) or 10 GB parts (part-a, part-b, part-c, part-d) - doi:10.13127/instance/events.1
- Events data in ground motion units as single hdf5 file (151 GB) or 20 GB parts (part-a, part-b, part-c, part-d, part-e, part-f, part-g, part-h) - doi:10.13127/instance/groundmotion.1. Ground motion units are m/s for HH and EH channels and m/s² for HN channel.
- Noise metadata (csv, 6.7 MB) - doi:10.13127/instance/noisemetadata.1
- Noise data in counts (hdf5, 3.9 GB) - doi:10.13127/instance/noise.1
- Stations inventory (StationXML, 15 MB)
All of the above downloads are bzip2-compressed files. Multipart files must first be reassembled and then decompressed, e.g., for the event data file:
cat Instance_events_counts.hdf5.bz2.part-* > Instance_events_counts.hdf5.bz2
bzip2 -d Instance_events_counts.hdf5.bz2
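On systems where `cat` and `bzip2` are not available, the same two steps can be sketched in Python with the standard library only. This is a minimal sketch, not part of the INSTANCE tooling: the function name `reassemble_and_decompress` is hypothetical, and it assumes the parts sort correctly by name (part-a, part-b, ...), as the shell glob above does. Data is streamed in chunks, so the multi-gigabyte archive is never held in memory.

```python
import bz2
import glob


def reassemble_and_decompress(parts_pattern, output_path, chunk_size=1024 * 1024):
    """Read the split parts in name order and write the decompressed file.

    Hypothetical helper: equivalent to `cat parts > file.bz2` followed by
    `bzip2 -d file.bz2`, without writing the intermediate .bz2 file.
    """
    parts = sorted(glob.glob(parts_pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {parts_pattern!r}")

    decompressor = bz2.BZ2Decompressor()
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                chunk = src.read(chunk_size)
                while chunk:
                    # Feed the chunk; if a bzip2 stream ends inside it,
                    # start a new decompressor on the leftover bytes.
                    data = chunk
                    while data:
                        if decompressor.eof:
                            decompressor = bz2.BZ2Decompressor()
                        out.write(decompressor.decompress(data))
                        data = decompressor.unused_data if decompressor.eof else b""
                    chunk = src.read(chunk_size)
    return parts
```

For the event data file, a call would look like `reassemble_and_decompress("Instance_events_counts.hdf5.bz2.part-*", "Instance_events_counts.hdf5")`.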
A sample dataset of about 1.7 GB is provided to run the notebooks. It contains 10,000 events and 1,000 noise waveforms together with the associated metadata, so interested users can evaluate INSTANCE data and metadata without downloading the whole dataset.
- Sample dataset (1.7 GB)
The following notebooks provide examples of reading INSTANCE waveforms and metadata. They refer to the sample dataset; to use them with the full dataset, the filenames must be changed accordingly.
- Plots.ipynb to explore the distribution of significant parameters in INSTANCE using the metadata
- Waveforms.ipynb to select and plot 3-channel waveforms
- Station_Hypocenter_MomentTensor.ipynb to map the earthquakes included in INSTANCE
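Before opening the notebooks, it can help to check what a metadata csv actually contains. The sketch below uses only the Python standard library and makes no assumption about column names: it reads them from the header row. The helper names `metadata_columns` and `count_values` are hypothetical, and the path to pass in is whatever csv file you find in the extracted sample dataset.

```python
import csv
from collections import Counter


def metadata_columns(csv_path):
    """Return the column names from the header row of a metadata csv."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))


def count_values(csv_path, column):
    """Count how often each value of `column` occurs, reading row by row."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts[row[column]] += 1
    return counts
```

A typical use is to call `metadata_columns(path)` first, pick a categorical column from the result, and then pass it to `count_values(path, column)`. Rows are streamed one at a time, so this also works on the 238 MB events metadata file; for anything beyond a quick look, pandas (as used in the notebooks) is the more convenient tool.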
To run the notebooks please make sure the following packages are properly installed in your environment:
- obspy
- jupyter
- basemap
- pandas
- seaborn
- h5py
- hdf5
Or simply create a dedicated environment for INSTANCE:
conda create -n instance python=3.7 obspy jupyter basemap pandas seaborn h5py hdf5
conda activate instance
git clone https://github.com/cjunkk/instance
cd instance
curl http://repo.pi.ingv.it/instance/Instance_sample_dataset.tar.bz2 | tar xj
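Once the environment is active, a quick way to confirm the packages are visible to Python is sketched below. The helper name `missing_modules` is hypothetical, and the import names in `REQUIRED` are assumptions about how the conda packages listed above are imported (basemap as `mpl_toolkits.basemap`; `hdf5` is the C library behind `h5py` and has no Python module of its own). The check only locates each module, it does not import it, so it does not prove the package works.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from `module_names` that Python cannot locate."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. mpl_toolkits) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Assumed import names for the packages listed in this README.
REQUIRED = ["obspy", "jupyter", "mpl_toolkits.basemap", "pandas", "seaborn", "h5py"]
```

Running `print(missing_modules(REQUIRED))` inside the `instance` environment should print an empty list if everything was installed.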
Creative Commons license: Attribution 4.0 International (CC BY 4.0)
