There is a newer version of the record available.

Published December 15, 2022 | Version v1.0.0
Dataset Restricted

NeMig - A Bilingual News Collection and Knowledge Graph about Migration

  • 1. University of Mannheim
  • 2. Télécom Paris, Institut Polytechnique de Paris
  • 3. Karlsruhe Institute of Technology

Description

NeMig consists of two knowledge graphs, one German and one English, constructed from news articles on the topic of migration collected from online media outlets in Germany and the US, respectively. NeMig contains rich textual and metadata information, sub-topic and sentiment annotations, and named entities extracted from the articles' content and metadata and linked to Wikidata. The graphs are expanded with the Wikidata neighbors, up to two hops away, of the initial set of linked entities.
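The two-hop expansion described above can be pictured as a bounded breadth-first traversal starting from the linked entities. The following is only a minimal sketch under stated assumptions, not the procedure used to build NeMig: the function name `k_hop_neighbors`, the adjacency-dictionary representation, the choice to follow outgoing edges only, and the toy identifiers are all hypothetical.

```python
from typing import Dict, Iterable, Set


def k_hop_neighbors(
    adjacency: Dict[str, Set[str]], seeds: Iterable[str], k: int = 2
) -> Set[str]:
    """Return all nodes reachable from the seeds in at most k hops, excluding the seeds."""
    seed_set = set(seeds)
    visited = set(seed_set)
    frontier = set(seed_set)
    for _ in range(k):
        next_frontier: Set[str] = set()
        for node in frontier:
            # Assumption: only outgoing edges are followed.
            for neighbor in adjacency.get(node, set()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
        if not frontier:
            break
    return visited - seed_set


# Toy graph with made-up identifiers (not real Wikidata IDs).
toy_adjacency = {"Q_A": {"Q_B"}, "Q_B": {"Q_C"}, "Q_C": {"Q_D"}}
two_hop = k_hop_neighbors(toy_adjacency, ["Q_A"], k=2)
```

In this toy graph, `Q_D` is three hops from the seed `Q_A`, so it falls outside the two-hop expansion.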

NeMig comes in four flavors for both the German and the English corpora:

  • Base NeMig: contains literals and entities from the corresponding annotated news corpus;
  • Entities NeMig: derived from the Base NeMig by removing all literal nodes; it contains only resource nodes;
  • Enriched Entities NeMig: derived from the Entities NeMig by enriching it with up to two-hop neighbors from Wikidata; it contains only resource nodes and Wikidata triples;
  • Complete NeMig: the combination of the Base and Enriched Entities NeMig; it contains both literals and resources.
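The relationship between the four flavors can be illustrated with plain set operations over triples. This is a hedged sketch rather than the authors' construction pipeline: it assumes triples are `(subject, predicate, object)` string tuples in N-Triples notation, that a literal is recognizable by a leading double quote, and that "removing literal nodes" means dropping triples whose object is a literal. All function names and example IRIs are made up for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def is_literal(term: str) -> bool:
    """In N-Triples, literals start with a double quote; IRIs start with '<'."""
    return term.startswith('"')


def entities_graph(base: Set[Triple]) -> Set[Triple]:
    """Entities flavor: the base graph without triples that point to literals."""
    return {triple for triple in base if not is_literal(triple[2])}


def enriched_entities_graph(
    entities: Set[Triple], wikidata_triples: Set[Triple]
) -> Set[Triple]:
    """Enriched Entities flavor: entity triples plus Wikidata neighbor triples."""
    return entities | wikidata_triples


def complete_graph(base: Set[Triple], enriched: Set[Triple]) -> Set[Triple]:
    """Complete flavor: union of the Base and Enriched Entities flavors."""
    return base | enriched


# Toy data with hypothetical IRIs.
base = {
    ("<http://example.org/news1>", "<http://example.org/title>", '"A headline"@en'),
    ("<http://example.org/news1>", "<http://example.org/mentions>", "<http://example.org/Q1>"),
}
wikidata_triples = {
    ("<http://example.org/Q1>", "<http://example.org/P1>", "<http://example.org/Q2>"),
}

entities = entities_graph(base)
enriched = enriched_entities_graph(entities, wikidata_triples)
complete = complete_graph(base, enriched)
```

Under these assumptions, the Entities flavor is a subset of the Base flavor, and the Complete flavor contains every triple from the other three.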

Information about uploaded files:

(All files are bzip2-compressed and in the N-Triples format.)

| File | Description |
|---|---|
| nemig_${language}_${graph_type}-metadata.nt.bz2 | Metadata about the dataset, described using the VoID vocabulary. |
| nemig_${language}_${graph_type}-instances_types.nt.bz2 | Class definitions of news and event instances. |
| nemig_${language}_${graph_type}-instances_labels.nt.bz2 | Labels of instances. |
| nemig_${language}_${graph_type}-instances_related.nt.bz2 | Relations between news instances that are related to one another. |
| nemig_${language}_${graph_type}-instances_metadata_literals.nt.bz2 | Relations between news instances and metadata literals (e.g. URL, publishing date, modification date, sentiment label, political orientation of news outlets). |
| nemig_${language}_${graph_type}-instances_content_mapping.nt.bz2 | Mapping of news instances to content instances (e.g. title, abstract, body). |
| nemig_${language}_${graph_type}-instances_topic_mapping.nt.bz2 | Mapping of news instances to sub-topic instances. |
| nemig_${language}_${graph_type}-instances_content_literals.nt.bz2 | Relations between content instances and corresponding literals (e.g. text of title, abstract, body). |
| nemig_${language}_${graph_type}-instances_metadata_resources.nt.bz2 | Relations between news or sub-topic instances and entities extracted from metadata (i.e. publishers, authors, keywords). |
| nemig_${language}_${graph_type}-instances_event_mapping.nt.bz2 | Mapping of news instances to event instances. |
| nemig_${language}_${graph_type}-event_resources.nt.bz2 | Relations between event instances and entities extracted from the text of the news (i.e. actors, places, mentions). |
| nemig_${language}_${graph_type}-resources_provenance.nt.bz2 | Provenance information about the entities extracted from the text of the news (e.g. title, abstract, body). |
| nemig_${language}_${graph_type}-wiki_resources.nt.bz2 | Relations between Wikidata entities from news and their k-hop entity neighbors from Wikidata. |
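Because every file is a bzip2-compressed N-Triples file following the same naming pattern, the files can be streamed with the Python standard library alone. The sketch below is illustrative only: the helper names are hypothetical, the values `"en"` and `"base"` are assumed placeholders (the concrete values of `${language}` and `${graph_type}` are not listed here), and the line splitting is a naive approximation that suits simple inspection. A full RDF parser would be the safer choice for real processing.

```python
import bz2
import os
import tempfile
from typing import Iterator, Tuple


def nemig_filename(language: str, graph_type: str, part: str) -> str:
    """Compose a file name following the pattern used in the file listing."""
    return f"nemig_{language}_{graph_type}-{part}.nt.bz2"


def iter_triples(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) strings from a bzip2-compressed N-Triples file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Subjects and predicates contain no whitespace, so two splits suffice.
            subject, predicate, rest = line.split(None, 2)
            obj = rest.rstrip()
            if obj.endswith("."):
                obj = obj[:-1].rstrip()  # drop the statement terminator
            yield subject, predicate, obj


# Demo on a tiny temporary file with made-up content (not actual NeMig data).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, nemig_filename("en", "base", "instances_labels"))
with bz2.open(demo_path, mode="wt", encoding="utf-8") as out:
    out.write('<http://example.org/n1> <http://example.org/label> "Example label"@en .\n')
    out.write("<http://example.org/n1> <http://example.org/related> <http://example.org/n2> .\n")

demo_triples = list(iter_triples(demo_path))
```

Streaming line by line keeps memory use low, which matters for the larger files such as the Wikidata neighbor triples.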

 

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

The data is available for research purposes and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please submit a short statement on the research work you want to conduct in order to gain access to the files.
