NeMig - A Bilingual News Collection and Knowledge Graph about Migration
Authors/Creators
- 1. University of Mannheim
- 2. Télécom Paris, Institut Polytechnique de Paris
- 3. Karlsruhe Institute of Technology
Description
NeMig comprises two knowledge graphs, one German and one English, constructed from news articles on the topic of migration that were collected from online media outlets in Germany and the US, respectively. NeMig contains rich textual and metadata information, sub-topic and sentiment annotations, and named entities that are extracted from the articles' content and metadata and linked to Wikidata. The graphs are further expanded with the Wikidata neighbors, up to two hops away, of the initial set of linked entities.
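The two-hop expansion can be pictured as a breadth-first search that stops at depth two. The sketch below is a minimal illustration, not the authors' pipeline: it assumes the Wikidata link structure has already been fetched into a plain dict mapping an entity ID to the IDs it points to (outgoing links only), and the function name and the Q-IDs in the usage example are hypothetical placeholders.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def k_hop_neighbors(
    seeds: Iterable[str], neighbors: Dict[str, List[str]], k: int = 2
) -> Set[str]:
    """Return the seed entities plus every entity reachable within k hops.

    `neighbors` is an assumed, pre-fetched adjacency dict (entity -> linked entities).
    """
    depth: Dict[str, int] = {}
    queue: deque = deque()
    for s in seeds:
        if s not in depth:
            depth[s] = 0
            queue.append(s)
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            # Do not expand beyond the k-th hop.
            continue
        for nxt in neighbors.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)
```

For example, `k_hop_neighbors(["Q100"], {"Q100": ["Q200"], "Q200": ["Q300"], "Q300": ["Q400"]})` returns the set of Q100, Q200 and Q300; Q400 lies three hops away and is left out.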
NeMig comes in four flavors for both the German and the English corpus:
- Base NeMig: contains literals and entities from the corresponding annotated news corpus;
- Entities NeMig: derived from the Base NeMig by removing all literal nodes, it contains only resource nodes;
- Enriched Entities NeMig: derived from the Entities NeMig by enriching it with up to two-hop neighbors from Wikidata, it contains only resource nodes and Wikidata triples;
- Complete NeMig: the combination of the Base and Enriched Entities NeMig, it contains both literals and resources.
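The relationship between the four flavors can be expressed as simple set operations on triples. The following is a minimal sketch, assuming each triple is held as three N-Triples term strings (subject, predicate, object); the helper names are hypothetical. Because RDF literals can only appear in object position, removing literal nodes amounts to dropping the triples whose object is a literal.

```python
from typing import Iterable, Set, Tuple

# A triple as three N-Triples terms: (subject, predicate, object).
Triple = Tuple[str, str, str]


def is_literal(term: str) -> bool:
    # N-Triples literals start with a double quote; IRIs start with '<'
    # and blank nodes with '_:'.
    return term.startswith('"')


def entities_flavor(base: Iterable[Triple]) -> Set[Triple]:
    # Entities NeMig: drop every triple whose object is a literal node.
    return {t for t in base if not is_literal(t[2])}


def enriched_entities_flavor(
    entities: Iterable[Triple], wikidata: Iterable[Triple]
) -> Set[Triple]:
    # Enriched Entities NeMig: add the Wikidata neighborhood triples.
    return set(entities) | set(wikidata)


def complete_flavor(base: Iterable[Triple], enriched: Iterable[Triple]) -> Set[Triple]:
    # Complete NeMig: union of the Base and Enriched Entities NeMig.
    return set(base) | set(enriched)
```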
Information about the uploaded files
(all files are bzip2-compressed and in N-Triples format):
| File | Description |
|---|---|
| nemig_${language}_${graph_type}-metadata.nt.bz2 | Metadata about the dataset, described using the VoID vocabulary. |
| nemig_${language}_${graph_type}-instances_types.nt.bz2 | Class definitions of news and event instances. |
| nemig_${language}_${graph_type}-instances_labels.nt.bz2 | Labels of instances. |
| nemig_${language}_${graph_type}-instances_related.nt.bz2 | Relations between news instances that are based on one another. |
| nemig_${language}_${graph_type}-instances_metadata_literals.nt.bz2 | Relations between news instances and metadata literals (e.g., URL, publishing date, modification date, sentiment label, political orientation of the news outlet). |
| nemig_${language}_${graph_type}-instances_content_mapping.nt.bz2 | Mapping of news instances to content instances (e.g., title, abstract, body). |
| nemig_${language}_${graph_type}-instances_topic_mapping.nt.bz2 | Mapping of news instances to sub-topic instances. |
| nemig_${language}_${graph_type}-instances_content_literals.nt.bz2 | Relations between content instances and the corresponding literals (e.g., the text of the title, abstract, body). |
| nemig_${language}_${graph_type}-instances_metadata_resources.nt.bz2 | Relations between news or sub-topic instances and the entities extracted from metadata (i.e., publishers, authors, keywords). |
| nemig_${language}_${graph_type}-instances_event_mapping.nt.bz2 | Mapping of news instances to event instances. |
| nemig_${language}_${graph_type}-event_resources.nt.bz2 | Relations between event instances and the entities extracted from the news text (i.e., actors, places, mentions). |
| nemig_${language}_${graph_type}-resources_provenance.nt.bz2 | Provenance information about the entities extracted from the news text (e.g., title, abstract, body). |
| nemig_${language}_${graph_type}-wiki_resources.nt.bz2 | Relations between the Wikidata entities linked from the news and their k-hop entity neighbors in Wikidata. |
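Since every file is a bzip2-compressed N-Triples file, it can be streamed line by line with the Python standard library. The sketch below is illustrative only: the `nemig_filename` helper is hypothetical, the concrete values substituted for `${language}` and `${graph_type}` are not listed here and must be taken from the actual download, and the regular expression is a rough line splitter that assumes well-formed, one-triple-per-line data rather than a full N-Triples parser (a full RDF library such as rdflib can also parse N-Triples).

```python
import bz2
import re
from typing import Iterator, Tuple

# Rough N-Triples line pattern: subject, predicate, object, terminating dot.
# Assumes well-formed data with one triple per line (not a complete parser).
_TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def nemig_filename(language: str, graph_type: str, part: str) -> str:
    # Hypothetical helper filling the ${language} and ${graph_type} placeholders.
    return f"nemig_{language}_{graph_type}-{part}.nt.bz2"


def iter_triples(path: str) -> Iterator[Tuple[str, str, str]]:
    # Decompress on the fly so large files never need to fit in memory.
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            m = _TRIPLE.match(line)
            if m:
                yield m.group(1), m.group(2), m.group(3)
```

For instance, with placeholder values, `iter_triples(nemig_filename("en", "base", "instances_labels"))` would yield the label triples of that file as (subject, predicate, object) strings.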