Awesome-Data-Centric-GraphML

A collection of papers and resources about Data-centric Graph Machine Learning (DC-GML).

We present a comprehensive review of, and outlook for, data-centric graph machine learning (DC-GML), and propose a systematic framework that covers all stages of the graph data lifecycle: graph data collection, exploration, improvement, exploitation, and maintenance. More details can be found in our review and outlook paper: https://arxiv.org/abs/2309.10979

@article{zheng2023towards,
  title={Towards Data-centric Graph Machine Learning: Review and Outlook},
  author={Zheng, Xin and Liu, Yixin and Bao, Zhifeng and Fang, Meng and Hu, Xia and Liew, Alan Wee-Chung and Pan, Shirui},
  journal={arXiv preprint arXiv:2309.10979},
  year={2023}}

Updates

2023.11
- Invited to give a tutorial at the Australasian Database Conference (ADC) 2023. The tutorial slides can be found in the ADC2023 folder.

How To Enhance Graph Data Availability and Quality?

Graph Structure Enhancement

Graph Structure Learning

[KDD'2020-Pro-GNN] Graph structure learning for robust graph neural networks. [paper]
[ICML'2019-LDS] Learning discrete structures for graph neural networks. [paper]
[WWW'2021-GEN] Graph structure estimation neural networks. [paper]
[CVPR'2019-GLCN] Semi-supervised learning with graph learning convolutional networks. [paper]
[NeurIPS'2020-IDGL] Iterative deep graph learning for graph neural networks: Better and robust node embeddings. [paper]

Graph Sparsification

[AIS'2016] Graph sparsification approaches for Laplacian smoothing. [paper]
[SIGMOD'2011] Local graph sparsification for scalable clustering. [paper]
[SICOMP'2011] Spectral sparsification of graphs. [paper]
[NeurIPS'2019] On differentially private graph sparsification and applications. [paper]
[ICDM'2022-GraphSparsify] A generic graph sparsification framework using deep reinforcement learning. [paper]

Graph Diffusion

[ICLR'2019-PPNP/APPNP] Predict then propagate: Graph neural networks meet personalized PageRank. [paper]
[NeurIPS'2019-GDC] Diffusion improves graph learning. [paper]
[ICLR'2021] Adaptive universal generalized PageRank graph neural network. [paper]
[NeurIPS'2021-ADC] Adaptive diffusion in graph neural networks. [paper]
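
As a rough illustration of the idea behind the diffusion entries above, the following minimal Python sketch approximates a personalized PageRank (PPR) vector by power iteration on a small undirected graph. The function name `personalized_pagerank`, the teleport probability `alpha`, and the toy graph are assumptions made for this example; it is not the implementation from any of the listed papers.

```python
def personalized_pagerank(adj, source, alpha=0.15, num_iters=100):
    """Approximate the PPR vector of `source` by power iteration.

    adj: dict mapping each node to a list of its neighbours (undirected,
         every node must have at least one neighbour).
    alpha: teleport (restart) probability; an illustrative default.
    """
    nodes = list(adj)
    # Start with all probability mass on the source node.
    scores = {v: 1.0 if v == source else 0.0 for v in nodes}
    for _ in range(num_iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            # Each node spreads (1 - alpha) of its mass evenly to neighbours.
            share = (1.0 - alpha) * scores[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # The remaining alpha of the (unit) total mass returns to the source.
        nxt[source] += alpha
        scores = nxt
    return scores


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data).
toy_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ppr = personalized_pagerank(toy_graph, source=0)
```

The resulting scores sum to one and decay with distance from the source for nodes of equal degree, which is the kind of smoothed neighbourhood weighting that diffusion-based methods use in place of the raw adjacency.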

Graph Feature Enhancement

Graph Feature Completion

[NN'2020-GINN] Missing data imputation with adversarially-trained graph convolutional networks. [paper]
[FGCS'2021-GCN_MF] Graph convolutional networks for graphs containing missing features. [paper]
[TPAMI'2020-SAT] Learning on attribute-missing graphs. [paper]
[WWW'2021-HGNN-AC] Heterogeneous graph neural network via attribute completion. [paper]
[IEEETransCybern'2022-Amer] Amer: A new attribute-missing network embedding approach. [paper]
[arxiv'2021-SAGA] Siamese attribute-missing graph auto-encoder. [paper]

Graph Feature Denoising

[SPM'2013] The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. [paper]
[GlobalSIP'2014] Signal denoising on graphs via graph filtering. [paper]
[IET-SP'2018] Graph polynomial filter for signal denoising. [paper]
[AIS'2015] Trend filtering on graphs. [paper]
[ICASSP'2020] Graph auto-encoder for graph signal denoising. [paper]
[TSP'2021] Graph unrolling networks: Interpretable neural networks for graph signal denoising. [paper]
[TSP'2022] Untrained graph neural networks for denoising. [paper]
[WWW'2023-MAGNET] Robust graph representation learning for local corruption recovery. [paper]

Graph Label Enhancement

Graph Pseudo-labeling

[AAAI'2018] Deeper insights into graph convolutional networks for semi-supervised learning. [paper]
[AAAI'2020] Multi-stage self-supervised learning for graph convolutional networks on graphs with few labeled nodes. [paper]
[CIKM'2021-IFC-GCN] Rectifying pseudo labels: Iterative feature clustering for graph representation learning. [paper]
[arXiv'2019-DSGCN] Dynamic self-training framework for graph convolutional networks. [paper]
[WSDM'2022-RS-GNN] Towards robust graph neural networks for noisy graphs with sparse labels. [paper]
[DMKD'2023-InfoGNN] Informative pseudo-labeling for graph neural networks with few labels. [paper]
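
A common ingredient of pseudo-labeling is to promote confident predictions on unlabeled nodes to training labels. The sketch below shows only that selection step, under stated assumptions: `select_pseudo_labels`, the `threshold` value, and the toy probabilities are hypothetical and are not taken from any of the papers listed above.

```python
def select_pseudo_labels(probs, labeled, threshold=0.9):
    """Pick confident predictions on unlabeled nodes as pseudo-labels.

    probs: dict mapping node -> list of class probabilities from some model.
    labeled: set of nodes that already have ground-truth labels.
    threshold: minimum top-class probability; an illustrative default.
    """
    pseudo = {}
    for node, p in probs.items():
        if node in labeled:
            continue  # never overwrite a ground-truth label
        best_class = max(range(len(p)), key=p.__getitem__)
        if p[best_class] >= threshold:
            pseudo[node] = best_class
    return pseudo


# Hypothetical model outputs for four nodes and two classes.
toy_probs = {
    "a": [0.97, 0.03],  # already labeled, skipped
    "b": [0.05, 0.95],  # confident -> pseudo-label 1
    "c": [0.60, 0.40],  # too uncertain, skipped
    "d": [0.92, 0.08],  # confident -> pseudo-label 0
}
new_labels = select_pseudo_labels(toy_probs, labeled={"a"})
```

In a self-training loop, the selected nodes would be added to the training set and the model retrained; the listed papers differ mainly in how they decide which predictions are trustworthy.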

Graph Label Denoising

[WSDM'2023-CLNode] CLNode: Curriculum learning for node classification. [paper]
[arXiv'2019-D-GNN] Learning graph neural networks with noisy labels. [paper]
[CIKM'2021-IFC-GCN] Rectifying pseudo labels: Iterative feature clustering for graph representation learning. [paper]
[KDD'2021-NRGNN] NRGNN: Learning a label noise resistant graph neural network on sparsely and noisily labeled graphs. [paper]
[WSDM'2023-RTGNN] Robust training of graph neural networks via noise governance. [paper]

Graph Class-imbalanced Sampling

[WSDM'2021-GraphSMOTE] GraphSMOTE: Imbalanced node classification on graphs with graph neural networks. [paper]
[KDD'2021-ImGAGN] ImGAGN: Imbalanced network embedding via generative adversarial graph networks. [paper]
[WWW'2021-PC-GNN] Pick and choose: a GNN-based imbalanced learning approach for fraud detection. [paper]
[WWW'2021-GraphMixup] Mixup for node and graph classification. [paper]
[ICLR'2022-GraphENS] GraphENS: Neighbor-aware ego network synthesis for class-imbalanced node classification. [paper]
[arXiv'2023-GraphSR] GraphSR: A Data Augmentation Algorithm for Imbalanced Node Classification. [paper]
[NeurIPS'2021-ReNode] Topology-imbalance learning for semi-supervised node classification. [paper]
[arXiv'2022-TopoImb] TopoImb: Toward topology-level imbalance in learning from graphs. [paper]
[IJCAI'2013-igBoost] Graph classification with imbalanced class distributions and noise. [paper]
[CIKM'2022-G2GNN] Imbalanced graph classification via graph-of-graph neural networks. [paper]

Graph Size Enhancement

Graph Size Reduction

[ICML'2009-Herding] Herding dynamical weights to learn. [paper]
[CVPR'2017-ICARL] ICARL: Incremental classifier and representation learning. [paper]
[ICLR'2018-K-center] Active learning for convolutional neural networks: A core-set approach. [paper]
[ICAIS'2020-Coarsening] Graph coarsening with preserved spectral properties. [paper]
[arXiv'2021] Graph domain adaptation: A generative view. [paper]
[ICLR'2022-GCond] Graph condensation for graph neural networks. [paper]
[KDD'2022-DosCond] Condensing graphs via one-step gradient matching. [paper]
[NeurIPS-Workshop'2022] Faster hyperparameter search on graphs via calibrated dataset condensation. [paper]
[arXiv'2023-SFGC] Structure-free graph condensation: From large-scale graphs to condensed graph-free data. [paper]

Graph Data Augmentation

[ACM SIGKDD Explorations Newsletter'2022-Survey] Data augmentation for deep graph learning: A survey. [paper]
[arXiv'2022-Survey] Graph data augmentation for graph machine learning: A survey. [paper]
[ICLR'2020-DropEdge] DropEdge: Towards deep graph convolutional networks on node classification. [paper]
[NeurIPS'2020-GRAND] Graph random neural networks for semi-supervised learning on graphs. [paper]
[AAAI'2022-NASA] Regularizing graph neural networks via consistency-diversity graph augmentations. [paper]
[KDD'2020-NodeAug] NodeAug: Semi-supervised node classification with data augmentation. [paper]
[AAAI'2021-GAUG] Data augmentation for graph neural networks. [paper]
[AAAI'2021-GraphMix] GraphMix: Improved training of GNNs for semi-supervised learning. [paper]
[WWW'2021-GraphMixup] Mixup for node and graph classification. [paper]
[WSDM'2021-GraphSMOTE] GraphSMOTE: Imbalanced node classification on graphs with graph neural networks. [paper]
[CVPR'2022-FLAG] Robust optimization as data augmentation for large-scale graphs. [paper]
[ICML'2022-G-Mixup] G-Mixup: Graph data augmentation for graph classification. [paper]
[ICML'2022-LAGNN] Local augmentation for graph neural networks. [paper]
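
To make the structure-perturbation flavour of graph data augmentation concrete, here is a minimal sketch in the spirit of random edge dropping (as in the DropEdge entry above): a fraction of edges is removed at random, and a new perturbed graph is drawn at each training epoch. The helper `drop_edges`, its parameters, and the toy edge list are assumptions for illustration only; the sketch omits details such as adjacency re-normalization.

```python
import random


def drop_edges(edges, drop_rate=0.2, seed=None):
    """Return a randomly thinned copy of an undirected edge list.

    edges: list of (u, v) tuples, one tuple per undirected edge.
    drop_rate: fraction of edges to remove; an illustrative default.
    seed: optional seed so the perturbation can be reproduced.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)
    num_keep = len(edges) - int(len(edges) * drop_rate)
    # Sample the surviving edge indices without replacement, then sort them
    # so the kept edges stay in their original order.
    keep = sorted(rng.sample(range(len(edges)), num_keep))
    return [edges[i] for i in keep]


# Hypothetical toy graph with five edges.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
# A fresh perturbed edge list would be drawn at every training epoch.
epoch_edges = drop_edges(toy_edges, drop_rate=0.4, seed=0)
```

Passing a different seed (or none) per epoch yields a different subgraph each time, so the model never sees exactly the same structure twice.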

How To Learn From Graph Data With Limited-availability and Low-quality?

The answer to this question corresponds to the 'Graph Data Exploitation' stage of the DC-GML framework, which incorporates four strategies for learning from low-quality, limited-availability graph data: Graph Self-supervised Learning, Graph Semi-supervised Learning, Graph Active Learning, and Graph Transfer Learning.

Graph Self-supervised Learning

[TKDE'2022-Survey] Graph self-supervised learning: A survey. [paper]
[arXiv'2016-GAE] Variational graph auto-encoders. [paper]
[CIKM'2017-MGAE] MGAE: Marginalized graph autoencoder for graph clustering. [paper]
[IJCAI'2018-ARGA] Adversarially regularized graph autoencoder for graph embedding. [paper]
[ICLR'2019-DGI] Deep graph infomax. [paper]
[ICML'2020-MVGRL] Contrastive multi-view representation learning on graphs. [paper]
[NeurIPS'2020-GraphCL] Graph contrastive learning with augmentations. [paper]
[arXiv'2020-PairwiseDistance/NodeProperty] Self-supervised learning on graphs: Deep insights and new direction. [paper]
[NeurIPS'2020-GROVER] Self-supervised graph transformer on large-scale molecular data. [paper]
[WWW'2020-GMI] Graph representation learning via graphical mutual information maximization. [paper]
[ICML'2020] When does self-supervision help graph convolutional networks? [paper]
[WWW'2021-GCA] Graph contrastive learning with adaptive augmentation. [paper]
[ICML'2021-JOAO] Graph contrastive learning automated. [paper]
[NeurIPS'2021-AD-GCL] Adversarial graph augmentation to improve graph contrastive learning. [paper]
[KDD'2022-GraphMAE] GraphMAE: Self-supervised masked graph autoencoders. [paper]
[Information Sciences'2022-S2GRL] A new self-supervised task on graphs: Geodesic distance prediction. [paper]
[ICLR'2022-AutoSSL] Automated self-supervised learning for graphs. [paper]

Graph Semi-supervised Learning

[ICML'2003] Semi-supervised learning using Gaussian fields and harmonic functions. [paper]
[NeurIPS'2003] Learning with local and global consistency. [paper]
[ICML'2005] Learning from labeled and unlabeled data on a directed graph. [paper]
[AAAI'2018] Deeper insights into graph convolutional networks for semi-supervised learning. [paper]
[KDD'2020-NodeAug] NodeAug: Semi-supervised node classification with data augmentation. [paper]
[NeurIPS'2020-GRAND] Graph random neural networks for semi-supervised learning on graphs. [paper]
[AAAI'2020-M3S] Multi-stage self-supervised learning for graph convolutional networks on graphs with few labeled nodes. [paper]
[WSDM'2021-SimP-GCN] Node similarity preserving graph convolutional networks. [paper]
[ACM-TOIS'2021-GCN-LPA] Combining graph convolutional neural networks and label propagation. [paper]
[AAAI'2021-CG3] Contrastive and generative graph convolutional networks for graph-based semi-supervised learning. [paper]
[NeurIPS'2021-GCPN] Contrastive graph Poisson networks: Semi-supervised learning with extremely limited labels. [paper]
[AAAI'2022-Meta-PN] Meta propagation networks for graph few-shot semi-supervised learning. [paper]
[World Wide Web'2022-CycProp] Cyclic label propagation for graph semi-supervised learning. [paper]

Graph Active Learning

[arXiv'2017-AGE] Active learning for graph embedding. [paper]
[IJCAI'2018-ANRMAB] Active discriminative network representation learning. [paper]
[IJCAI'2019-ActiveHNE] ActiveHNE: active heterogeneous network embedding. [paper]
[arXiv'2019-FeatProp] Active learning for graph neural networks via node feature propagation. [paper]
[WWW'2020-ATNE] Active domain transfer on network embedding. [paper]
[KDD'2020-ASGN] ASGN: An active semi-supervised graph neural network for molecular property prediction. [paper]
[NeurIPS'2020-GPA] Graph policy network for transferable active learning on graphs. [paper]
[ACML'2020-MetAL] MetAL: Active semi-supervised learning on graphs via meta-learning. [paper]
[TNNLS'2020-SEAL] SEAL: Semisupervised adversarial active learning on attributed graphs. [paper]
[VLDB Endowment'2021-GRAIN] GRAIN: Improving data efficiency of graph neural networks via diversified influence maximization. [paper]
[NeurIPS'2021-RIM] RIM: Reliable influence-based active learning on graphs. [paper]
[WWW'2021-Attent] Attent: Active attributed network alignment. [paper]
[SIGMOD'2021-ALG] ALG: Fast and accurate active learning framework for graph convolutional networks. [paper]
[WWW'2022-ALLIE] ALLIE: Active learning on large-scale imbalanced graphs. [paper]
[AAAI'2022-BIGENE] Batch active learning with graph neural networks via multi-agent deep reinforcement learning. [paper]
[ICLR'2022-IGP] Information Gain Propagation: A new way to graph active learning with soft labels. [paper]
[KDD'2022-JuryGCN] JuryGCN: quantifying jackknife uncertainty on graph convolutional networks. [paper]

Graph Transfer Learning

[IJCAI'2019-DANE] DANE: domain adaptive network embedding. [paper]
[WWW'2020-UDA-GCN] Unsupervised domain adaptive graph convolutional networks. [paper]
[AAAI'2020-ACDNE] Adversarial deep network embedding for cross-network node classification. [paper]
[ICDM'2020-OpenWGL] OpenWGL: Open-world graph learning. [paper]
[ICML'2020-PGL] Progressive graph learning for open-set domain adaptation. [paper]
[NeurIPS'2021-SRGNN] Shift-robust GNNs: Overcoming the limitations of localized graph training data. [paper]
[arXiv'2021-SOGA] Source free unsupervised graph domain adaptation. [paper]
[arXiv'2021] Graph domain adaptation: A generative view. [paper]
[NeurIPS-Workshop'2022-SRNC] Shift-robust node classification via graph clustering co-training. [paper]

How To Build Graph MLOps System: The Graph Data-centric View.

The answer to this question corresponds to three stages of the DC-GML framework: Graph Data Collection, Graph Data Exploration, and Graph Data Maintenance. Together with Graph Data Improvement and Graph Data Exploitation, these stages make up a graph MLOps system built from the graph data-centric view.

Graph Data Collection

Amazon Mechanical Turk: https://www.mturk.com/
[SIGIR-Workshop'2011] Semi-supervised consensus labeling for crowdsourcing. [paper]
[Cloud Computing'2021] Knowledge graphs meet crowdsourcing: a brief survey. [paper]
[Journal of Classification'1997] Estimation and prediction for stochastic blockmodels for graphs with latent block structure. [paper]
Probabilistic graphical models: principles and techniques. [book]
[NeurIPS'2019] GNNExplainer: Generating explanations for graph neural networks. [paper]
[KDD'2014] Focused clustering and outlier detection in large attributed graphs. [paper]
[Journal of Machine Learning Research'2023] Graph clustering with graph neural networks. [paper]
[WWW'2021] Pathfinder discovery networks for neural message passing. [paper]
[arXiv'2020] Benchmarking graph neural networks. [paper]
[arXiv'2022] Synthetic graph generation to benchmark graph learning. [paper]
[KDD'2022] GraphWorld: Fake graphs bring real insights for GNNs. [paper]

Graph Data Exploration

NetworkX: https://networkx.org/
igraph: https://igraph.org/
Neo4j: https://neo4j.com/
[ECML-PKDD'2021] GraphSVX: Shapley value explanations for graph neural networks. [paper]

Graph Data Maintenance

[arXiv'2022-TrustworthyGNN-Survey] Trustworthy graph neural networks: Aspects, methods and trends. [paper]
[IEEE Network'2010] Privacy and security for online social networks: challenges and opportunities. [paper]
[SIGMOD'2008] Towards identity anonymization on graphs. [paper]
[Multimedia Tools and Applications'2018] Privacy preservation based on clustering perturbation algorithm for social network. [paper]
[EDBT/ICDT Workshops'2015] Privacy-Integrated Graph Clustering Through Differential Privacy. [paper]
[Information Sciences'2020] PGAS: Privacy-preserving graph encryption for accurate constrained shortest distance queries. [paper]
[KDD'2022] FederatedScope-GNN: Towards a unified, comprehensive and efficient package for federated graph learning. [paper]
[AAAI'2023] Federated learning on Non-IID graphs via structural knowledge sharing. [paper]
[IEEE Communications Magazine'1994] Access control: principle and practice. [paper]
[Global Summit on Computer and Information Technology'2014] Implementation of elliptic curve digital signature algorithm (ECDSA). [paper]
[Computers and Security'2021] Threat detection and investigation with system-level provenance graphs: a survey. [paper]

Graph MLOps

Kubeflow: https://github.com/kubeflow/kubeflow
Amazon SageMaker: https://aws.amazon.com/sagemaker/
neptune.ai: https://neptune.ai/product
GraphStorm: https://github.com/awslabs/graphstorm/wiki
Real-time Fraud Detection with Graph Neural Network on DGL: https://github.com/awslabs/realtime-fraud-detection-with-gnn-on-dgl
