Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient

Deng, Xiaoge; Li, Dongsheng; Sun, Tao; Lu, Xicheng

doi:10.1109/TBDATA.2024.3407510

arXiv:2112.04088 (cs)

[Submitted on 8 Dec 2021 (v1), last revised 9 Jun 2024 (this version, v5)]

Abstract: Gradient-based optimization methods implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the high communication overhead for exchanging information, such as stochastic gradients, between workers. The inherent causes of this bottleneck are the frequent communication rounds and the full model gradient transmission in every round. In this study, we present SASG, a communication-efficient distributed algorithm that enjoys the advantages of sparse communication and adaptive aggregated stochastic gradients. By dynamically determining the workers who need to communicate through an adaptive aggregation rule and sparsifying the transmitted information, the SASG algorithm reduces both the overhead of communication rounds and the number of communication bits in the distributed system. For the theoretical analysis, we introduce an important auxiliary variable and define a new Lyapunov function to prove that the communication-efficient algorithm is convergent. The convergence result is identical to the sublinear rate of stochastic gradient descent, and our result also reveals that SASG scales well with the number of distributed workers. Finally, experiments on training deep neural networks demonstrate that the proposed algorithm can significantly reduce communication overhead compared to previous methods.
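
The abstract does not say which sparsifier SASG applies, so the sketch below is only a generic illustration of what "sparsifying the transmitted information" can look like. It assumes top-k magnitude selection, a common choice that may differ from the paper's operator. The names top_k_sparsify and densify are hypothetical and do not come from the paper or any library.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad.

    Assumption: top-k selection stands in for the (unspecified) sparsifier.
    Returns (indices, values), which is all a worker would need to transmit.
    """
    if not 0 < k <= len(grad):
        raise ValueError("k must be between 1 and len(grad)")
    # Sort indices by descending magnitude; ties go to the lower index.
    order = sorted(range(len(grad)), key=lambda i: (-abs(grad[i]), i))
    kept = sorted(order[:k])
    return kept, [grad[i] for i in kept]


def densify(indices, values, dim):
    """Rebuild a dense vector on the receiver, with zeros for dropped entries."""
    dense = [0.0] * dim
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


grad = [0.1, -2.0, 0.03, 1.5, -0.2, 0.0]
idx, vals = top_k_sparsify(grad, 2)
restored = densify(idx, vals, len(grad))
```

Here only 2 of the 6 coordinates (plus their indices) would be sent, which is the sense in which sparsification reduces the number of communicated bits. The adaptive aggregation rule that decides which workers communicate is not sketched, because the abstract does not state it.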

Comments:	Accepted by IEEE Transactions on Big Data, 2024
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2112.04088 [cs.DC]
	(or arXiv:2112.04088v5 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2112.04088
Related DOI:	https://doi.org/10.1109/TBDATA.2024.3407510

Submission history

From: Xiaoge Deng
[v1] Wed, 8 Dec 2021 02:55:28 UTC (7,875 KB)
[v2] Sun, 17 Apr 2022 03:47:42 UTC (8,860 KB)
[v3] Mon, 29 Aug 2022 14:38:01 UTC (9,492 KB)
[v4] Wed, 29 May 2024 09:18:28 UTC (1,586 KB)
[v5] Sun, 9 Jun 2024 11:47:03 UTC (1,586 KB)
