A Generative AI for Heterogeneous
Network-on-Chip Design Space Pruning
Maxime Mirka, Maxime France-Pillois, Gilles Sassatelli, Abdoulaye Gamatié
LIRMM, Université de Montpellier & CNRS
Montpellier, France
[Link]@[Link]
978-3-9819263-6-1/DATE22/©2022 EDAA

Abstract—Often suffering from under-optimization, Networks-on-Chip (NoCs) heavily impact the efficiency of domain-specific Systems-on-Chip. To cope with this issue, heterogeneous NoCs are promising alternatives. Nevertheless, the design of optimized NoCs satisfying multiple performance objectives is extremely challenging and requires significant expertise. Prior works failed to combine many objectives or required an extended design space exploration time. In this paper, we propose an approach based on generative artificial intelligence to help prune complex design spaces for heterogeneous NoCs, according to configurable performance objectives. This is made possible by the ability of Generative Adversarial Networks to learn and generate relevant design candidates for the target NoCs. The speed and flexibility of our solution enable a fast generation of optimized NoCs that fit users' expectations. Through experiments, we show how to obtain competitive NoC designs that reduce power consumption with no communication performance or area penalty compared to a given conventional NoC design.

Index Terms—Generative Adversarial Network, CAD, Network-on-Chip, DSSoC, Heterogeneous, Machine learning

I. INTRODUCTION

System specialization offers a promising solution to the design of power-, area- and performance-efficient Systems-on-Chip (SoCs). When designed to meet the final application requirements, heterogeneous SoCs usually produce competitive designs. However, new challenges arise with the design of such tailored domain-specific SoCs. The key issue is to find a design that optimally satisfies the application requirements. While High-Level Synthesis (HLS) tools allow non-expert users to generate hardware designs for their applications, these tools do not provide an optimal SoC architecture. A naive solution to identify the optimal architecture would be to evaluate each possible design and pick the best one. However, the set of candidate designs (the design space) is often very broad, and the evaluation time would be excessively long. Hence, an exhaustive exploration of the design space is usually not tractable because of prohibitive exploration time. A methodology to reduce the design space is therefore desirable.

The SoC interconnect is a key component that strongly influences performance and power consumption. An undersized interconnect fabric may lead to a communication bottleneck with a potentially drastic impact on global system performance, especially in Von Neumann-like architectures. On the other side, an oversized interconnect uselessly consumes power and silicon area [1]. The parallelism and flexibility offered by Networks-on-Chip (NoCs) have made them the de-facto standard for multicore SoCs. Although heterogeneous NoCs ensure the best trade-off between performance and power (see Section II), defining their architecture is challenging.

In this paper, we propose an AI-based method to prune the design space of heterogeneous NoCs. Benefiting from the recent progress in Generative Adversarial Networks (GANs), we devise a tool able to generate a reduced set of optimized NoC configurations. These generated NoCs are optimized according to multiple objectives defined by the user to satisfy their expectations. The addition of several Reward modules to the "classical" GAN architecture enables multi-objective optimization and results in the generation of a subset of NoC configurations close to the optimal Pareto frontier. Hence, for a given traffic pattern, our tool generates a subset of optimized designs, whose size is specified by the user. The generated heterogeneous NoC designs can improve power consumption while preserving throughput, compared to a homogeneous NoC design of similar size. In our experiments, we observe up to 15% power savings.

II. PROBLEM DEFINITION

NoC sizing is a tedious and complex process. Under-sizing an NoC directly deteriorates latency and throughput, while over-sizing substantially increases power consumption. NoC power consumption accounts for a significant fraction of the chip's power budget (up to 28%) [2]. When studying realistic NoC traffic workloads, non-uniform patterns are observed [3]. This results in greater pressure put on specific areas of the NoC. Consequently, heterogeneous NoCs offer a promising solution to handle these workload imbalances.

However, the design space of a heterogeneous NoC is prohibitively large: even when merely considering the heterogeneity of routers, the number of NoC configurations is NumberRouterTypes^NumberRouters. For a 64-router NoC with three classes of routers, we obtain 3^64 NoC design options. Although SoC experts may use intuition to smartly reduce the design space, this approach remains non-practical and does not ensure optimal performance. As a result, a fast systematic method is required to prune the design space and generate optimal NoC designs. The present work addresses this issue by proposing a generative AI framework.

III. RELATED WORK

The exploration of the best NoC designs is related to multi-objective optimization. In this field, previous works proposed
active algorithms to approximate a set of Pareto-optimal designs from a larger design space [4], [5]. These algorithms combine the simulation of some design samples with heuristics to establish the set of probable Pareto-optimal designs. While they provide a relatively accurate set, they need to simulate a non-negligible number of design samples (up to 15% of the design space for the NoC in [5]). Hence, these strategies require a considerable delay to provide a new optimum NoC design.

Regarding proposals targeting exclusively NoC design, they struggle to handle flexible multi-objective optimization. The recent work of Alhubail et al. aims to aid the design of on-chip interconnects for heterogeneous SoCs [6]. It combines two algorithms to handle two objectives, i.e. a Genetic Algorithm to reduce NoC latency, and a Strength Pareto Evolutionary Algorithm to target power consumption. Although this contribution provides a tool to design heterogeneous NoCs, the adopted backend prevents using multiple concurrent objectives. In our work, we propose a methodology tackling multiple optimization objectives at once.

In [7], the GANNoC framework exploits Convolutional Neural Networks (CNNs) and GANs to generate irregular NoC topologies minimizing the number of inter-router connections. In this work, we leverage GANs to reduce the design space exploration for heterogeneous NoCs composed of a diversity of routers. Our framework differs from GANNoC in three major points: 1) it proposes a novel GAN framework to handle multi-objective optimization, 2) it deals with router heterogeneity, and 3) it exploits graph neural networks [8] to model NoCs more accurately.

IV. DESIGN SPACE PRUNING TOOL

The proposed tool benefits from a framework made of a GAN and Reward modules. The GAN allows the automatic generation of valid NoC configurations, while the Rewards help steer the generation process towards optimized solutions.

A. GAN and GNN neural networks

a) GAN: A Generative Adversarial Network (GAN) is a neural network architecture proposed by Goodfellow et al. [9]. A GAN consists of two neural networks: a Generator model and a Discriminator model. The Discriminator is a neural network trained to classify its inputs into fake or real data. It is alternately fed with real data coming from a dataset and fake data originating from the Generator module. Thus, its job consists in labeling real data as real and generated data as fake. As for the Generator, its goal is to mislead the Discriminator by generating data as realistic as possible. Both neural networks are trained concurrently in an alternating fashion. GANs are therefore based on a game theory scenario in which the Generator learns to fool the Discriminator. It eventually leads the Generator toward the production of original realistic data. GAN convergence is a well-known issue: GANs are prone to vanishing gradient and mode collapse failures. Thus, Arjovsky et al. introduced the Wasserstein loss (W-loss) and built upon this concept a WGAN mitigating those training problems [10].

Lately, the concept of "Reward WGAN" (RWGAN) was introduced [7], [11]. It adds a third network, called Reward, to provide further guidance to the Generator's learning. Hence, this new architecture specializes the Generator's learning to not only generate data within the "real domain", but within a chosen subset of the "real domain".

b) Graph Neural Networks: Due to the significant similarity between NoCs and graph models, we consider NoC generation as a graph generation problem. Both are constituted of a set of links (i.e. connections/edges) and a set of nodes (i.e. routers/vertices). Recent advances in Graph Neural Networks (GNNs) [8] therefore open attractive opportunities for NoC modeling and generation. In this work, we leverage graph convolutional networks (GCNs) [12] to devise neural network architectures providing accurate graph predictions.

B. Multi-Reward WGAN

Fig. 1: Multi-Objectives RWGAN diagram, with the back-propagation path in dashed arrows.

Our proposed Multi-Reward WGAN relies on two fundamental considerations, which are as follows:

a) NoC generation: The Multi-Reward WGAN (M-RWGAN) concept extends the RWGAN architecture by implementing several Reward modules, leading to multi-objective Generator training (see Figure 1). In the proposed framework, the architectures of the Discriminator and Rewards are based on GNNs (i.e. GCN [12]). The Discriminator training is similar to a traditional WGAN. It takes both x and x̂, respectively real data and generated data, and outputs a score regarding the "realness" of the input, i.e. ŷ. It is then trained to minimize its prediction error, i.e. minimize its loss function LD(x, x̂), implemented as a W-loss. The Generator outputs x̂ from a random noise input z. x̂ is processed by the Discriminator and the Rewards (Ri, with i ∈ [1, n]). As illustrated in Figure 1, the Generator parameters are trained according to the outputs of these modules, i.e. ŷ from the Discriminator, and si from the Rewards Ri. Following the WGAN training, the Generator learns to minimize the Wasserstein loss coming from the Discriminator, while also minimizing the losses from the Rewards. The latter losses are obtained via a mean-squared error measure w.r.t. the corresponding score goal. Rewards are supervised models trained before the Generator and Discriminator training, with labeled datasets created using a NoC simulator. Reward modules are only used in inference mode within the M-RWGAN framework.

b) Multi-Objective Loss Function: The Generator's training consists in minimizing its loss. From the RWGAN Generator loss function presented in a previous work [7], we derive the M-RWGAN Generator loss principle as follows:

    L_G(z) = (1 - λ) L_D(x̂) + λ [ Σ_{i=1}^{n} β_i L_{R_i}(x̂) ]    (1)

where LG, LD and LRi are respectively the loss functions of the Generator, the Discriminator and the i-th Reward. λ represents the training ratio between the Discriminator and Reward feedback. Σ_{i=1}^{n} βi = β, where β is a weight coefficient to balance the order-of-magnitude discrepancies that may appear between Discriminator loss values and Reward loss values. Note: β must be adjusted w.r.t. the number of Rewards n to avoid loss deterioration.

As a consequence, our proposed architecture provides a solution to train a generative neural network on multiple objectives. It further enables tuning the weight the Generator gives to each goal with the coefficients βi, leading to finely customized objective functions.

V. EVALUATION

A. Experimental setup

a) Use case definition: In this paper, we limit the router heterogeneity to three router classes {Big, Medium, Small}. These are three-stage pipelined routers with one virtual channel. Their input buffer sizes are respectively {12, 4, 2} flits. We consider a 2D-mesh NoC composed of 64 routers organized in an 8x8 grid. We use a classical XY routing algorithm. We devise the proposed framework to optimize the generated NoCs according to three objectives: increasing NoC throughput, reducing power consumption, and decreasing silicon area. Each objective is assessed by a dedicated Reward. We evaluate NoC performance for a hotspot traffic pattern. The "hot" router was arbitrarily set at position [6, 6]. 30% of the messages are sent to the hotspot; the other messages obey a uniform pattern.

b) NoC Simulator: We measured NoC performance with a fast high-level discrete-event simulator for communication networks named Omnet++ [13]. The HNOCS framework [14] enhances Omnet++ with additional NoC-specific support. The power and area assessments are achieved with the Orion3 library [15], an accurate high-level estimation tool. The technological parameters used comprise a 45nm manufacturing process, a Vdd of 1.0V, and a 650MHz frequency.

c) M-RWGAN neural network models: We distinguish the architecture of the Generator from that of the Discriminator and the Rewards. The Generator is a basic multi-layer perceptron. It is composed of three dense layers (i.e. fully connected) of size {768, 1536, 192}, with leakyReLU as activation function. Its input is a random vector of size 100. Its output is reshaped to a 64x3 matrix, matching the target NoC size. Since the router's type is encoded as a one-hot vector, the three channels stand for the number of router classes. A Gumbel-Softmax function allows enforcing this one-hot encoding [11].

The Discriminator and the Rewards implement the same architecture. They are composed of four GCN layers of size {128, 64, 64, 32}, with a dense output layer of size 1. Activation functions are LeakyReLU. An exception is made for the Reward devoted to the NoC area estimation. As we estimate the NoC area solely based on the routers, the NoC topology has no impact. Thus, the use of GCNs is irrelevant, and we approximate the area with a neural network composed of two CNN layers of size {128, 32} with 3x3 filters, and 2 dense layers of size {128, 1}. Models are implemented in Python, using the Keras [16] and Spektral [17] APIs.

B. Training methodology

The proposed tool requires "offline" training: (1) the supervised training of the Rewards, and (2) the GAN training.

Performance metrics of some NoC designs sampled out of the design space are obtained in simulation, as described above. In this work, we empirically found that the simulation of 10 000 samples is sufficient to achieve well-trained Rewards. Indeed, the dataset is carefully spread over the design space. Hence, despite accounting for an infinitesimal part of the design space (3^64), these samples are representative enough of the NoC design behaviors to train a neural network. While the creation of a dataset and the training of the Rewards are time-consuming, i.e. approx. 3 days in our case, this step must be done only once for each considered traffic pattern.

The GAN is trained for specific configurations of the user objectives. This tool configuration is denoted Tx Py Az, where x, y, z respectively stand for the throughput, power and area Reward ratios. From the user's point of view, these ratios represent the optimization effort performed by the tool to respectively: maximize the throughput, reduce the power consumption, and decrease the NoC area. From a theoretical perspective, these ratios define the share assigned to each objective during the Generator's loss computation (i.e. βi/β, see Eq. 1).

C. Proof of concept

(a) Hotspot traffic.
Fig. 2: Impact of Rewards trade-offs on the generated NoCs.

Figure 2 illustrates the learning capability of the proposed framework. We train our GAN according to several objectives and trade-offs: T100 (i.e. T100 P0 A0), T80 P10 A10 and T50 P10 A40. After the training, we configure our tool to generate 100 NoC designs at once. The per-router NoC traffic intensity is depicted on the left. The other plots show the average router size over the 100 generated NoC designs. The "origin" plot represents the produced designs before training. We first notice that the tool tends to increase the size of routers in correlation with the traffic pattern when throughput is set as the sole objective. Afterward, we observe a reduction of the router size for routers in the periphery of the "hot" routers, which results from the Power and Area optimization. This shows that the M-RWGAN indeed properly identified these locations as good candidates for Small routers, given the relatively low traffic flowing in these areas and thereby the limited resulting performance degradation.
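The composite Generator objective of Eq. (1), and the way a tool configuration Tx Py Az maps onto the βi weights, can be sketched numerically. The following is a minimal NumPy illustration, not the authors' actual Keras/Spektral implementation: the critic scores, Reward scores, score goals and λ value are hypothetical placeholders, and config_to_betas is a hypothetical helper that turns the Tx Py Az ratios into the βi shares of β.

```python
import numpy as np

def config_to_betas(x, y, z, beta=1.0):
    """Map a tool configuration Tx Py Az onto Reward weights beta_i,
    so that each ratio equals the share beta_i / beta used in Eq. (1)."""
    shares = np.array([x, y, z], dtype=float)
    return beta * shares / shares.sum()

def generator_loss(d_scores, reward_scores, reward_goals, lam, betas):
    """Composite M-RWGAN Generator loss, following Eq. (1):
    L_G = (1 - lam) * L_D + lam * sum_i(beta_i * L_Ri)."""
    # W-loss term seen by the Generator: maximize the critic score of
    # generated data, i.e. minimize its negative mean.
    l_d = -np.mean(d_scores)
    # Each Reward loss is a mean-squared error w.r.t. its score goal.
    l_r = sum(b * np.mean((s - g) ** 2)
              for b, s, g in zip(betas, reward_scores, reward_goals))
    return (1.0 - lam) * l_d + lam * l_r

betas = config_to_betas(50, 10, 40)        # T50 P10 A40 -> [0.5, 0.1, 0.4]
d = np.array([0.2, -0.1, 0.4])             # critic scores for 3 generated NoCs
s = [np.array([0.8, 0.7, 0.9]),            # throughput Reward scores
     np.array([0.3, 0.2, 0.4]),            # power Reward scores
     np.array([0.5, 0.6, 0.4])]            # area Reward scores
loss = generator_loss(d, s, reward_goals=[1.0, 0.0, 0.0], lam=0.5, betas=betas)
# loss ~ -0.0155: the critic term and the weighted Reward MSEs partly cancel.
```

The negative-mean convention for the critic term is the usual WGAN form of the Generator's Wasserstein loss; the score-goal values (e.g. 1.0 for throughput, 0.0 for power and area) are illustrative choices, not taken from the paper.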
(a) Hotspot traffic.
Fig. 3: Impact of different Rewards trade-offs on the generated sets. Boxes comprise 95% of the data.

Figure 3 illustrates the convergence of the Generator training toward subsets of optimized NoCs. It displays the measured NoC performance along the training, i.e. for various epochs. We perform this analysis for two different tool configurations. We notice that at the beginning of the training process the NoC performances are rather widespread. After a few training epochs, the generated NoCs converge toward values matching the user objectives. The T100 configuration, optimized for high throughput, generates NoC designs gathered around the optimal throughput at the end of the training. The second tool configuration, named T50 P10 A40, achieves an interesting trade-off between the three user objectives. The generated NoCs offer a high throughput with a reduced area, while the power consumption remains at a medium level.

D. Generated NoCs evaluation

Fig. 4: Examples of generated NoCs' throughput and power performance for the hotspot traffic pattern.

Figure 4 illustrates the ability of our method to obtain competitive NoC configurations according to the user objectives. We compare Conf. P and Conf. T, two configurations produced by a Generator, respectively trained with the Rewards trade-offs T10 P90 and T50 P10 A40. Considering the hotspot traffic pattern, the first configuration favors power reduction whereas the second targets throughput optimization. The resulting NoC configurations exhibit competitive performance over naive homogeneous configurations. For instance, the Conf. P design reduces the power consumption and the silicon area by respectively 24% and 25%, while incurring a 22% throughput loss compared to the Medium NoC. Considering the Conf. T NoC, it provides the best possible throughput (equal to the Big NoC), with 15.2% power savings and 65.9% area reduction.

VI. CONCLUSION

The design of heterogeneous NoCs in the scope of multi-objective optimization is particularly challenging. In this paper, we propose a tool based on generative AI to address this issue. The proposed tool prunes a design space according to multiple objectives and generates a subset of near performance-optimal candidate designs. The tool is driven at its core by M-RWGAN, a specific GAN-based network architecture which is a contribution of this paper. This M-RWGAN makes it possible to steer the generative process towards solutions having high fitness with respect to the chosen optimization criteria and their respective significance. A significant strength of this proposal compared to the literature, therefore, lies in its ability to perform fine-grain parametric design space exploration under multiple optimization criteria. The performance of the generated NoCs is shown to be competitive compared to naive NoC configurations. Future work aims at evaluating this framework in the case of multi-application traffic patterns, the goal being to produce heterogeneous NoC configurations satisfying multiple application requirements while optimizing non-functional parameters such as area and power consumption.

REFERENCES

[1] B. K. Daya et al., "Scorpio: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering," in ACM/IEEE 41st Int. Symp. on Comput. Architecture, 2014, pp. 25-36.
[2] Y. Hoskote et al., "A 5-GHz mesh interconnect for a teraflops processor," IEEE Micro, vol. 27, no. 5, pp. 51-61, 2007.
[3] P. V. Gratz and S. W. Keckler, "Realistic workload characterization and analysis for networks-on-chip design," in The 4th Workshop on Chip Multiprocessor Memory Systems and Interconnects (CMP-MSI), 2010.
[4] G. Palermo, C. Silvano, and V. Zaccaria, "ReSPIR: A response surface-based Pareto iterative refinement for application-specific design space exploration," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 12, pp. 1816-1829, 2009.
[5] M. Zuluaga, G. Sergent, A. Krause, and M. Püschel, "Active learning for multi-objective optimization," in Proc. of the 30th Int. Conf. on Mach. Learn., vol. 28, no. 1. PMLR, Jun 2013, pp. 462-470.
[6] L. Alhubail et al., "NoC design methodologies for heterogeneous architecture," in 28th Euromicro International Conference on Parallel, Distributed and Network-Based Process. (PDP), 2020, pp. 299-306.
[7] M. Mirka et al., "GANNoC: A framework for automatic generation of NoC topologies using generative adversarial networks," in Proc. of the 2021 Rapid Simulation and Perf. Eval.: Methods and Tools. New York, NY, USA: Association for Computing Machinery, 2021, pp. 51-58.
[8] Z. Wu et al., "A comprehensive survey on graph neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 4-24, Jan 2021.
[9] I. J. Goodfellow et al., "Generative adversarial networks," in Advances in Neural Information Processing Systems 27 (NIPS), 2014.
[10] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN," 2017, arXiv preprint arXiv:1701.07875.
[11] N. D. Cao and T. Kipf, "MolGAN: An implicit generative model for small molecular graphs," CoRR, vol. abs/1805.11973, 2018.
[12] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," 2017, arXiv preprint arXiv:1609.02907.
[13] A. Varga, "Using the OMNeT++ discrete event simulation system in education," IEEE Trans. on Educ., vol. 42, no. 4, 1999.
[14] Y. Ben-Itzhak, E. Zahavi, I. Cidon, and A. Kolodny, "HNOCS: Modular open-source simulator for heterogeneous NoCs," in International Conference on Embedded Computer Systems (SAMOS), 2012, pp. 51-57.
[15] A. B. Kahng et al., "ORION3.0: A comprehensive NoC router estimation tool," IEEE Embedded Systems Letters, vol. 7, no. 2, pp. 41-45, 2015.
[16] F. Chollet et al., "Keras," [Link] 2015.
[17] D. Grattarola and C. Alippi, "Graph neural networks in TensorFlow and Keras with Spektral," 2020, arXiv preprint arXiv:2006.12138.
Design, Automation and Test in Europe Conference (DATE 2022)