1 Introduction

Today, manufacturing companies are transitioning from the current age of smart manufacturing, known as Industry 4.0, to a new revolutionary wave of industry, Industry 5.0, in which humans and machines are envisioned to work collaboratively [1]. The Fifth Industrial Revolution combines sustainable development goals with the digitalization aspects of Industry 4.0 through secure data transmission, bioinspired technologies, and human-centric solutions. Furthermore, Industry 5.0 envisions a creative, resilient, competitive, and socially centered industry while minimizing negative environmental and social impacts [2, 3]. The core aspects of Industry 5.0 include human centricity, sustainability, and resilience [4]. Sustainability and digitalization are two significant trends that manufacturing companies must address, as both affect their operations. Sustainable manufacturing aims to balance the three pillars of sustainability (environmental, social, and economic) while addressing the needs of stakeholders and achieving a competitive advantage [5, 6]. In addition, six key characteristics of sustainable production are highlighted in the literature: energy and resource efficiency, the natural environment, social justice, economic performance, worker rights, and product accountability [6]. To improve a maintenance system and support a sustainable manufacturing strategy, companies must identify key success factors that impact sustainability. Organizations that implement sustainable practices experience improved product and service quality, increased market share, and higher profits [7]. Therefore, integrating sustainability with maintenance promotes cost-effective maintenance practices. Sustainable maintenance activities significantly reduce pollution and waste while promoting employee safety. In addition, they improve the production capacity of manufacturing industries by ensuring the availability, reliability, and safety of equipment, which is vital for sustainable operations [8, 9].

A well-designed maintenance strategy has the potential to extend the lifespan of an asset and prevent unexpected failures, which can result in production losses, shipping delays, and decreased product quality. Such a strategy is essential for minimizing costs and enhancing productivity [10]. In addition, as attention to environmental problems increases, asset and product life cycle management becomes essential for sustainable manufacturing production, providing the functions necessary for society while minimizing resource consumption [11]. Despite Industry 4.0’s focus on digitalization and automation, recent studies on its effects suggest that merely constructing industrial plants with complex, non-interoperable technical systems is insufficient to improve their productivity and resilience in addressing sustainable development challenges. This has led to a focus on human-centric Industry 5.0. Humans will continue to be a vital resource for the competitiveness of manufacturers, particularly in activities that require flexibility, critical thinking, and originality [12]. In the highly automated and digitalized factories of the future, humans will perform more decision-making and problem-solving in increasingly complex socio-cyber-physical production systems, rather than performing many physical duties [4, 13]. In this context, multi-criteria decision-making (MCDM) techniques can be beneficial for involving multiple decision-makers with competing objectives to evaluate maintenance factors that influence manufacturing sustainability and reach a consensus on decision-making [14]. Furthermore, MCDM techniques offer a structured approach to tackle decision-making problems involving multiple objectives, criteria, and conflicting preferences [15].

This research introduces a decision support framework oriented around human-centered design, which integrates conventional MCDM methods with artificial intelligence (AI), specifically utilizing reinforcement learning (RL) algorithms. In an era where data availability and accessibility have led to remarkable advances in manufacturing processes, RL algorithms can process large datasets with minimal model assumptions, handle high-dimensional spaces, and consider long-term outcomes. Additionally, RL agents learn adaptively through real-time interactions with their environment. This capability improves decision-making in complex industrial situations, distinguishing them from traditional MCDM and other analytical methodologies [16]. Reinforcement learning is well-suited for sequential decision problems and expands machine learning to encompass modeling, prediction, and decision-making. An RL algorithm operates as a rational agent that learns to act in an uncertain environment through trial and error [17]. However, real-world decision-making processes are often complex and involve trade-offs among several, often conflicting, objectives. For multi-objective decision-making problems, the literature recommends the use of multi-objective reinforcement learning (MORL) algorithms [18, 19].

This paper builds on the previous research presented in [20] and discusses a MORL model that integrates multiple deep Q-Networks (DQNs), a technique introduced in [21]. The proposed model, multi-criteria decision-making with deep Q-Networks (MCDM-DQN), aims to address challenges in multi-objective decision-making and leverage the capabilities of deep neural networks to tackle real-world MCDM issues. In addition, this study investigates the integration of the MCDM-DQN model into a human-centered framework, enabling company stakeholders to assess the impact of key maintenance factors on manufacturing sustainability. Although recent advances in Industry 5.0 concepts, such as sustainability, human-centeredness, and human-machine collaboration, have been significant, we have identified gaps in current research regarding the design of a comprehensive framework that incorporates these concepts and integrates AI and decision-making processes to evaluate sustainable maintenance factors and provide more valuable information for assessing manufacturing sustainability. Our study aims to fill these gaps by addressing the following research questions: 1) How can a MORL algorithm help prioritize maintenance factors based on their sustainability impact? 2) How can stakeholders collaborate using a MORL model to tackle decision-making problems? To address these questions, this study proposes a decision support framework that integrates the best MCDM practices from the literature with the MCDM-DQN model to rank maintenance factors according to their relative importance from the perspectives of various maintenance stakeholders. The proposed approach aims to empower decision-makers in sustainable maintenance by offering a comprehensive framework that improves their understanding of the impacts of various maintenance factors. The study also aims to promote human-machine collaboration, incorporate human capabilities by considering various perspectives of maintenance stakeholders, and prioritize maintenance factors that impact sustainable manufacturing based on the organization’s operational context. In summary, the paper makes several important contributions. First, it presents a solution that integrates multiple DQNs to manage conflicting objectives and prioritize key maintenance factors to evaluate sustainability strategies in manufacturing. This approach addresses the challenges in multi-criteria decision-making without using traditional MCDM techniques. Second, a human-centered design is proposed, aligned with Industry 5.0 and emphasizing human participation in decision-making. This enables experts to input data reflecting their cognitive abilities into the MCDM-DQN algorithm and review the results. Lastly, case studies across various sustainability scenarios validate the approach’s effectiveness.

The remainder of the paper is structured as follows. Section 2 provides a review of the literature on sustainable maintenance factors, current MCDM approaches to assess sustainability, and innovative MORL algorithms. Section 3 outlines the problem statement and describes the methodology. Section 4 details the implementation of the MCDM-DQN model. Section 5 presents the experimental evaluation and discusses the results obtained. Section 6 covers the discussion and practical implications of the research. Finally, Sect. 7 concludes the paper and suggests directions for future work.

2 Related Work

This section focuses on evaluating previous work that is relevant to this investigation. First, we will address sustainable maintenance factors based on the existing literature. Next, we will discuss key state-of-the-art MCDM approaches used to assess sustainability, followed by a discussion on MORL algorithms. The literature review for this study was conducted using Google Scholar and Scopus, databases that enabled a robust and comprehensive search. We formulated our primary research query as (“Sustainability” OR “Sustainable”) AND (“Maintenance” OR “Manufacturing”) AND (“Industry 4.0” OR “Industry 5.0”) because this study examines the contributions of maintenance to manufacturing sustainability, incorporating elements from both Industry 4.0 and Industry 5.0. In addition, complementary queries were developed to gain an understanding of existing research related to MCDM methods and cutting-edge MORL algorithms. For example: 1) (“multicriteria decision making”) AND (“sustainability”), 2) (“multi-criteria decision-making”) AND (“advances”), and 3) (“multi-objective reinforcement learning”) OR (“deep reinforcement learning”).

2.1 Sustainable Maintenance Factors

Maintenance involves systematically monitoring, repairing, and replacing equipment to ensure its desired functionality [22]. Maintenance also refers to the actions taken to keep a system operational and functional throughout its lifecycle or to restore it to a state where it can perform its intended function [23]. The primary goal of maintenance is to eliminate activities that result from equipment failure or human error. These maintenance efforts impact overall production processes, affecting multiple elements that are essential for ongoing performance enhancement, such as product quality and delivery precision [24]. Efficient maintenance improves value by optimizing resource utilization, improving product quality, and minimizing rework and waste. Companies must identify factors that impact sustainability performance based on their unique processes, business demands, and goals to address sustainability challenges effectively [6]. Manufacturing industries aiming at sustainability face the challenge of achieving sustainable maintenance. This requires balancing economic, environmental, and social factors by managing the financial costs associated with repairs and supplies, while also considering greenhouse gas emissions, energy consumption, and the health and safety of workers [7]. Historically, financial indicators were used to describe business performance in manufacturing enterprises. Companies now take a comprehensive approach to sustainability factors, including societal variables, to improve economic and non-monetary outcomes [24]. Manufacturing enterprises must evaluate their economic, social, and environmental effects to achieve sustainable development goals. They should provide a return on investment and minimize their environmental impact [25, 26]. In [27], the authors conducted a study addressing the role of maintenance in promoting industrial sustainability, considering the perspectives of different stakeholders. In addition, they proposed a conceptual framework to help various stakeholders assess the effect of maintenance on the three dimensions of sustainability.

In [28], the authors introduce a conceptual framework designed to assess how maintenance affects sustainability, providing an overview of the influence that maintenance tasks have on all sustainability dimensions. This framework helps decision-makers across various company departments recognize these impacts. In [6], a literature review and engagement with industry professionals identified ten maintenance factors for tackling the challenges of sustainable manufacturing from a tactical perspective. Further analysis determined the relationships between these factors, grouped them into clusters, and applied MCDM techniques to rank the most critical ones. The authors found that technical factors, such as the implementation of preventive and prognostic service methods, the use of maintenance and operation data collection and processing systems, and the modernization of machines and devices, are key to addressing sustainable manufacturing. The authors of [26] evaluated the performance of sustainable development in three dimensions: business, environment, and social. The following key factors were examined for each dimension: return on investment, raw material consumption, and turnover ratio. They concluded that enhancing an enterprise’s sustainable development performance requires a blend of factors to achieve optimal results. The authors of [8] examined the factors that influence sustainable maintenance practices in manufacturing companies. Their study ranked these factors according to their interdependence. The key factors identified include availability rate, government regulations, and the importance of training and education. The study also reveals that energy consumption is affected by all the other factors considered. Conversely, changes in certain factors can impact dependent factors, such as product quality and technology.

In Industry 4.0, industrial systems can monitor processes and make intelligent decisions through real-time connections with people, machines, sensors, and other factors. The transition to Industry 5.0 affects all levels of the organization, including maintenance [29]. According to [4], future research in the field of Industry 4.0/5.0 maintenance and sustainability should investigate sustainable maintenance while considering all three pillars of sustainability simultaneously. To ensure sustainable maintenance procedures and systems, new frameworks, techniques, and performance indicators must be developed, considering these interdependencies.

2.2 State-of-the-Art MCDM Approaches for Assessing Sustainability

The process of making decisions is complex, taking into account various factors to reach a desired result. Multi-criteria decision-making methods provide systematic approaches to tackle decisions that involve multiple criteria (or objectives) and diverse preferences. These approaches facilitate the formulation of optimal solutions that are consistent with the preferences of decision-makers [30]. There are many classical MCDM methods available, such as the Analytical Hierarchy Process (AHP) [31], fuzzy AHP (F-AHP) [32], the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) [33], Elimination and Choice Translating Reality (ELECTRE) [34], and the Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE) [35]. These methods help organize decision-making by prioritizing criteria, ranking alternatives, and forming preference relationships. According to [15], the importance of MCDM methods is derived from their ability to address the complexities inherent in decision-making processes with various objectives, criteria, and stakeholders. MCDM approaches help decision-makers to solve challenges in multiple domains, such as business and management, engineering and technology, environmental decision-making, and sustainability assessments [4]. The authors of [36] proposed two approaches based on the PROMETHEE and TOPSIS techniques to prioritize sustainability strategies in the maintenance planning of a cement plant. The study revealed that optimizing maintenance policies and consumables represent the most effective sustainable maintenance solutions. In [37], the authors introduced a hierarchical method to assess sustainability in uncertain environments. This approach employs fuzzy logic to manage uncertainty, allowing decision-makers to articulate their subjective preferences using linguistic variables. The authors of [38] introduced a multi-criteria approach to evaluate wine production through the lens of the triple bottom line. The AHP method was used to help producers understand the significance of each dimension. They noted that the environmental dimension was the most crucial factor. The authors of [39] employed PROMETHEE in an environmental impact assessment to evaluate and rank alternative actions based on sustainability criteria. According to [15], the AHP and TOPSIS MCDM methods prioritize sustainability goals and enhance decision-making processes. The AHP and TOPSIS methods are applied to evaluate suppliers based on quality, pricing, delivery, and sustainability. These approaches have also been used to help balance cost, service levels, and sustainability objectives [40, 41].

A significant advancement in MCDM is the adoption of multi-objective approaches. Unlike traditional methods that typically focus on a single objective or combine multiple criteria, multi-objective MCDM addresses conflicting goals. This allows decision-makers to identify Pareto-optimal solutions that effectively balance various criteria [15]. Another major development is the integration of fuzzy logic techniques in MCDM. Fuzzy-based MCDM approaches effectively manage ambiguity and uncertainty, offering decision-makers realistic support through fuzzy criteria and linguistic evaluations [42, 43]. Furthermore, the combination of MCDM with data-driven techniques has been explored. Integrating machine learning algorithms with MCDM methods improves prediction accuracy and decision support. Examples of this integration include the use of neural networks, decision trees, and support vector machines [15]. Moreover, a notable advancement in the industry is the hybrid MCDM method. These hybrid techniques integrate multiple decision-making approaches to address problems, capitalizing on their advantages and reducing limitations. For example, the authors in [44] proposed a hybrid approach that merges the TOPSIS method with decision trees to identify suppliers in supply chain management. In [45], two hybrid MCDM systems integrate additive ratio assessment with TOPSIS and complex proportional assessment to solve a real-time robot selection problem involving 12 alternative robots and five selection criteria. The authors of [46] proposed a hybrid fuzzy MCDM method to address sustainable and resilient supplier selection problems.

Recent developments in MCDM techniques include the integration of AI and ML into decision-making processes, the development of dynamic decision support systems (DSSs), and the enhancement of DSSs through multi-agent digital twins. Employing AI and ML strategies, such as neural networks, evolutionary algorithms, and reinforcement learning, can improve the effectiveness and precision of DSSs in MCDM [15]. According to [47], a conventional DSS can be improved with AI by managing large amounts of data and various constraints within the decision-making environment. This strategy for developing DSSs can increase accuracy, learning, and prediction. A dynamic decision support system for sustainable supplier selection in circular supply chains is proposed by [48]. The proposed approach integrates machine learning and fuzzy inference algorithms, enabling customers to customize and prioritize their criteria before choosing the best supplier. The authors of [49] proposed a dynamic DSS to evaluate cloud storage providers based on their offered services and select a suitable one. Lastly, multi-agent systems (MAS) are a branch of distributed AI in which autonomous agents collaborate to achieve a common goal. MAS are essential for developing Digital Twins (DTs), enabling the parallel and distributed computation required to process the large amounts of data and complex simulations typically associated with DTs [50]. The authors of [51] proposed a hierarchical decision-making framework that uses DT technology to optimize resource usage and facilitate real-time mission planning for multiple UAVs, employing DT-enabled reinforcement learning to leverage real-world experiences. A digitally enabled DSS that supports intelligent and tailored supplier management in dynamic and challenging environments is presented by [52]. The system employs fuzzy stratified decision-making to assess the potential impact of future events. Additionally, it utilizes multi-agent digital twin technology to validate the effectiveness of supplier development plans.

2.3 Cutting-Edge MORL Algorithms

Reinforcement learning is an effective paradigm for sequential decision-making tasks, where the decision-maker observes a process before making a final decision. Unlike traditional control methods, RL makes minimal assumptions about the problem, allowing easy adaptation to any task assignment [53]. Traditional RL methods that excel at tasks such as playing video games focus on single-objective techniques, such as deep Q-learning [21, 54]. In contrast, real-life problems usually involve simultaneously meeting multiple, often conflicting objectives [55]. Methods for such problems should therefore yield optimal policies that balance these objectives [56]. However, the number of optimal policies may vary for multi-objective problems, depending on the trade-offs involved in achieving different objectives [57]. Therefore, MORL methods seek to create a policy coverage set to address all possible user preferences in solving the problem [53]. In [58], DQN algorithms were modified to facilitate single-policy linear MORL by creating an approximate coverage set of policies. Each policy was represented with a neural network that uses an outer-loop strategy. The algorithms’ effectiveness was evaluated using two MORL benchmark scenarios: Mountain Car (MC) and Deep Sea Treasure (DST). The author of [57] introduced a method for utilizing DQNs in multi-objective environments where various DQNs guide the agent’s behavior toward specific goals. The proposed method used decision values to enhance the scalarization of multiple DQNs into a single action. It was tested on a game-like simulator in which an agent with visual input pursues numerous goals. In [59], an algorithm was proposed to learn a single-policy network optimized across the entire preference space within a domain, generating the optimal policy for any user-specific preference. This showcases the effectiveness of deep neural networks in scaling MORL with linear preferences. The authors’ approach was evaluated using popular MORL benchmarks, such as DST and the video game Super Mario Bros. The authors of [60] proposed a DQN and extended it to multi-objective scenarios, maximizing drug-likeness while maintaining molecular similarity. A multi-objective DQN method was used for autonomous driving, which requires the consideration of various factors simultaneously, including obeying traffic rules, avoiding collisions, reaching the destination as quickly as possible, and ensuring passenger safety [61].

The authors of [62] reported that a Pareto set of various policy families effectively represents the optimal performance trade-offs for multi-objective robot control. A prediction-guided evolutionary learning algorithm was used to identify high-quality policies within the Pareto set to calculate these representations. A MORL environment with a continuous action space was created to validate the effectiveness of the algorithm. The authors of [63] presented a DQN-based MORL framework that supports single-policy and multi-policy techniques and linear and non-linear action selection approaches. The proposed method was validated in two benchmark environments: a two-objective DST problem and a three-objective MC problem. The experimental results showed that the framework could find Pareto-optimal solutions effectively. The study presented in [64] unveiled the Q-Managed algorithm, a MORL approach proficient in acquiring non-dominated multi-objective policies even when faced with deterministic transition functions, regardless of the Pareto front’s configuration. In [65], the authors addressed the difficulties in scheduling energy-efficient automated guided vehicles (AGVs) that require battery changes within a production logistics framework. They introduced an innovative data-driven method that employs a deep reinforcement learning-based agent to orchestrate AGV tasks and devise battery replacement strategies in response to real-time service demands. The authors of [18] proposed an intelligent energy management strategy for multi-microgrid power systems that utilizes preference-based multi-objective reinforcement learning methods to create a Pareto-optimal set for every objective, thus maximizing its advantages. A non-dominated solution involves crafting a plan that maintains balance without giving undue advantage to any stakeholder. The method successfully captures diverse preferences, as demonstrated by preference-based outcomes. The authors of [66] proposed a MORL approach for energy management strategies. This approach aims to balance hydrogen consumption in fuel cell vehicles, the fuel cell’s durability, and the battery’s lifespan, ultimately reducing the overall lifecycle costs associated with fuel cells. A novel technique for optimizing pressurized water reactors, the Pareto envelope augmented with reinforcement learning, was introduced by [67] to tackle the complexities of multi-objective problems, especially in engineering, where evaluating possible solutions can be time-consuming.

2.4 Novelty and Research Gap

Although prior work has significantly advanced our understanding of the importance of MCDM methods in various fields, specialized methodologies are needed that can effectively address the challenges and demands of Industry 5.0. Planning for the future requires identifying directions and developing trends in MCDM research. For example, integrating MCDM with emerging technologies such as AI, cyber-physical systems, and the Internet of Things has significant potential to improve decision-making processes [15, 68]. The literature outlines various constraints and limitations of current MCDM approaches, as detailed in Table 1. This study aims to address existing challenges and gaps by proposing a decision support framework that integrates AI and MCDM techniques to rank and prioritize sustainable maintenance factors to help stakeholders evaluate manufacturing sustainability. The proposed approach can also analyze large datasets and facilitate real-time decision-making capabilities. Additionally, this study enhances the decision-making abilities of the proposed MCDM-DQN model by promoting human collaboration through data input that reflects diverse perspectives of maintenance stakeholders while reviewing and verifying the model’s output. Furthermore, this research paper advocates for a human-centered approach that aligns with the vision of Industry 5.0, promoting stakeholder engagement and decision transparency. It emphasizes the active participation of stakeholders in the decision-making process and seeks to incorporate their preferences without relying on traditional MCDM methods.

Table 1 Limitations and constraints of current MCDM techniques
Table 2 MCDM decision matrix

3 Problem Statement and Methodology

3.1 MCDM Problem

The MCDM problem is a process for selecting the best option from a set of alternatives [30]. Mathematically, it is defined as follows:

$$\begin{aligned} A = \{A_i \mid i = 1, 2, \dots , m\} \end{aligned}$$
(1)

where A represents a unique and finite collection of m alternatives.

$$\begin{aligned} C = \{C_j \mid j = 1, 2, \dots , n\} \end{aligned}$$
(2)

where C denotes a set of n criteria used to evaluate A. The criteria may have diverse, unrelated units and present varied, potentially conflicting goals: minimization may be preferred for some criteria, while maximization is favored for others.

$$\begin{aligned} W = \{w_j \mid j = 1, 2, \dots , n\} \end{aligned}$$
(3)

where W denotes a set of weights, each normalized and allocated to a criterion based on its importance. Typically, the data collected in Eqs. 1 to 3 is structured into a matrix known as the decision matrix \(D_m\), which includes the alternative ratings concerning each criterion, as shown in Table 2. In the decision matrix, each element \(x_{ij}\) represents the judgment of \(A_i\) with respect to \(C_j\). The matrix \(D_m\) and the weight set W are the basic inputs for the MCDM methods, which score the alternatives and rank them according to their priority.
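To make these inputs concrete, the short Python sketch below represents the alternative set A, the weight set W, and the decision matrix \(D_m\) from Eqs. 1 to 3. The numbers and variable names are hypothetical and serve only to mirror the notation above.

```python
import numpy as np

# Hypothetical example with m = 3 alternatives and n = 4 criteria,
# mirroring the notation of Eqs. 1 to 3.
alternatives = ["A1", "A2", "A3"]        # the set A
criteria = ["C1", "C2", "C3", "C4"]      # the set C
W = np.array([0.40, 0.25, 0.20, 0.15])   # normalized weights, sum to 1

# Decision matrix D_m: element x_ij is the rating of alternative A_i
# with respect to criterion C_j.
D_m = np.array([
    [7.0, 3.0, 5.0, 9.0],
    [4.0, 8.0, 6.0, 2.0],
    [6.0, 5.0, 7.0, 4.0],
])

assert D_m.shape == (len(alternatives), len(criteria))
assert np.isclose(W.sum(), 1.0)
```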

3.2 MORL Problem

The Markov decision process (MDP) is a framework for sequential decision-making in various scenarios. In MDPs, agents act as both learners and decision-makers by interacting with their environment. The agent chooses actions that lead to different outcomes while receiving rewards, which are numerical values that the agent seeks to maximize over time [16]. A MORL model is structured as a multi-objective Markov decision process (MOMDP) [19, 78]. In a multi-objective problem consisting of \(n \ge 2\) objectives, the MOMDP is described as a tuple \(\langle S, A, T, R, \gamma \rangle \), where S is the state space, A is the action space, \(T: S \times A \times S \rightarrow [0,1]\) is a probabilistic transition function, \(R: S \times A \times S \rightarrow \mathbb {R}^n\) is the vector of reward functions corresponding to the considered n objectives, and \(\gamma \in [0,1]\) is the discount factor that influences future rewards. The key distinction between MOMDPs and standard MDPs lies in the reward signal. In MOMDPs, the reward function does not provide a single scalar value. Instead, it returns an n-dimensional vector that represents a numeric reward signal at each time-step t for each of the n objectives, as demonstrated in Eq. 4.

$$\begin{aligned} \vec {R_t} (s, a, s') = (R_{1,t}(s, a, s'), R_{2,t}(s, a, s'), \dots , R_{n,t}(s, a, s')) \end{aligned}$$
(4)

To address a reinforcement learning problem, it is essential to identify a policy that consistently generates significant rewards over time [16]. In multi-objective contexts, the definitions of reinforcement learning change when using a vectorial reward function. The agent’s goal is to take actions that ensure that its expected vector return \(\vec {R}_t\) achieves Pareto optimality [18, 63, 79]. The expected vector return is defined as follows:

$$\begin{aligned} \vec {R}_{t} = \sum _{k=0}^{T - t} \gamma ^k \vec {R}_{t+k+1} \end{aligned}$$
(5)

where \(T < \infty \) for episodic tasks, while for continuous tasks, \(T = \infty \).

In MDPs, value functions estimate the expected cumulative future rewards that an agent can achieve from a specific state while following a policy \(\pi \). The state value function for MOMDP \(\vec {V}^\pi (s)\) is also vector-valued, \(\vec {V}^\pi \in \mathbb {R}^n\). This function specifies how good a certain state s is in the long term according to policy \(\pi \in \Pi \) as follows:

$$\begin{aligned} \vec {V}^\pi (s) = \mathbb {E}_\pi \{\vec {R}_t \mid S_t=s\} = \mathbb {E}_\pi \{\sum _{k=0}^{T-t} \gamma ^k \vec {R}_{t+k+1} \mid S_t=s\} \end{aligned}$$
(6)

where \(T < \infty \) for episodic tasks, while for continuous tasks, \(T = \infty \).

The state action value function \(\vec {Q}^\pi (s,a)\) that specifies the expected rewards starting from state s, taking arbitrary action a, and then following policy \(\pi \) is also vector-valued, as follows:

$$\begin{aligned} \vec {Q}^\pi (s,a) = \mathbb {E}_\pi \{\vec {R}_t \mid S_t=s, A_t = a\} = \mathbb {E}_\pi \{\sum _{k=0}^{T-t} \gamma ^k \vec {R}_{t+k+1} \mid S_t=s, A_t=a\} \end{aligned}$$
(7)

where \(T < \infty \) for episodic tasks, while for continuous tasks, \(T = \infty \).

In MDPs, \(V^\pi (s)\) provides a complete ordering of the policy space: for any \(\pi \) and \(\pi '\), either \(V^\pi (s) \ge V^{\pi '} (s)\) or \(V^\pi (s) < V^{\pi '} (s)\). Therefore, finding the optimal policy \(\pi ^*\) is equivalent to maximizing the expected discounted reward, which is not always true for MOMDPs. When multi-objective problems are approached using a single-policy method, a utility function \(u: \mathbb {R}^n \rightarrow \mathbb {R}\) maps the vector-valued state value function to a scalar value \(V^\pi _u = u(\vec {V}^\pi )\). The result is a total ordering of policies, reducing the MOMDP to a single-objective decision-making problem, which is not always desirable [78]. This study employs a multi-policy approach to tackle MOMDP problems. The aim is to identify a set of optimal policies, denoted as \(\pi ^*= \{\pi _1,\pi _2, \dots , \pi _n \}\), commonly referred to as the Pareto front. This set of policies aims to satisfy all user preferences related to the defined objectives. For those interested, the work of [78] provides a comprehensive discussion of the multi-objective sequential decision problem.

Fig. 1 Human-centered design for assessing factors relevant to manufacturing sustainability

3.3 Methodology

This section introduces the decision support framework depicted in Fig. 1. The proposed framework integrates MCDM techniques with the MCDM-DQN model to address the challenge of ranking maintenance factors according to their relative importance from the perspectives of various decision-makers. The suggested approach is aligned with the concept of Industry 5.0. The objective is to help stakeholders understand how key maintenance factors influence manufacturing sustainability by incorporating three modules: sustainable maintenance support, MCDM-DQN model, and decision-making support. It also covers seven stages, as described in Table 3. These stages adhere to the AI development lifecycle defined in [80]. It is essential to note that while the solution presented in Fig. 1 provides a comprehensive framework to support sustainable maintenance decision-making, the discussion and case studies primarily focus on the development of the MCDM-DQN model, which will be explored in subsequent sections.

Table 3 The seven stages included in the proposed framework

The role of maintenance in achieving sustainability goals in manufacturing varies depending on a company’s operational and business context. To address this issue, the sustainable maintenance support module encompasses the three initial stages outlined in Table 3, allowing stakeholders to effectively assess the impact of maintenance factors on manufacturing sustainability. In the first stage, decision-makers identify key factors that are significant to their businesses. The second stage establishes the assessment criteria C and their respective weights W while selecting the essential factors. In the third stage, decision-makers evaluate the key factors associated with each criterion and create a decision matrix \(D_m\). Finally, \(D_m\) and W serve as fundamental inputs for the MCDM-DQN. In the evaluation stage, maintenance stakeholders share their perceptions to assess the importance of various maintenance factors. According to [6], specialists from various departments (including production, quality, maintenance, safety, health, and environment) should collaborate in teams to evaluate maintenance factors. This approach helps minimize bias among decision-makers and identify discrepancies in how different individuals perceive maintenance factors based on established criteria. It is important to note that this research does not advocate any specific method of execution. However, several studies referenced in this research apply fuzzy logic in the evaluation process. Specifically, a fuzzy entropy-weighted approach is used to compute weights, while triangular fuzzy numbers are used to evaluate various sub-criteria related to maintenance sustainability strategies [36]. In addition, linguistic variables and fuzzy numbers are used to form the decision matrix and evaluate the significance of the criteria [6, 36, 68]. Fuzzy-based multi-criteria decision-making techniques aid in managing ambiguity and uncertainty in decision-making processes. They enable decision-makers to express their subjective preferences using linguistic variables. This approach fosters effective collaboration among multiple decision-makers, enabling them to reach a consensus and evaluate alternative options [37, 42, 43].
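As a concrete illustration of this fuzzy evaluation step, the sketch below converts linguistic ratings, expressed as triangular fuzzy numbers, into crisp inputs using the standard centroid formula \((a + b + c)/3\). The linguistic scale shown is a common choice from the fuzzy MCDM literature and is an assumption here, not a scale prescribed by the framework.

```python
# Minimal sketch: defuzzifying triangular fuzzy ratings (a, b, c) with the
# centroid method. The linguistic scale below is illustrative only.
SCALE = {
    "low":    (1.0, 1.0, 3.0),
    "medium": (3.0, 5.0, 7.0),
    "high":   (7.0, 9.0, 9.0),
}

def defuzzify(tfn):
    """Centroid of a triangular fuzzy number (a, b, c)."""
    a, b, c = tfn
    return (a + b + c) / 3.0

# One expert's linguistic ratings of an alternative against four criteria.
ratings = ["high", "medium", "high", "low"]
crisp = [defuzzify(SCALE[r]) for r in ratings]
print(crisp)  # [8.33, 5.0, 8.33, 1.67] (approximately)
```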

The MCDM-DQN module implements the MCDM-DQN algorithm, which is the core of the proposed solution. It contributes to the purpose of the overall framework by ranking key sustainable maintenance factors, enabling stakeholders to select the optimal factor from a prioritized set of relevant factors. Developing the model requires an intensive, iterative, and exploratory approach. Due to inherent uncertainty in the timing and quality of outcomes in AI development, frequent loops, ongoing modifications, and improvements to initial requirements are essential. In this research, the performance of the MCDM-DQN model is compared to that of other traditional MCDM techniques through three experiments involving the ranking of sustainability factors in different business contexts.

To enhance the evaluation and deployment of the model, the framework should be integrated with production systems to facilitate the testing of MCDM-DQN using unseen data. Stakeholders should assess whether the model’s quality, stability, and performance meet expectations and determine its deployment. It is essential to emphasize that stakeholder participation in decision-making is significant because their influence, experience, and insights into the company can help decision-makers make informed decisions to achieve sustainable development goals. Understanding stakeholders’ interests and needs is essential for optimizing decision-making strategies.

Fig. 2 The MCDM-DQN model combines multiple DQNs

4 The MCDM-DQN Model

The MCDM-DQN model depicted in Fig. 2 combines multiple DQNs to implement a MORL solution in which the agent pursues multiple objectives. Let \(O = \{o_1, o_2, \cdots , o_n\}\) be the set of objectives of an agent. Instead of a single reward, the agent receives a vector of rewards \(\vec {r_t} =[r_{1,t}, r_{2,t}, \dots , r_{n,t}]\) at each time-step t where \(r_{i,t}\) corresponds to objective \(o_i\). For each objective \(o_i\) and time-step t, the discounted return is defined by Eq. 8.

$$\begin{aligned} R_{i,t} = \sum _{k=0}^{\infty } \gamma ^k r_{i,t+k} \end{aligned}$$
(8)

Additionally, for each objective \(o_i\) there is a Q-function \(Q_i (s,a)\) that produces the expected discounted return \(R_{i,t}\), such that \(Q_i (s,a) = \mathbb {E}[R_{i,t}|S_t = s,A_t = a]\). Then, a vector of Q-functions, which includes \(Q_i(s,a)\) for each objective \(o_i\), can be defined:

$$\begin{aligned} \vec {Q} (s,a) = [Q_1(s,a), Q_2(s,a), \dots , Q_n(s,a)] \end{aligned}$$
(9)

The agent can determine the optimal action for each objective \(o_i \in O\) at time-step t, for a given state \(s_t\), via the function \(Q_i(s, a)\) as follows:

$$\begin{aligned} a_{i,t} = \underset{a}{\text {argmax}} Q_i(s_t, a) \end{aligned}$$
(10)

The vector \(\vec {a_t} = [a_{1,t}, a_{2,t}, \dots , a_{n,t}]\), which consists of optimal actions relating to objectives in a given time-step t, can also be determined. However, it is necessary to convert \(\vec {a_t}\) into a single action, as the agent may perform only one action at each time step. Therefore, a linear scalarization function that combines \(\vec {Q} (s,a)\) into a single action given a weight vector \(\vec {W} = [w_1, w_2, \dots , w_n]\) is applied as follows:

$$\begin{aligned} SQ(s,a) = \sum _{i=1}^{n} w_i Q_i(s,a) \end{aligned}$$
(11)

where \(w_i\) is the weight assigned to each specific objective \(o_i\), and n represents the total number of objectives. Therefore, \(SQ(s,a)\) can be applied to Eq. 10 to select an action.

The MCDM-DQN agent uses a different DQN as an approximator for each \(Q_i (s, a) \in \vec {Q}(s, a)\). Each DQN provides a vector of q-values, which are combined to select the single action that the agent performs. The q-values obtained from the DQNs for different objectives are merged using the weights \(w_i\) defined by the manufacturing experts. Consider a vector \(\vec {q_i}\) that consists of q-values provided by \(Q_i(s, a)\) for each possible action \(a_i \in A\) and a single objective \(o_i \in O\), defined as follows:

$$\begin{aligned} \vec {q_i} = [Q_i(s,a_1), Q_i(s,a_2), \dots , Q_i(s,a_m)]. \end{aligned}$$
(12)

In this approach, scalarization can sum all q-values and select the action corresponding to the maximal value in the scaled q-value vector. Since the q-values of different objectives may span different ranges of real numbers, each vector must be rescaled to \([0,1] \subseteq \mathbb {R}\) to produce meaningful results. In our experiments, the scikit-learn library function preprocessing.MinMaxScaler() was used to rescale the values in \(\vec {q_i}\) as follows:

$$\begin{aligned} scale(x) = \frac{x - x_{\min }}{x_{\max } - x_{\min }} \end{aligned}$$
(13)

The scalarized Q-vector is defined by Eq. 14, where n is the number of objectives and \(w_i\) corresponds to each objective weight. Finally, we sum the rescaled vectors \(\vec {q_i}\) and select the action corresponding to the highest total q-value.

$$\begin{aligned} \vec {q_s} = \sum _{i=1}^{n} w_i \, scale(\vec {q_i}) \end{aligned}$$
(14)
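A minimal NumPy sketch of Eqs. 12 to 14 follows. It rescales each objective's q-value vector with the min-max formula of Eq. 13 (equivalent to scikit-learn's preprocessing.MinMaxScaler) and returns the action with the highest weighted sum; the function and variable names are illustrative, not the original implementation.

```python
import numpy as np

def scalarized_action(q_vectors, weights):
    """Select a single action from per-objective q-values (Eqs. 12 to 14).

    q_vectors: list of n arrays, one per objective, each of length m (actions).
    weights:   the n objective weights w_i supplied by the experts.
    """
    q_s = np.zeros_like(q_vectors[0], dtype=float)
    for w_i, q_i in zip(weights, q_vectors):
        span = q_i.max() - q_i.min()
        # Eq. 13: rescale to [0, 1] so no objective dominates by magnitude.
        scaled = (q_i - q_i.min()) / span if span > 0 else np.zeros_like(q_i)
        q_s += w_i * scaled  # Eq. 14: weighted sum of the rescaled q-vectors
    return int(np.argmax(q_s))  # action with the highest scalarized q-value
```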

4.1 MCDM-DQN Learning

Deep Q-network algorithms utilize neural networks parameterized by \(\theta \) to approximate the function \(Q(s, a; \theta )\), where \(\theta \) refers to the learnable parameters of the neural network. In our MCDM-DQN model, multiple DQNs are used. Therefore, there is a function \(Q_i(s, a; \theta _i)\) for a \(DQN_i\) with respect to the objective \(o_i\). Each \(DQN_i\) is optimized by minimizing a loss function in each iteration j, with the target \(y^{DQN}_j = r + \gamma \underset{a'}{\max }\ Q(s', a'; \theta ^-_j)\) as follows:

$$\begin{aligned} \mathcal {L}_j(\theta _j) = \underset{s, a, r, s'}{\mathbb {E}} \bigl [(y^{DQN}_j - Q(s,a;\theta _j))^2 \bigr ] \end{aligned}$$
(15)

It is important to understand that each DQN involves two identical multi-layer feed-forward networks during the learning process. The first is the online network \(Q_i(s, a; \theta )\), which is updated at each iteration j. The second is the target network \(Q_i(s, a; \theta ^-)\), which is updated only every k iterations [81]. Each DQN is implemented as a subclass of PyTorch’s neural network base class, “torch.nn.Module” [82]. The neural network architecture includes two hidden layers with ReLU activation followed by an output layer, using default parameters, with each hidden layer set to 128 units.
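The described architecture can be sketched as follows. The layer sizes match the description (two 128-unit hidden layers with ReLU activations); the class and argument names are our own.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Per-objective Q-network: two 128-unit hidden layers with ReLU."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# One online network and one target network per objective o_i; the target
# network's weights are copied from the online network every k iterations.
```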

Furthermore, experience replay [83] improves the learning process by using training experiences more effectively. Replay memory retains experience samples for re-utilization during training. The agent logs the states, actions, and rewards encountered in the replay memory \(M_i\) specific to each \(DQN_i\). Training involves sampling past experiences uniformly at random from this replay memory. The selected samples serve as mini-batches for the gradient descent optimization process. Each \(DQN_i\) is optimized iteratively using a loss function for each iteration j as follows:

$$\begin{aligned} \mathcal {L}_{i,j}(\theta _{i,j}) = \underset{(s, a, r, s') \sim U(M_i)}{\mathbb {E}} \bigl [(r_i + \gamma \underset{a'}{\max }\ Q_i(s',a'; \theta _{i,j}^-) - Q_i(s,a;\theta _{i,j}))^2 \bigr ] \end{aligned}$$
(16)
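A hedged sketch of one optimization step implied by Eq. 16 is shown below: experiences are sampled uniformly from the replay memory \(M_i\), and the squared TD error against the frozen target network is minimized. All names are illustrative, and terminal-state masking is omitted for brevity.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

# Replay memory M_i; each entry is a (state, action, reward, next_state)
# tuple of tensors for objective o_i.
memory = deque(maxlen=10_000)

def optimize_step(online_net, target_net, optimizer, batch_size=128, gamma=0.99):
    """One gradient step on the loss of Eq. 16 for a single DQN_i."""
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)      # uniform sampling, U(M_i)
    s, a, r, s_next = (torch.stack(x) for x in zip(*batch))
    q = online_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                          # target uses frozen theta^-
        y = r + gamma * target_net(s_next).max(dim=1).values
    loss = F.mse_loss(q, y)                        # squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```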

5 Experimental Evaluation

The following sections describe the implementation of the MCDM-DQN environment and the experiments carried out to evaluate the performance of the model.

5.1 The MCDM-DQN Environment

To evaluate the performance of MCDM-DQN, we created a virtual environment named SustainableIndicators. This environment simulates how the key maintenance factors chosen by stakeholders affect the behavior of various sustainability indicators. These indicators align with common sustainability objectives in manufacturing, including environmental, social, health and safety, and economic. The environment consists of a continuous four-dimensional state within the range of [0, 100]. A stochastic oscillation models the seemingly random changes in manufacturing sustainability. The initial state \(s_0\) is randomly chosen, determining the starting level \(L_i\) for each objective \(o_i\) within the interval [0, 100]. At each time step, \(L_i\) is modified by \(L_{i, step}\), increasing if associated with a benefit criterion and decreasing if linked to a cost criterion. The term \(L_{i, step}\) is calculated as follows: for a cost criterion, \(L_{i, step} = (100 - L_i) \times v\), and for a benefit criterion, \(L_{i, step} = L_i \times v\), where v is a random variable within the range [0.0001, 0.001]. Consequently, these conflicting objectives continuously evolve at each time step. Furthermore, the MCDM-DQN agent performs various actions related to the alternatives presented in the decision matrix \(D_m\) that decision-makers input into the MCDM-DQN model. The goal is to minimize objectives related to cost criteria while maximizing those related to benefit criteria. When an action is executed, the algorithm searches for the corresponding alternative ratings \(a_{i, j}\) for each objective \(o_i\) and adjusts \(L_i\) by adding or subtracting a value corresponding to \(a_{i, j} \times 0.1\). Our environment utilizes the MO-Gymnasium API, an open-source Python library for developing and comparing multi-objective reinforcement learning algorithms [84]. It also conforms to the standard Gymnasium API [85] but returns vectorized rewards instead of scalars.
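The level-update rule just described can be sketched as follows. The variable names and, in particular, the sign convention applied to the chosen alternative's ratings are our reading of the description, not the original implementation.

```python
import numpy as np

rng = np.random.default_rng()

def step_levels(levels, is_benefit, D_m, action):
    """One step of the SustainableIndicators level dynamics.

    levels:     array of indicator levels L_i in [0, 100], one per objective.
    is_benefit: boolean mask; True for benefit criteria, False for cost.
    D_m:        decision matrix (alternatives x objectives).
    action:     index of the alternative chosen by the agent.
    """
    v = rng.uniform(0.0001, 0.001, size=levels.shape)    # stochastic oscillation
    step = np.where(is_benefit, levels * v, (100.0 - levels) * v)
    levels = levels + np.where(is_benefit, step, -step)  # benefit up, cost down
    # The chosen alternative's ratings a_ij nudge each level by 0.1 * a_ij;
    # the direction (benefit up, cost down) is assumed here.
    levels = levels + np.where(is_benefit, 1.0, -1.0) * 0.1 * D_m[action]
    return np.clip(levels, 0.0, 100.0)
```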

The MCDM-DQN agent is designed to perform episodic tasks. At each time interval, it interacts with the environment by observing the current state and selecting an action to execute using an \(\epsilon \)-greedy policy. The MCDM-DQN agent dynamically adjusts strategies by balancing exploitation and exploration to maximize rewards, as shown in Algorithm 1. After acting, the agent receives a vectorized reward signal and transitions to a new state. The reward values for the objectives related to the benefit criteria are established at \(-1\) for lower performance levels in the range [0, 30], 0 for intermediate performance levels in the range (30, 70), and 1 for higher performance levels in the range [70, 100]. The same reward values and performance level ranges apply to the objectives related to cost criteria, but in reverse order: higher performance levels in the range [0, 30] and lower performance levels in the range [70, 100]. Each episode ends when one of two final states is reached: 1) a balanced score of objectives shows cost criteria within the range of [0, 10] and benefit criteria in the range of [90, 100], or 2) a balanced score indicates cost criteria between [90, 100] and benefit criteria within [0, 10]. These scenarios yield a reward signal of \(+100\) and \(-100\), respectively. Furthermore, various hyperparameters were employed to train the DQNs throughout the learning process, as detailed in Table 4.
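The vectorized reward signal can be derived directly from the performance bands above; a minimal sketch follows, with illustrative names and the thresholds taken from the text.

```python
import numpy as np

def vector_reward(levels, is_benefit):
    """Per-objective reward of -1, 0, or +1 by performance band."""
    # Benefit criteria: [0, 30] -> -1, (30, 70) -> 0, [70, 100] -> +1.
    rewards = np.where(levels >= 70, 1, np.where(levels > 30, 0, -1))
    # Cost criteria use the reversed ordering: low levels are good.
    return np.where(is_benefit, rewards, -rewards)
```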

Table 4 MCDM-DQN learning hyperparameters
Algorithm 1 MCDM-DQN performance evaluation algorithm

5.2 Experiments

This section describes various experiments conducted to assess the performance of the MCDM-DQN model. The SustainableIndicators environment was modified to optimize the state and action spaces for each specific MCDM problem. In each experiment, the MCDM-DQN model was trained according to the specifications described in Sect. 4.1. All DQNs used for the various objectives were trained independently, and the criteria weights were not utilized for scalarization during training. Four distinct scenarios were executed, each with different sets of criteria weights. In the base scenario, input weights W were utilized. Subsequent scenarios used different preference sets, which were randomly assigned, to evaluate the sensitivity of the rankings to variations in the criteria weights. A total of 1000 episodes were played for each scenario. The accumulated rewards earned for each action taken were recorded in each scenario. The method used to assess the performance of MCDM-DQN is described in Algorithm 1. The authors developed the necessary software to implement the solutions and perform the experiments presented in this study using multiple well-known and updated tools available for deep learning and RL implementations, such as Python 3.10.14, PyTorch, and OpenAI Gym environments running on Windows 11 for x64-based systems. The system used was an x64-based PC with an 11th-generation Intel Core i7 processor operating at 3.30 GHz, featuring 16 GB of RAM and 1 TB of storage.
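Consistent with the description of Algorithm 1, the evaluation loop can be sketched as follows: for each weight scenario, the trained per-objective networks are queried, their q-vectors are scalarized (see the scalarized_action sketch in Sect. 4), and the reward earned is credited to the executed action over 1000 episodes. This is our reading of the procedure, not the authors' verbatim code; in particular, summing the components of the vector reward is a simplification.

```python
import numpy as np
import torch

N_EPISODES = 1000

def evaluate(env, q_networks, weights, n_actions):
    """Accumulate per-action rewards over N_EPISODES for one weight scenario."""
    totals = np.zeros(n_actions)
    for _ in range(N_EPISODES):
        state, _ = env.reset()                       # MO-Gymnasium API
        done = False
        while not done:
            s = torch.as_tensor(state, dtype=torch.float32)
            with torch.no_grad():
                q_vectors = [qnet(s).numpy() for qnet in q_networks]
            action = scalarized_action(q_vectors, weights)  # Eqs. 12 to 14
            state, reward_vec, terminated, truncated, _ = env.step(action)
            totals[action] += float(np.sum(reward_vec))
            done = terminated or truncated
    return totals  # alternatives are ranked by their accumulated reward
```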

Table 5 Sustainable software engineering practices

5.2.1 Experiment 1: Assessing the Sustainability of a Software Development Company

In this experiment, we compared the performance of the MCDM-DQN with a benchmark established by [68]. That study employed fuzzy TOPSIS and AHP methods to assess sustainability practices in a software (SW) development company operating in Industry 5.0. The goal was to identify the most sustainable SW engineering practices while addressing various sustainability concerns: social, economic, and environmental. Four assessment criteria were established: \(C_1\) for environmental impact, \(C_2\) for social responsibility, \(C_3\) for resource efficiency, and \(C_4\) for economic viability. Next, five alternative SW engineering practices were identified (Table 5).

Fig. 3 Average episodic rewards obtained by MCDM-DQN during Experiment 1. The learning of each DQN is stable over time: (a) DQN 1, (b) DQN 2, (c) DQN 3, and (d) DQN 4

Fig. 4 Ranking of five actions that represent alternative sustainable SW engineering practices: (a) base scenario: \(W = \{0.25, 0.25, 0.25, 0.25\}\), (b) scenario 2: \(W = \{0.30,0.15,0.20,0.35\}\), (c) scenario 3: \(W = \{0.15,0.20,0.35,0.30\}\), and (d) scenario 4: \(W = \{0.20,0.35,0.30,0.15\}\). The scores on the y-axis represent the total reward value normalized by the log of y

The experiment was carried out as follows. First, the decision matrix and the weights of the four criteria presented in [68] were defuzzified and served as input to the MCDM-DQN model. The training of each DQN was consistently stable over time, as shown in Fig. 3. To indicate computational effort, the average training execution time for each DQN was recorded as follows: a) DQN 1: 1.28 min; b) DQN 2: 1.47 min; c) DQN 3: 1.24 min; and d) DQN 4: 1.26 min. The trained MCDM-DQN model was finally used to rank sustainable SW practices.

The experimental results are illustrated in Figs. 4 and 5. Figure 4 shows the ranking order of the alternatives based on the accumulated rewards, with \(A_1\) and \(A_3\) performing the best in the base scenario, followed by \(A_2\), \(A_5\), and \(A_4\). The dominant sustainability practices \(A_1\) and \(A_3\) remained in their positions in all scenarios, suggesting that they should be considered equally important given such differences in the criteria weights. The alternatives \(A_2\), \(A_5\), and \(A_4\) changed their ranking in different scenarios, indicating that they are more sensitive to fluctuating preferences. The rewards accumulated for each action in the base scenario are presented in Fig. 5. The total accumulated rewards determine the ranking of each alternative. The results obtained in this experiment are consistent with the findings in [68]. A comparative analysis of the ranking of SW engineering practices obtained in this experiment and the findings of [68] is presented in Table 6. As we can see, the MCDM-DQN performance could match that of fuzzy TOPSIS. Specifically, the performance differences between the TOPSIS and AHP methods reported in [68] suggest that alternative scores may vary depending on the MCDM method applied.

Fig. 5 The total accumulated rewards used to rank five sustainable SW engineering practices in the base scenario: (a) practice \(A_1\), (b) practice \(A_2\), (c) practice \(A_3\), (d) practice \(A_4\), (e) practice \(A_5\), and (f) total accumulated reward

Table 6 A comparative analysis of the ranking of sustainable SW engineering practices obtained in experiment 1 and the findings of [68]

5.2.2 Experiment 2: Evaluation of Maintenance Factors that Affect Sustainable Manufacturing

In this experiment, we compared the performance of the MCDM-DQN with a benchmark established by [6], which presents an empirical examination of the impact of the maintenance function on manufacturing sustainability in 58 companies. The most relevant maintenance factors were determined (Table 7). Then, F-AHP and TOPSIS methods were employed to evaluate the maintenance factors that influence sustainable manufacturing. The F-AHP method facilitated the participation of multiple decision-makers with different and conflicting criteria, promoting consensus and establishing the weights of these criteria [6]. Four assessment criteria were established: \(C_1\) for manufacturing cost, \(C_2\) for energy consumption, \(C_3\) for waste reduction, and \(C_4\) for operational safety.

Table 7 Key factors that influence sustainable maintenance
Fig. 6 Average episodic rewards obtained by MCDM-DQN during Experiment 2. The learning of each DQN is stable over time: (a) DQN 1, (b) DQN 2, (c) DQN 3, and (d) DQN 4

The experimental procedure was conducted as follows. First, the fuzzy decision matrix and the fuzzy weights for five factors, as detailed in [6], were defuzzified to serve as input for the MCDM-DQN model. The training of each DQN was consistently stable over time, as shown in Fig. 6. To indicate computational effort, the average training execution time for each DQN was as follows: a) DQN 1: 1.28 min; b) DQN 2: 1.47 min; c) DQN 3: 1.24 min; and d) DQN 4: 1.26 min. The trained MCDM-DQN model was finally used to rank the key maintenance factors.

The experimental results are presented in Figs. 7 and 8. Figure 7 shows the ranking order of the alternatives based on the accumulated rewards, with \(A_3\), \(A_4\), and \(A_5\) performing the best in the base scenario, followed by \(A_1\) and \(A_2\). In particular, the three most highly ranked alternatives are related to technical issues. Consequently, company managers must focus on these factors and implement measures in these areas to support sustainable manufacturing goals. Furthermore, the three dominant alternatives (\(A_3, A_4, A_5\)) remained in the top positions in the ranking in subsequent scenarios, suggesting that they should be considered equally important given such differences in experts’ preferences. The alternatives \(A_1\) and \(A_2\) changed their ranking in different scenarios. However, they continued to occupy the lowest ranks, suggesting that changing preferences had little impact on the overall ranking. The rewards accumulated for each action in the base scenario are presented in Fig. 8. The total accumulated rewards determine the ranking of each alternative. The results obtained in this experiment are consistent with the findings in [6], which concluded that alternatives \(A_3\), \(A_4\), and \(A_5\) should be considered equally crucial after a sensitivity analysis. A comparative analysis of the ranking of the key factors obtained in this experiment and the findings of [6] is presented in Table 8. As we can see, the MCDM-DQN performance could match that of fuzzy TOPSIS.

Fig. 7 Ranking of five actions that represent alternative maintenance factors: (a) base scenario \(W = \{0.32, 0.32, 0.19, 0.17\}\), (b) scenario 2 \(W = \{0.32, 0.19, 0.17, 0.32\}\), (c) scenario 3 \(W = \{0.19, 0.17, 0.32, 0.32\}\), and (d) scenario 4 \(W = \{0.17, 0.32, 0.32, 0.19\}\). The scores on the y-axis represent the total reward value normalized by the log of y

Fig. 8 The total accumulated rewards used to rank the five key maintenance factors in the base scenario: (a) factor \(A_1\), (b) factor \(A_2\), (c) factor \(A_3\), (d) factor \(A_4\), (e) factor \(A_5\), and (f) total accumulated reward

Table 8 A comparative analysis of the ranking of key maintenance factors obtained in experiment 2 and the findings of [6]

5.2.3 Experiment 3: Ranking Maintenance Sustainability Strategies

In this experiment, we compared the performance of the MCDM-DQN with a benchmark established by [36]. Their study used PROMETHEE and fuzzy TOPSIS methods to assess maintenance sustainability strategies in a cement plant and determine the most effective one. Four assessment criteria were established: \(C_1\) for environmental, \(C_2\) for social and safety, \(C_3\) for technical, and \(C_4\) for economic. Next, four alternative maintenance sustainability strategies were identified (Table 9).

The experiment was carried out as follows. First, the criteria weights were calculated by aggregating and averaging the sub-criteria scores from the various decision-makers presented in [36]. Then, the decision matrix was created from the crisp values of the maintenance strategies presented in [68]. The calculated weights and the decision matrix were used as inputs for the MCDM-DQN model. The training of each DQN was consistently stable over time, as shown in Fig. 9. To indicate computational effort, the average training execution time for each DQN was as follows: (a) DQN 1: 1.23 min; (b) DQN 2: 1.37 min; (c) DQN 3: 1.26 min; and (d) DQN 4: 1.37 min. Finally, the trained MCDM-DQN model was used to rank the maintenance sustainability strategies.
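A minimal sketch of this aggregation step follows; the sub-criterion scores below are hypothetical stand-ins for the decision-maker inputs reported in [36]:

import numpy as np

# Hypothetical sub-criterion scores from three decision-makers for each
# criterion C1..C4 (one inner list per decision-maker); illustrative only.
scores = {
    "C1": [[6, 7], [7, 6], [6, 6]],
    "C2": [[7, 8], [8, 8], [7, 9]],
    "C3": [[8, 9], [9, 8], [9, 9]],
    "C4": [[9, 8], [8, 9], [9, 9]],
}

# Average the sub-criterion scores per decision-maker, then across
# decision-makers, and normalize so the criteria weights sum to 1.
raw = {c: np.mean([np.mean(dm) for dm in dms]) for c, dms in scores.items()}
total = sum(raw.values())
weights = {c: round(v / total, 3) for c, v in raw.items()}
print(weights)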

Table 9 Maintenance sustainability strategies
Fig. 9 Average episodic rewards obtained by MCDM-DQN during Experiment 3. The learning of each DQN is stable over time: (a) DQN 1, (b) DQN 2, (c) DQN 3, and (d) DQN 4

The experimental results are illustrated in Figs. 10 and 11. Figure 10 shows the ranking order of the alternatives based on the accumulated rewards: \(A_2\) and \(A_1\) perform best in the base scenario, followed by \(A_4\) and \(A_3\). All strategies maintained their positions across all scenarios, indicating that changing preferences did not significantly affect the rankings. The rewards accumulated for each action in the base scenario are presented in Fig. 11; the total accumulated rewards determine the ranking of each alternative. These results are consistent with the findings in [36], which highlighted variations in the classification of maintenance strategies between the PROMETHEE and fuzzy TOPSIS methods: whereas PROMETHEE identified \(A_2\) as the best and \(A_3\) as the worst strategy, fuzzy TOPSIS ranked \(A_1\) first, with \(A_3\) and \(A_4\) as the least favored strategies. A comparative analysis of the ranking of sustainable maintenance strategies obtained in this experiment against the findings of [36] is presented in Table 10. We observe that the MCDM-DQN performance matches that of PROMETHEE.
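The stability check across scenarios can be sketched as a simple weight sweep. The per-objective reward totals below are illustrative values chosen only so that the reported base ranking \(A_2 > A_1 > A_4 > A_3\) is reproduced; they are not the experimental data:

import numpy as np

# Hypothetical per-objective accumulated rewards for strategies A1..A4
# (rows) over criteria C1..C4 (columns); illustrative values only.
rewards = np.array([
    [0.8, 0.7, 0.6, 0.9],  # A1
    [0.9, 0.8, 0.7, 0.8],  # A2
    [0.4, 0.5, 0.5, 0.4],  # A3
    [0.6, 0.5, 0.6, 0.5],  # A4
])

scenarios = {
    "base":       [0.24, 0.25, 0.25, 0.25],
    "scenario 2": [0.30, 0.15, 0.20, 0.35],
    "scenario 3": [0.15, 0.20, 0.35, 0.30],
    "scenario 4": [0.20, 0.35, 0.30, 0.15],
}

# A ranking that is identical under every weight vector indicates that
# changing preferences does not affect the ordering of the strategies.
for name, w in scenarios.items():
    order = np.argsort(rewards @ np.array(w))[::-1]
    print(f"{name}: " + " > ".join(f"A{i + 1}" for i in order))

With these placeholder values, all four scenarios print A2 > A1 > A4 > A3, mirroring the stability observed in Fig. 10.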

Fig. 10 Ranking of four actions that represent alternative sustainable maintenance strategies: (a) base scenario \(W = \{0.24, 0.25, 0.25, 0.25\}\), (b) scenario 2 \(W = \{0.30, 0.15, 0.20, 0.35\}\), (c) scenario 3 \(W = \{0.15, 0.20, 0.35, 0.30\}\), and (d) scenario 4 \(W = \{0.20, 0.35, 0.30, 0.15\}\). The scores on the y-axis represent the total reward value on a logarithmic scale

6 Discussion

In this research, we propose a MORL model, called MCDM-DQN, to assist decision-makers in evaluating maintenance factors that influence manufacturing sustainability. The model has demonstrated its ability to rank the key factors influencing manufacturing sustainability across various industry segments while considering the perspectives of decision-makers. This research also proposes a human-centered design that places decision-makers at the core of the decision-making process: the solution combines traditional MCDM techniques with AI, fostering human–machine collaboration, and empowers decision-makers to input data that reflects the views and perceptions of stakeholders while evaluating and validating the model’s output.

Fig. 11 The total accumulated rewards for ranking sustainable maintenance strategies in the base scenario: (a) strategy \(A_1\), (b) strategy \(A_2\), (c) strategy \(A_3\), (d) strategy \(A_4\), and (e) total accumulated reward

Table 10 A comparative analysis of the ranking of sustainable maintenance strategies obtained in Experiment 3 and the findings of [36]

The performance of our solution was compared with several benchmarks, and the experimental results indicated that MCDM-DQN can match the performance of traditional MCDM methods such as TOPSIS and PROMETHEE. In addition, it offers enhanced capabilities that align with the principles of Industry 5.0. Our solution provides real-time feedback and employs an a posteriori approach, in which decision-makers’ preferences are not used for scalarization during the learning phase. Instead, the model learns a coverage set of policies, enabling a quick response when more information becomes available. The advantages of this approach are twofold: it reduces the uncertainty surrounding user preferences, and once a coverage set has been learned, decision-makers can adjust their selected solution to reflect their updated preferences [78]. Hence, stakeholders are encouraged to manage trade-offs between objectives. These capabilities improve decision-making in the context of sustainable maintenance in manufacturing, differentiating MCDM-DQN from traditional MCDM approaches. Our model has initially been tested in a simulated environment to validate its core mechanisms and effectiveness. Although it does not yet handle real-world uncertainty, disruptions, or maintenance strategies that vary across industries, it provides a foundation for decision-making in controlled settings, and its adaptive learning and multi-objective optimization capabilities enable scenario analysis whose insights can be refined through industry-specific adjustments.
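A minimal sketch of the a posteriori selection step is given below. The coverage set is a hypothetical stand-in for the vector-valued returns the model would learn, and linear scalarization is one common utility choice; neither is prescribed by the paper:

import numpy as np

# Hypothetical coverage set: each candidate policy is stored with its
# expected vector return over the four objectives (illustrative values).
coverage_set = {
    "pi_1": np.array([0.9, 0.8, 0.3, 0.3]),
    "pi_2": np.array([0.3, 0.3, 0.9, 0.8]),
    "pi_3": np.array([0.6, 0.6, 0.6, 0.6]),
}

def select_policy(weights):
    # Preferences enter only here, after training, so decision-makers can
    # re-run the selection with updated weights without relearning.
    utilities = {name: float(np.asarray(weights) @ v)
                 for name, v in coverage_set.items()}
    return max(utilities, key=utilities.get)

print(select_policy([0.32, 0.32, 0.19, 0.17]))  # initial preferences -> pi_1
print(select_policy([0.19, 0.17, 0.32, 0.32]))  # updated preferences -> pi_2

Because the coverage set is computed once, a change in preferences only re-runs the cheap selection step, which is what enables the real-time feedback described above.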

6.1 Practical Implications

Industry 5.0 signifies a transformation in manufacturing, emphasizing sustainability, resilience, and commitment to human values. This progressive approach aims to integrate advanced automation with human experience. In this context, sustainable maintenance activities are crucial to ensure optimal performance and longevity of industrial machinery and equipment. They also contribute to reducing pollution and waste while improving the production capacity of the manufacturing sectors. Integrating these strategies into company operations is essential for achieving short-term success and fostering a more sustainable and inclusive future for everyone. To the authors’ knowledge, no systematic methodology exists in the literature that addresses the complex dynamics of sustainability in maintenance practices for Industry 5.0. Our research aims to fill this gap by providing a cutting-edge framework that integrates a decision-making process with AI to effectively address the problem of ranking maintenance factors based on their relative importance from the perspectives of various maintenance stakeholders.

This proposed strategy assists decision-makers in navigating the complexities of sustainability assessment within the rapidly evolving Industry 5.0 landscape. It enables individuals to interact with innovative technology in industrial environments, thereby enhancing their understanding of how maintenance factors influence manufacturing sustainability. It also allows decision-makers to prioritize various maintenance factors according to their organization’s operational context. However, adopting a human-centric approach within an organization often requires significant changes to prioritize stakeholder needs. Furthermore, incorporating human-centered design into current product development structures may require reengineering processes, as well as the acquisition of crucial skills and training for successful implementation. Although our MCDM-DQN model has only been tested in a simulated environment to validate its performance and effectiveness, the experiments conducted have shown promising results, indicating that the MCDM-DQN can match the performance of traditional MCDM methods while providing real-time feedback and enabling stakeholders to quickly manage trade-offs. This is crucial for empowering decision-makers in Industry 5.0 to prioritize sustainable practices in their choices, and it strengthens the practical relevance of this research to sustainable maintenance.

7 Conclusions and Future Work

This research presents a MORL model called MCDM-DQN to assist decision-makers in evaluating key maintenance factors that impact manufacturing sustainability. The model demonstrated its ability to rank the relevant factors across various industry segments, matching the performance of traditional MCDM techniques while providing the added advantage of real-time feedback, which allows decision-makers to adjust their selected solution to reflect updated preferences. To answer the research questions, this study proposes a decision support framework that combines human-centered design with MCDM-DQN and places decision-makers at the heart of the decision-making process. By promoting collaboration among stakeholders and integrating their feedback with the MCDM-DQN, the framework improves prioritization according to the organization’s operational context. The proposed strategy guides decision-makers in navigating the complexities of sustainability evaluation in the fast-evolving landscape of Industry 5.0. It allows decision-makers to engage with new technologies in industrial environments, thereby deepening their understanding of how maintenance factors influence manufacturing sustainability.

However, despite the relevance and innovative nature of the proposed approach, it presents some limitations that should be considered in the continuation of this work. First, our model has so far been evaluated only in a simulated environment to validate its core mechanisms and effectiveness; its ability to address real-world uncertainty, disruptions, and varying maintenance strategies in different industries has yet to be tested. Second, since the proposed approach has not been evaluated in manufacturing settings, we cannot guarantee its scalability and real-time adaptation in large-scale production environments. Future studies should address these issues to strengthen the evaluation of the proposed framework, and these limitations also present opportunities for investigating how sustainable maintenance practices impact manufacturing sustainability within the context of Industry 5.0. Furthermore, the proposed MCDM-DQN model should be validated through extensive industrial case studies to improve its robustness and generalizability, and its capability to incorporate domain-specific constraints and industry feedback should be assessed to ensure effectiveness in dynamic and complex manufacturing environments. Moreover, additional features to bolster the model’s robustness should be considered, such as an interactive human-in-the-loop methodology that identifies relevant policies early in the training process.

8 Nomenclature

\(D_m\): Decision matrix
\(W\): Set of weights
AHP: Analytic hierarchy process
AI: Artificial intelligence
API: Application programming interface
DNN: Deep neural network
DQN: Deep Q-network
DSS: Decision support system
DST: Deep-sea treasure
DT: Digital twins
ELECTRE: Elimination and choice translating reality
F-AHP: Fuzzy analytic hierarchy process
HITL: Human-in-the-loop
MAS: Multi-agent system
MC: Mountain car
MCDM: Multi-criteria decision-making
MCDM-DQN: Multi-criteria decision-making with deep Q-networks
MDP: Markov decision process
MOMDP: Multi-objective Markov decision process
MORL: Multi-objective reinforcement learning
PROMETHEE: Preference ranking organization method for enrichment evaluation
RL: Reinforcement learning
SW: Software
TOPSIS: Technique for order of preference by similarity to ideal solution