1 Introduction

Today, manufacturing companies are transitioning from the current age of smart manufacturing, known as Industry 4.0, to a new revolutionary wave of industry, Industry 5.0, in which humans and machines are envisioned to work collaboratively [1]. The Fifth Industrial Revolution combines sustainable development goals with the digitalization aspects of Industry 4.0 through secure data transmission, bioinspired technologies, and human-centric solutions. Furthermore, Industry 5.0 envisions a creative, resilient, competitive, and socially centered industry while minimizing negative environmental and social impacts [2, 3]. The core aspects of Industry 5.0 include human centricity, sustainability, and resilience [4]. Sustainability and digitalization are two significant trends that manufacturing companies must address, as both affect their operations. Sustainable manufacturing aims to balance the three pillars of sustainability (environmental, social, and economic) while addressing the needs of stakeholders and achieving a competitive advantage [5, 6]. In addition, six key characteristics of sustainable production are highlighted in the literature: energy and resource efficiency, the natural environment, social justice, economic performance, worker rights, and product accountability [6]. To improve a maintenance system and support a sustainable manufacturing strategy, companies must identify key success factors that impact sustainability. Organizations that implement sustainable practices experience improved product and service quality, increased market share, and higher profits [7]. Therefore, integrating sustainability with maintenance promotes cost-effective maintenance practices. Sustainable maintenance activities significantly reduce pollution and waste while promoting employee safety. In addition, they improve the production capacity of manufacturing industries by ensuring the availability, reliability, and safety of equipment, which is vital for sustainable operations [8, 9].

A well-designed maintenance strategy has the potential to extend the lifespan of an asset and prevent unexpected failures, which can result in production losses, shipping delays, and decreased product quality. Such a strategy is essential for minimizing costs and enhancing productivity [10]. In addition, as attention to environmental problems increases, asset and product life cycle management becomes essential for sustainable manufacturing production, providing the functions necessary for society while minimizing resource consumption [11]. Despite Industry 4.0’s focus on digitalization and automation, recent studies on its effects suggest that merely constructing industrial plants with complex, non-interoperable technical systems is insufficient to improve their productivity and resilience in addressing sustainable development challenges. This has led to a focus on human-centric Industry 5.0. Humans will continue to be a vital resource for the competitiveness of manufacturers, particularly in activities that require flexibility, critical thinking, and originality [12]. In the highly automated and digitalized factories of the future, humans will perform more decision-making and problem-solving in increasingly complex socio-cyber-physical production systems, rather than performing many physical duties [4, 13]. In this context, multi-criteria decision-making (MCDM) techniques can be beneficial for involving multiple decision-makers with competing objectives to evaluate maintenance factors that influence manufacturing sustainability and reach a consensus on decision-making [14]. Furthermore, MCDM techniques offer a structured approach to tackle decision-making problems involving multiple objectives, criteria, and conflicting preferences [15].

This research introduces a decision support framework oriented around human-centered design, which integrates conventional MCDM methods with artificial intelligence (AI), specifically utilizing reinforcement learning (RL) algorithms. In an era where data availability and accessibility have led to remarkable advances in manufacturing processes, RL algorithms can process large datasets with minimal model assumptions, handle high-dimensional spaces, and consider long-term outcomes. Additionally, RL agents learn adaptively through real-time interactions with their environment. This capability improves decision-making in complex industrial situations, distinguishing them from traditional MCDM and other analytical methodologies [16]. Reinforcement learning is well-suited for sequential decision problems and expands machine learning to encompass modeling, prediction, and decision-making. An RL algorithm operates as a rational agent that learns to act in an uncertain environment through trial and error [17]. However, real-world decision-making processes are often complex and involve trade-offs among several, often conflicting, objectives. For multi-objective decision-making problems, the literature recommends the use of multi-objective reinforcement learning (MORL) algorithms [18, 19].

This paper builds on the previous research presented in [20] and discusses a MORL model that integrates multiple deep Q-Networks (DQNs), a technique introduced in [21]. The proposed model, multi-criteria decision-making with deep Q-Networks (MCDM-DQN), aims to address challenges in multi-objective decision-making and leverage the capabilities of deep neural networks to tackle real-world MCDM issues. In addition, this study investigates the integration of the MCDM-DQN model into a human-centered framework, enabling company stakeholders to assess the impact of key maintenance factors on manufacturing sustainability. Although recent advances in Industry 5.0 concepts, such as sustainability, human-centeredness, and human-machine collaboration, have been significant, we have identified gaps in current research regarding the design of a comprehensive framework that incorporates these concepts and integrates AI and decision-making processes to evaluate sustainable maintenance factors and provide more valuable information for assessing manufacturing sustainability. Our study aims to fill these gaps by addressing the following research questions: 1) How can a MORL algorithm help prioritize maintenance factors based on their sustainability impact? 2) How can stakeholders collaborate using a MORL model to tackle decision-making problems? To address these questions, this study proposes a decision support framework that integrates the best MCDM practices from the literature with the MCDM-DQN model to rank maintenance factors according to their relative importance from the perspectives of various maintenance stakeholders. The proposed approach aims to empower decision-makers in sustainable maintenance by offering a comprehensive framework that improves their understanding of the impacts of various maintenance factors. The study also aims to promote human-machine collaboration, incorporate human capabilities by considering various perspectives of maintenance stakeholders, and prioritize maintenance factors that impact sustainable manufacturing based on the organization’s operational context. In summary, the paper makes several important contributions. First, it presents a solution that integrates multiple DQNs to manage conflicting objectives and prioritize key maintenance factors to evaluate sustainability strategies in manufacturing. This approach addresses the challenges in multi-criteria decision-making without using traditional MCDM techniques. Second, a human-centered design is proposed, aligned with Industry 5.0 and emphasizing human participation in decision-making. This enables experts to input data reflecting their cognitive abilities into the MCDM-DQN algorithm and review the results. Lastly, case studies across various sustainability scenarios validate the approach’s effectiveness.

The remainder of the paper is structured as follows. Section 2 provides a review of the literature on sustainable maintenance factors, current MCDM approaches to assess sustainability, and innovative MORL algorithms. Section 3 outlines the problem statement and describes the methodology. Section 4 details the implementation of the MCDM-DQN model. Section 5 presents the experimental evaluation and discusses the results obtained. Section 6 covers the discussion and practical implications of the research. Finally, Sect. 7 concludes the paper and suggests directions for future work.

2 Related Work

This section focuses on evaluating previous work that is relevant to this investigation. First, we will address sustainable maintenance factors based on the existing literature. Next, we will discuss key state-of-the-art MCDM approaches used to assess sustainability, followed by a discussion on MORL algorithms. The literature review for this study was conducted using Google Scholar and Scopus, databases that enabled a robust and comprehensive search. We formulated our primary research query as (“Sustainability” OR “Sustainable”) AND (“Maintenance” OR “Manufacturing”) AND (“Industry 4.0” OR “Industry 5.0”) because this study examines the contributions of maintenance to manufacturing sustainability, incorporating elements from both Industry 4.0 and Industry 5.0. In addition, complementary queries were developed to gain an understanding of existing research related to MCDM methods and cutting-edge MORL algorithms. For example: 1) (“multicriteria decision making”) AND (“sustainability”), 2) (“multi-criteria decision-making”) AND (“advances”), and 3) (“multi-objective reinforcement learning”) OR (“deep reinforcement learning”).

2.1 Sustainable Maintenance Factors

Maintenance involves systematically monitoring, repairing, and replacing equipment to ensure its desired functionality [22]. Maintenance also refers to the actions taken to keep a system operational and functional throughout its lifecycle or to restore it to a state where it can perform its intended function [23]. The primary goal of maintenance is to eliminate activities that result from equipment failure or human error. These maintenance efforts impact overall production processes, affecting multiple elements that are essential for ongoing performance enhancement, such as product quality and delivery precision [24]. Efficient maintenance improves value by optimizing resource utilization, improving product quality, and minimizing rework and waste. Companies must identify factors that impact sustainability performance based on their unique processes, business demands, and goals to address sustainability challenges effectively [6]. Manufacturing industries aiming at sustainability face the challenge of achieving sustainable maintenance. This requires balancing economic, environmental, and social factors by managing the financial costs associated with repairs and supplies, while also considering greenhouse gas emissions, energy consumption, and the health and safety of workers [7]. Historically, financial indicators were used to describe business performance in manufacturing enterprises. Companies now take a comprehensive approach to sustainability factors, including societal variables, to improve economic and non-monetary outcomes [24]. Manufacturing enterprises must evaluate their economic, social, and environmental effects to achieve sustainable development goals. They should provide a return on investment and minimize their environmental impact [25, 26]. In [27], the authors conducted a study addressing the role of maintenance in promoting industrial sustainability, considering the perspectives of different stakeholders. In addition, they proposed a conceptual framework to help various stakeholders assess the effect of maintenance on the three dimensions of sustainability.

In [28], the authors introduce a conceptual framework designed to assess how maintenance affects sustainability, providing an overview of the influence that maintenance tasks have on all sustainability dimensions. This framework helps decision-makers across various company departments recognize these impacts. In [6], a literature review and engagement with industry professionals identified ten maintenance factors for tackling the challenges of sustainable manufacturing from a tactical perspective. Further analysis determined the relationships between these factors, grouped them into clusters, and applied MCDM techniques to rank the most critical ones. The authors found that technical factors, such as the implementation of preventive and prognostic service methods, the use of maintenance and operation data collection and processing systems, and the modernization of machines and devices, are key to addressing sustainable manufacturing. The authors of [26] evaluated the performance of sustainable development in three dimensions: business, environment, and social. The following key factors were examined for each dimension: return on investment, raw material consumption, and turnover ratio. They concluded that enhancing an enterprise’s sustainable development performance requires a blend of factors to achieve optimal results. The authors of [8] examined the factors that influence sustainable maintenance practices in manufacturing companies. Their study ranked these factors according to their interdependence. The key factors identified include availability rate, government regulations, and the importance of training and education. The study also reveals that energy consumption is affected by all the other factors considered. Conversely, changes in certain factors can impact dependent factors, such as product quality and technology.

In Industry 4.0, industrial systems can monitor processes and make intelligent decisions through real-time connections with people, machines, sensors, and other factors. The transition to Industry 5.0 affects all levels of the organization, including maintenance [29]. According to [4], future research in the field of Industry 4.0/5.0 maintenance and sustainability should investigate sustainable maintenance while considering all three pillars of sustainability simultaneously. To ensure sustainable maintenance procedures and systems, new frameworks, techniques, and performance indicators must be developed, considering these interdependencies.

2.2 State-of-the-Art MCDM Approaches for Assessing Sustainability

The process of making decisions is complex, taking into account various factors to reach a desired result. Multi-criteria decision-making methods provide systematic approaches to tackle decisions that involve multiple criteria (or objectives) and diverse preferences. These approaches facilitate the formulation of optimal solutions that are consistent with the preferences of decision-makers [30]. There are many classical MCDM methods available, such as the Analytical Hierarchy Process (AHP) [31], fuzzy AHP (F-AHP) [32], the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) [33], Elimination and Choice Translating Reality (ELECTRE) [34], and the Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE) [35]. These methods help organize decision-making by prioritizing criteria, ranking alternatives, and forming preference relationships. According to [15], the importance of MCDM methods is derived from their ability to address the complexities inherent in decision-making processes with various objectives, criteria, and stakeholders. MCDM approaches help decision-makers to solve challenges in multiple domains, such as business and management, engineering and technology, environmental decision-making, and sustainability assessments [4]. The authors of [36] proposed two approaches based on the PROMETHEE and TOPSIS techniques to prioritize sustainability strategies in the maintenance planning of a cement plant. The study revealed that optimizing maintenance policies and consumables represent the most effective sustainable maintenance solutions. In [37], the authors introduced a hierarchical method to assess sustainability in uncertain environments. This approach employs fuzzy logic to manage uncertainty, allowing decision-makers to articulate their subjective preferences using linguistic variables. The authors of [38] introduced a multi-criteria approach to evaluate wine production through the lens of the triple bottom line. The AHP method was used to help producers understand the significance of each dimension. They noted that the environmental dimension was the most crucial factor. The authors of [39] employed PROMETHEE in an environmental impact assessment to evaluate and rank alternative actions based on sustainability criteria. According to [15], the AHP and TOPSIS MCDM methods prioritize sustainability goals and enhance decision-making processes. The AHP and TOPSIS methods are applied to evaluate suppliers based on quality, pricing, delivery, and sustainability. These approaches have also been used to help balance cost, service levels, and sustainability objectives [40, 41].

A significant advancement in MCDM is the adoption of multi-objective approaches. Unlike traditional methods that typically focus on a single objective or combine multiple criteria, multi-objective MCDM addresses conflicting goals. This allows decision-makers to identify Pareto-optimal solutions that effectively balance various criteria [15]. Another major development is the integration of fuzzy logic techniques in MCDM. Fuzzy-based MCDM approaches effectively manage ambiguity and uncertainty, offering decision-makers realistic support through fuzzy criteria and linguistic evaluations [42, 43]. Furthermore, the combination of MCDM with data-driven techniques has been explored. Integrating machine learning algorithms with MCDM methods improves prediction accuracy and decision support. Examples of this integration include the use of neural networks, decision trees, and support vector machines [15]. Moreover, a notable advancement in the industry is the hybrid MCDM method. These hybrid techniques integrate multiple decision-making approaches to address problems, capitalizing on their advantages and reducing limitations. For example, the authors in [44] proposed a hybrid approach that merges the TOPSIS method with decision trees to identify suppliers in supply chain management. In [45], two hybrid MCDM systems integrate additive ratio assessment with TOPSIS and complex proportional assessment to solve a real-time robot selection problem involving 12 alternative robots and five selection criteria. The authors of [46] proposed a hybrid fuzzy MCDM method to address sustainable and resilient supplier selection problems.

Recent developments in MCDM techniques include the integration of AI and ML into decision-making processes, the development of dynamic decision support systems (DSSs), and the enhancement of DSSs through multi-agent digital twins. Employing AI and ML strategies, such as neural networks, evolutionary algorithms, and reinforcement learning, can improve the effectiveness and precision of DSSs in MCDM [15]. According to [47], a conventional DSS can be improved with AI by managing large amounts of data and various constraints within the decision-making environment. This strategy for developing DSSs can increase accuracy, learning, and prediction. A dynamic decision support system for sustainable supplier selection in circular supply chains is proposed by [48]. The proposed approach integrates machine learning and fuzzy inference algorithms, enabling customers to customize and prioritize their criteria before choosing the best supplier. The authors of [49] proposed a dynamic DSS to evaluate cloud storage providers based on their offered services and select a suitable one. Lastly, multi-agent systems (MAS) are a branch of distributed AI in which autonomous agents collaborate to achieve a common goal. MAS are essential for developing Digital Twins (DTs), enabling the parallel and distributed computation required to process the large amounts of data and complex simulations typically associated with DTs [50]. The authors of [51] proposed a hierarchical decision-making framework that uses DT technology to optimize resource usage and facilitate real-time mission planning for multiple UAVs, employing DT-enabled reinforcement learning to leverage real-world experiences. A digitally enabled DSS that supports intelligent and tailored supplier management in dynamic and challenging environments is presented by [52]. The system employs fuzzy stratified decision-making to assess the potential impact of future events. Additionally, it utilizes multi-agent digital twin technology to validate the effectiveness of supplier development plans.

2.3 Cutting-Edge MORL Algorithms

Reinforcement learning is an effective paradigm for sequential decision-making tasks, where the decision-maker observes a process before making a final decision. Unlike traditional control methods, RL makes minimal assumptions about the problem, allowing easy adaptation to any task assignment [53]. Traditional RL methods that excel at tasks such as playing video games focus on single-objective techniques, such as deep Q-learning [21, 54]. In contrast, real-life problems usually involve simultaneously meeting multiple, often conflicting objectives [55]. Methods for such problems should therefore yield optimal policies that balance these objectives [56]. However, the number of optimal policies may vary for multi-objective problems, depending on the trade-offs involved in achieving different objectives [57]. Therefore, MORL methods seek to create a policy coverage set to address all possible user preferences in solving the problem [53]. In [58], DQN algorithms were modified to facilitate single-policy linear MORL by creating an approximate coverage set of policies. Each policy was represented with a neural network that uses an outer-loop strategy. The algorithms’ effectiveness was evaluated using two MORL benchmark scenarios: Mountain Car (MC) and Deep Sea Treasure (DST). The author of [57] introduced a method for utilizing DQNs in multi-objective environments where various DQNs guide the agent’s behavior toward specific goals. The proposed method used decision values to enhance the scalarization of multiple DQNs into a single action. It was tested on a game-like simulator in which an agent with visual input pursues numerous goals. In [59], an algorithm was proposed to learn a single-policy network optimized across the entire preference space within a domain, generating the optimal policy for any user-specific preference. This showcases the effectiveness of deep neural networks in scaling MORL with linear preferences. The authors’ approach was evaluated using popular MORL benchmarks, such as DST and the video game Super Mario Bros. The authors of [60] proposed a DQN and extended it to multi-objective scenarios, maximizing drug-likeness while maintaining molecular similarity. A multi-objective DQN method was used for autonomous driving, which requires the consideration of various factors simultaneously, including obeying traffic rules, avoiding collisions, reaching the destination as quickly as possible, and ensuring passenger safety [61].

The authors of [62] reported that a Pareto set of various policy families effectively represents the optimal performance trade-offs for multi-objective robot control. A prediction-guided evolutionary learning algorithm was used to identify high-quality policies within the Pareto set to calculate these representations. A MORL environment with a continuous action space was created to validate the effectiveness of the algorithm. The authors of [63] presented a DQN-based MORL framework that supports single-policy and multi-policy techniques and linear and non-linear action selection approaches. The proposed method was validated in two benchmark environments: a two-objective DST problem and a three-objective MC problem. The experimental results showed that the framework could find Pareto-optimal solutions effectively. The study presented in [64] unveiled the Q-Managed algorithm, a MORL approach proficient in acquiring non-dominated multi-objective policies even when faced with deterministic transition functions, regardless of the Pareto front’s configuration. In [65], the authors addressed the difficulties in scheduling energy-efficient automated guided vehicles (AGVs) that require battery changes within a production logistics framework. They introduced an innovative data-driven method that employs a deep reinforcement learning-based agent to orchestrate AGV tasks and devise battery replacement strategies in response to real-time service demands. The authors of [18] proposed an intelligent energy management strategy for multi-microgrid power systems that utilizes preference-based multi-objective reinforcement learning methods to create a Pareto-optimal set for every objective, thus maximizing its advantages. A non-dominated solution involves crafting a plan that maintains balance without giving undue advantage to any stakeholder. The method successfully captures diverse preferences, as demonstrated by preference-based outcomes. The authors of [66] proposed a MORL approach for energy management strategies. This approach aims to balance hydrogen consumption in fuel cell vehicles, the fuel cell’s durability, and the battery’s lifespan, ultimately reducing the overall lifecycle costs associated with fuel cells. A novel technique for optimizing pressurized water reactors, the Pareto envelope augmented with reinforcement learning, was introduced by [67] to tackle the complexities of multi-objective problems, especially in engineering, where evaluating possible solutions can be time-consuming.

2.4 Novelty and Research Gap

Although prior work has significantly advanced our understanding of the importance of MCDM methods in various fields, specialized methodologies are needed that can effectively address the challenges and demands of Industry 5.0. Planning for the future requires identifying directions and developing trends in MCDM research. For example, integrating MCDM with emerging technologies such as AI, cyber-physical systems, and the Internet of Things has significant potential to improve decision-making processes [15, 68]. The literature outlines various constraints and limitations of current MCDM approaches, as detailed in Table 1. This study aims to address existing challenges and gaps by proposing a decision support framework that integrates AI and MCDM techniques to rank and prioritize sustainable maintenance factors to help stakeholders evaluate manufacturing sustainability. The proposed approach can also analyze large datasets and facilitate real-time decision-making capabilities. Additionally, this study enhances the decision-making abilities of the proposed MCDM-DQN model by promoting human collaboration through data input that reflects diverse perspectives of maintenance stakeholders while reviewing and verifying the model’s output. Furthermore, this research paper advocates for a human-centered approach that aligns with the vision of Industry 5.0, promoting stakeholder engagement and decision transparency. It emphasizes the active participation of stakeholders in the decision-making process and seeks to incorporate their preferences without relying on traditional MCDM methods.

Table 1 Limitations and constraints of current MCDM techniques
Table 2 MCDM decision matrix

3 Problem Statement and Methodology

3.1 MCDM Problem

The MCDM problem is a process for selecting the best option from a set of alternatives [30]. Mathematically, it is defined as follows:

$$\begin{aligned} A = \{A_i \mid i = 1, 2, \dots , m\} \end{aligned}$$
(1)

where A represents a unique and finite collection of m alternatives.

$$\begin{aligned} C = \{C_j \mid j = 1, 2, \dots , n\} \end{aligned}$$
(2)

where C denotes a set of n criteria used to evaluate A. The criteria may have diverse, unrelated units and present varied, potentially conflicting goals: minimization may be preferred for some criteria, while maximization is favored for others.

$$\begin{aligned} W = \{w_j \mid j = 1, 2, \dots , n\} \end{aligned}$$
(3)

where W denotes a set of weights, each normalized and allocated to a criterion based on its importance. Typically, the data collected in Eqs. 1 to 3 is structured into a matrix known as the decision matrix \(D_m\), which includes the alternative ratings concerning each criterion, as shown in Table 2. In the decision matrix, each element \(x_{ij}\) represents the judgment of \(A_i\) with respect to \(C_j\). The matrix \(D_m\) and the weight set W are the basic inputs for the MCDM methods, which score the alternatives and rank them according to their priority.
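To make these inputs concrete, the short Python sketch below represents the alternative set A, the weight set W, and the decision matrix \(D_m\) from Eqs. 1 to 3. The numbers and variable names are hypothetical and serve only to mirror the notation above.

```python
import numpy as np

# Hypothetical example with m = 3 alternatives and n = 4 criteria,
# mirroring the notation of Eqs. 1 to 3.
alternatives = ["A1", "A2", "A3"]        # the set A
criteria = ["C1", "C2", "C3", "C4"]      # the set C
W = np.array([0.40, 0.25, 0.20, 0.15])   # normalized weights, sum to 1

# Decision matrix D_m: element x_ij is the rating of alternative A_i
# with respect to criterion C_j.
D_m = np.array([
    [7.0, 3.0, 5.0, 9.0],
    [4.0, 8.0, 6.0, 2.0],
    [6.0, 5.0, 7.0, 4.0],
])

assert D_m.shape == (len(alternatives), len(criteria))
assert np.isclose(W.sum(), 1.0)
```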

3.2 MORL Problem

The Markov decision process (MDP) is a framework for sequential decision-making in various scenarios. In MDPs, agents act as both learners and decision-makers by interacting with their environment. The agent chooses actions that lead to different outcomes while receiving rewards, which are numerical values that the agent seeks to maximize over time [16]. A MORL model is structured as a multi-objective Markov decision process (MOMDP) [19, 78]. In a multi-objective problem consisting of \(n \ge 2\) objectives, the MOMDP is described as a tuple \(\langle S, A, T, R, \gamma \rangle \), where S is the state space, A is the action space, \(T: S \times A \times S \rightarrow [0,1]\) is a probabilistic transition function, \(R: S \times A \times S \rightarrow \mathbb {R}^n\) is the vector of reward functions corresponding to the considered n objectives, and \(\gamma \in [0,1]\) is the discount factor that influences future rewards. The key distinction between MOMDPs and standard MDPs lies in the reward signal. In MOMDPs, the reward function does not provide a single scalar value. Instead, it returns an n-dimensional vector that represents a numeric reward signal at each time-step t for each of the n objectives, as demonstrated in Eq. 4.

$$\begin{aligned} \vec {R_t} (s, a, s') = (R_{1,t}(s, a, s'), R_{2,t}(s, a, s'), \dots , R_{n,t}(s, a, s')) \end{aligned}$$
(4)

To address a reinforcement learning problem, it is essential to identify a policy that consistently generates significant rewards over time [16]. In multi-objective contexts, the definitions of reinforcement learning change when using a vectorial reward function. The agent’s goal is to take actions that ensure that its expected vector return \(\vec {R}_t\) achieves Pareto optimality [18, 63, 79]. The expected vector return is defined as follows:

$$\begin{aligned} \vec {R}_{t} = \sum _{k=0}^{T - t} \gamma ^k \vec {R}_{t+k+1} \end{aligned}$$
(5)

where \(T < \infty \) for episodic tasks, while for continuous tasks, \(T = \infty \).

In MDPs, value functions estimate the expected cumulative future rewards that an agent can achieve from a specific state while following a policy \(\pi \). The state value function for MOMDP \(\vec {V}^\pi (s)\) is also vector-valued, \(\vec {V}^\pi \in \mathbb {R}^n\). This function specifies how good a certain state s is in the long term according to policy \(\pi \in \Pi \) as follows:

$$\begin{aligned} \vec {V}^\pi (s) = \mathbb {E}_\pi \{\vec {R}_t \mid S_t=s\} = \mathbb {E}_\pi \{\sum _{k=0}^{T-t} \gamma ^k \vec {R}_{t+k+1} \mid S_t=s\} \end{aligned}$$
(6)

where \(T < \infty \) for episodic tasks, while for continuous tasks, \(T = \infty \).

The state action value function \(\vec {Q}^\pi (s,a)\) that specifies the expected rewards starting from state s, taking arbitrary action a, and then following policy \(\pi \) is also vector-valued, as follows:

$$\begin{aligned} \vec {Q}^\pi (s,a) = \mathbb {E}_\pi \{\vec {R}_t \mid S_t=s, A_t = a\} = \mathbb {E}_\pi \{\sum _{k=0}^{T-t} \gamma ^k \vec {R}_{t+k+1} \mid S_t=s, A_t=a\} \end{aligned}$$
(7)

where \(T < \infty \) for episodic tasks, while for continuous tasks, \(T = \infty \).

In MDPs, \(V^\pi (s)\) provides a complete ordering of the policy space: for any \(\pi \) and \(\pi '\), either \(V^\pi (s) \ge V^{\pi '} (s)\) or \(V^\pi (s) < V^{\pi '} (s)\). Therefore, finding the optimal policy \(\pi ^*\) is equivalent to maximizing the expected discounted reward, which is not always true for MOMDPs. When multi-objective problems are approached using a single-policy method, a utility function \(u: \mathbb {R}^n \rightarrow \mathbb {R}\) maps the vector-valued state value function to a scalar value \(V^\pi _u = u(\vec {V}^\pi )\). The result is a total ordering of policies, reducing the MOMDP to a single-objective decision-making problem, which is not always desirable [78]. This study employs a multi-policy approach to tackle MOMDP problems. The aim is to identify a set of optimal policies, denoted as \(\pi ^*= \{\pi _1,\pi _2, \dots , \pi _n \}\), commonly referred to as the Pareto front. This set of policies aims to satisfy all user preferences related to the defined objectives. For those interested, the work of [78] provides a comprehensive discussion of the multi-objective sequential decision problem.

Fig. 1 Human-centered design for assessing factors relevant to manufacturing sustainability

3.3 Methodology

This section introduces the decision support framework depicted in Fig. 1. The proposed framework integrates MCDM techniques with the MCDM-DQN model to address the challenge of ranking maintenance factors according to their relative importance from the perspectives of various decision-makers. The suggested approach is aligned with the concept of Industry 5.0. The objective is to help stakeholders understand how key maintenance factors influence manufacturing sustainability by incorporating three modules: sustainable maintenance support, MCDM-DQN model, and decision-making support. It also covers seven stages, as described in Table 3. These stages adhere to the AI development lifecycle defined in [80]. It is essential to note that while the solution presented in Fig. 1 provides a comprehensive framework to support sustainable maintenance decision-making, the discussion and case studies primarily focus on the development of the MCDM-DQN model, which will be explored in subsequent sections.

Table 3 The seven stages included in the proposed framework

The role of maintenance in achieving sustainability goals in manufacturing varies depending on a company’s operational and business context. To address this issue, the sustainable maintenance support module encompasses the three initial stages outlined in Table 3, allowing stakeholders to effectively assess the impact of maintenance factors on manufacturing sustainability. In the first stage, decision-makers identify key factors that are significant to their businesses. The second stage establishes the assessment criteria C and their respective weights W while selecting the essential factors. In the third stage, decision-makers evaluate the key factors associated with each criterion and create a decision matrix \(D_m\). Finally, \(D_m\) and W serve as fundamental inputs for the MCDM-DQN. In the evaluation stage, maintenance stakeholders share their perceptions to assess the importance of various maintenance factors. According to [6], specialists from various departments (including production, quality, maintenance, safety, health, and environment) should collaborate in teams to evaluate maintenance factors. This approach helps minimize bias among decision-makers and identify discrepancies in how different individuals perceive maintenance factors based on established criteria. It is important to note that this research does not advocate any specific method of execution. However, several studies referenced in this research apply fuzzy logic in the evaluation process. Specifically, a fuzzy entropy-weighted approach is used to compute weights, while triangular fuzzy numbers are used to evaluate various sub-criteria related to maintenance sustainability strategies [36]. In addition, linguistic variables and fuzzy numbers are used to form the decision matrix and evaluate the significance of the criteria [6, 36, 68]. Fuzzy-based multi-criteria decision-making techniques aid in managing ambiguity and uncertainty in decision-making processes. They enable decision-makers to express their subjective preferences using linguistic variables. This approach fosters effective collaboration among multiple decision-makers, enabling them to reach a consensus and evaluate alternative options [37, 42, 43].
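As a concrete illustration of this fuzzy evaluation step, the sketch below converts linguistic ratings, expressed as triangular fuzzy numbers, into crisp inputs using the standard centroid formula \((a + b + c)/3\). The linguistic scale shown is a common choice from the fuzzy MCDM literature and is an assumption here, not a scale prescribed by the framework.

```python
# Minimal sketch: defuzzifying triangular fuzzy ratings (a, b, c) with the
# centroid method. The linguistic scale below is illustrative only.
SCALE = {
    "low":    (1.0, 1.0, 3.0),
    "medium": (3.0, 5.0, 7.0),
    "high":   (7.0, 9.0, 9.0),
}

def defuzzify(tfn):
    """Centroid of a triangular fuzzy number (a, b, c)."""
    a, b, c = tfn
    return (a + b + c) / 3.0

# One expert's linguistic ratings of an alternative against four criteria.
ratings = ["high", "medium", "high", "low"]
crisp = [defuzzify(SCALE[r]) for r in ratings]
print(crisp)  # [8.33, 5.0, 8.33, 1.67] (approximately)
```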

The MCDM-DQN module implements the MCDM-DQN algorithm, which is the core of the proposed solution. It contributes to the purpose of the overall framework by ranking key sustainable maintenance factors, enabling stakeholders to select the optimal factor from a prioritized set of relevant factors. Developing the model requires an intensive, iterative, and exploratory approach. Due to inherent uncertainty in the timing and quality of outcomes in AI development, frequent loops, ongoing modifications, and improvements to initial requirements are essential. In this research, the performance of the MCDM-DQN model is compared to that of other traditional MCDM techniques through three experiments involving the ranking of sustainability factors in different business contexts.

To enhance the evaluation and deployment of the model, the framework should be integrated with production systems to facilitate the testing of MCDM-DQN using unseen data. Stakeholders should assess whether the model’s quality, stability, and performance meet expectations and determine its deployment. It is essential to emphasize that stakeholder participation in decision-making is significant because their influence, experience, and insights into the company can help decision-makers make informed decisions to achieve sustainable development goals. Understanding stakeholders’ interests and needs is essential for optimizing decision-making strategies.

Fig. 2 The MCDM-DQN model combines multiple DQNs

4 The MCDM-DQN Model

The MCDM-DQN model depicted in Fig. 2 combines multiple DQNs to implement a MORL solution in which the agent pursues multiple objectives. Let \(O = \{o_1, o_2, \cdots , o_n\}\) be the set of objectives of an agent. Instead of a single reward, the agent receives a vector of rewards \(\vec {r_t} =[r_{1,t}, r_{2,t}, \dots , r_{n,t}]\) at each time-step t where \(r_{i,t}\) corresponds to objective \(o_i\). For each objective \(o_i\) and time-step t, the discounted return is defined by Eq. 8.

$$\begin{aligned} R_{i,t} = \sum _{k=0}^{\infty } \gamma ^k r_{i,t+k} \end{aligned}$$
(8)

Additionally, for each objective \(o_i\) there is a Q-function \(Q_i (s,a)\) that produces the expected discounted return \(R_{i,t}\), such that \(Q_i (s,a) = \mathbb {E}[R_{i,t}|S_t = s,A_t = a]\). Then, a vector of Q-functions, which includes \(Q_i(s,a)\) for each objective \(o_i\), can be defined:

$$\begin{aligned} \vec {Q} (s,a) = [Q_1(s,a), Q_2(s,a), \dots , Q_n(s,a)] \end{aligned}$$
(9)

The agent can determine the optimal action for each objective \(o_i \in O\) at time-step t, for a given state \(s_t\), via the function \(Q_i(s, a)\) as follows:

$$\begin{aligned} a_{i,t} = \underset{a}{\text {argmax}} Q_i(s_t, a) \end{aligned}$$
(10)

The vector \(\vec {a_t} = [a_{1,t}, a_{2,t}, \dots , a_{n,t}]\), which consists of optimal actions relating to objectives in a given time-step t, can also be determined. However, it is necessary to convert \(\vec {a_t}\) into a single action, as the agent may perform only one action at each time step. Therefore, a linear scalarization function that combines \(\vec {Q} (s,a)\) into a single action given a weight vector \(\vec {W} = [w_1, w_2, \dots , w_n]\) is applied as follows:

$$\begin{aligned} SQ(s,a) = \sum _{i=1}^{n} w_i Q_i(s,a) \end{aligned}$$
(11)

where \(w_i\) is the weight assigned to each specific objective \(o_i\), and n represents the total number of objectives. Therefore, \(SQ(s,a)\) can be applied to Eq. 10 to select an action.

The MCDM-DQN agent uses a different DQN as an approximator for each \(Q_i (s, a) \in \vec {Q}(s, a)\). Each DQN provides a vector of q-values, which are combined to select the single action that the agent performs. The q-values obtained from the DQNs for different objectives are merged using the weights \(w_i\) defined by the manufacturing experts. Consider a vector \(\vec {q_i}\) that consists of q-values provided by \(Q_i(s, a)\) for each possible action \(a_i \in A\) and a single objective \(o_i \in O\), defined as follows:

$$\begin{aligned} \vec {q_i} = [Q_i(s,a_1), Q_i(s,a_2), \dots , Q_i(s,a_m)]. \end{aligned}$$
(12)

In this approach, scalarization can sum all q-values and select the action corresponding to the maximal value in the scaled q-value vector. Since the q-values of different objectives may span different ranges of real numbers, each vector must be rescaled to \([0,1] \subseteq \mathbb {R}\) to produce meaningful results. In our experiments, the scikit-learn library function preprocessing.MinMaxScaler() was used to rescale the values in \(\vec {q_i}\) as follows:

$$\begin{aligned} scale(x) = \frac{x - x_{\min }}{x_{\max } - x_{\min }} \end{aligned}$$
(13)

The scalarized Q-vector is defined by Eq. 14, where n is the number of objectives and \(w_i\) corresponds to each objective weight. Finally, we sum the rescaled vectors \(\vec {q_i}\) and select the action corresponding to the highest total q-value.

$$\begin{aligned} \vec {q_s} = \sum _{i=1}^{n} w_i \, scale(\vec {q_i}) \end{aligned}$$
(14)
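A minimal NumPy sketch of Eqs. 12 to 14 follows. It rescales each objective's q-value vector with the min-max formula of Eq. 13 (equivalent to scikit-learn's preprocessing.MinMaxScaler) and returns the action with the highest weighted sum; the function and variable names are illustrative, not the original implementation.

```python
import numpy as np

def scalarized_action(q_vectors, weights):
    """Select a single action from per-objective q-values (Eqs. 12 to 14).

    q_vectors: list of n arrays, one per objective, each of length m (actions).
    weights:   the n objective weights w_i supplied by the experts.
    """
    q_s = np.zeros_like(q_vectors[0], dtype=float)
    for w_i, q_i in zip(weights, q_vectors):
        span = q_i.max() - q_i.min()
        # Eq. 13: rescale to [0, 1] so no objective dominates by magnitude.
        scaled = (q_i - q_i.min()) / span if span > 0 else np.zeros_like(q_i)
        q_s += w_i * scaled  # Eq. 14: weighted sum of the rescaled q-vectors
    return int(np.argmax(q_s))  # action with the highest scalarized q-value
```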

4.1 MCDM-DQN Learning

Deep Q-network algorithms utilize neural networks parameterized by \(\theta \) to approximate the function \(Q(s, a; \theta )\), where \(\theta \) refers to the learnable parameters of the neural network. In our MCDM-DQN model, multiple DQNs are used. Therefore, there is a function \(Q_i(s, a; \theta _i)\) for a \(DQN_i\) with respect to the objective \(o_i\). Each \(DQN_i\) is optimized by minimizing a loss function in each iteration j, with the target \(y^{DQN}_j = r + \gamma \underset{a'}{\max }\ Q(s', a'; \theta ^-_j)\) as follows:

$$\begin{aligned} \mathcal {L}_j(\theta _j) = \underset{s, a, r, s'}{\mathbb {E}} \bigl [(y^{DQN}_j - Q(s,a;\theta _j))^2 \bigr ] \end{aligned}$$
(15)

It is important to understand that each DQN involves two identical multi-layer feed-forward networks during the learning process. The first is the online network \(Q_i(s, a; \theta )\), which is updated at each iteration j. The second is the target network \(Q_i(s, a; \theta ^-)\), which is updated only every k iterations [81]. Each DQN is implemented as a subclass of PyTorch’s neural network base class, “torch.nn.Module” [82]. The neural network architecture includes two hidden layers with ReLU activation followed by an output layer, using default parameters, with each hidden layer set to 128 units.
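The described architecture can be sketched as follows. The layer sizes match the description (two 128-unit hidden layers with ReLU activations); the class and argument names are our own.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Per-objective Q-network: two 128-unit hidden layers with ReLU."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# One online network and one target network per objective o_i; the target
# network's weights are copied from the online network every k iterations.
```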

Furthermore, experience replay [83] improves the learning process by using training experiences more effectively. Replay memory retains experience samples for re-utilization during training. The agent logs the states, actions, and rewards encountered in the replay memory \(M_i\) specific to each \(DQN_i\). Training involves sampling past experiences uniformly at random from this replay memory. The selected samples serve as mini-batches for the gradient descent optimization process. Each \(DQN_i\) is optimized iteratively using a loss function for each iteration j as follows:

$$\begin{aligned} \mathcal {L}_{i,j}(\theta _{i,j}) = \underset{(s, a, r, s') \sim U(M_i)}{\mathbb {E}} \bigl [(r_i + \gamma \underset{a'}{\max }\ Q_i(s',a'; \theta _{i,j}^-) - Q_i(s,a;\theta _{i,j}))^2 \bigr ] \end{aligned}$$
(16)
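A hedged sketch of one optimization step implied by Eq. 16 is shown below: experiences are sampled uniformly from the replay memory \(M_i\), and the squared TD error against the frozen target network is minimized. All names are illustrative, and terminal-state masking is omitted for brevity.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

# Replay memory M_i; each entry is a (state, action, reward, next_state)
# tuple of tensors for objective o_i.
memory = deque(maxlen=10_000)

def optimize_step(online_net, target_net, optimizer, batch_size=128, gamma=0.99):
    """One gradient step on the loss of Eq. 16 for a single DQN_i."""
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)      # uniform sampling, U(M_i)
    s, a, r, s_next = (torch.stack(x) for x in zip(*batch))
    q = online_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                          # target uses frozen theta^-
        y = r + gamma * target_net(s_next).max(dim=1).values
    loss = F.mse_loss(q, y)                        # squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```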

5 Experimental Evaluation

The following sections describe the implementation of the MCDM-DQN environment and the experiments carried out to evaluate the performance of the model.

5.1 The MCDM-DQN Environment

To evaluate the performance of MCDM-DQN, we created a virtual environment named SustainableIndicators. This environment simulates how the key maintenance factors chosen by stakeholders affect the behavior of various sustainability indicators. These indicators align with common sustainability objectives in manufacturing, including environmental, social, health and safety, and economic. The environment consists of a continuous four-dimensional state within the range of [0, 100]. A stochastic oscillation models the seemingly random changes in manufacturing sustainability. The initial state \(s_0\) is randomly chosen, determining the starting level \(L_i\) for each objective \(o_i\) within the interval [0, 100]. At each time step, \(L_i\) is modified by \(L_{i, step}\), increasing if associated with a benefit criterion and decreasing if linked to a cost criterion. The term \(L_{i, step}\) is calculated as follows: for a cost criterion, \(L_{i, step} = (100 - L_i) \times v\), and for a benefit criterion, \(L_{i, step} = L_i \times v\), where v is a random variable within the range [0.0001, 0.001]. Consequently, these conflicting objectives continuously evolve at each time step. Furthermore, the MCDM-DQN agent performs various actions related to the alternatives presented in the decision matrix \(D_m\) that decision-makers input into the MCDM-DQN model. The goal is to minimize objectives related to cost criteria while maximizing those related to benefit criteria. When an action is executed, the algorithm searches for the corresponding alternative ratings \(a_{i, j}\) for each objective \(o_i\) and adjusts \(L_i\) by adding or subtracting a value corresponding to \(a_{i, j} \times 0.1\). Our environment utilizes the MO-Gymnasium API, an open-source Python library for developing and comparing multi-objective reinforcement learning algorithms [84]. It also conforms to the standard Gymnasium API [85] but returns vectorized rewards instead of scalars.
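The level-update rule just described can be sketched as follows. The variable names and, in particular, the sign convention applied to the chosen alternative's ratings are our reading of the description, not the original implementation.

```python
import numpy as np

rng = np.random.default_rng()

def step_levels(levels, is_benefit, D_m, action):
    """One step of the SustainableIndicators level dynamics.

    levels:     array of indicator levels L_i in [0, 100], one per objective.
    is_benefit: boolean mask; True for benefit criteria, False for cost.
    D_m:        decision matrix (alternatives x objectives).
    action:     index of the alternative chosen by the agent.
    """
    v = rng.uniform(0.0001, 0.001, size=levels.shape)    # stochastic oscillation
    step = np.where(is_benefit, levels * v, (100.0 - levels) * v)
    levels = levels + np.where(is_benefit, step, -step)  # benefit up, cost down
    # The chosen alternative's ratings a_ij nudge each level by 0.1 * a_ij;
    # the direction (benefit up, cost down) is assumed here.
    levels = levels + np.where(is_benefit, 1.0, -1.0) * 0.1 * D_m[action]
    return np.clip(levels, 0.0, 100.0)
```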

The MCDM-DQN agent is designed to perform episodic tasks. At each time interval, it interacts with the environment by observing the current state and selecting an action to execute using an \(\epsilon \)-greedy policy. The MCDM-DQN agent dynamically adjusts strategies by balancing exploitation and exploration to maximize rewards, as shown in Algorithm 1. After acting, the agent receives a vectorized reward signal and transitions to a new state. The reward values for the objectives related to the benefit criteria are established at \(-1\) for lower performance levels in the range [0, 30], 0 for intermediate performance levels in the range (30, 70), and 1 for higher performance levels in the range [70, 100]. The same reward values and performance level ranges apply to the objectives related to cost criteria, but in reverse order: higher performance levels in the range [0, 30] and lower performance levels in the range [70, 100]. Each episode ends when one of two final states is reached: 1) a balanced score of objectives shows cost criteria within the range of [0, 10] and benefit criteria in the range of [90, 100], or 2) a balanced score indicates cost criteria between [90, 100] and benefit criteria within [0, 10]. These scenarios yield a reward signal of \(+100\) and \(-100\), respectively. Furthermore, various hyperparameters were employed to train the DQNs throughout the learning process, as detailed in Table 4.
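The vectorized reward signal can be derived directly from the performance bands above; a minimal sketch follows, with illustrative names and the thresholds taken from the text.

```python
import numpy as np

def vector_reward(levels, is_benefit):
    """Per-objective reward of -1, 0, or +1 by performance band."""
    # Benefit criteria: [0, 30] -> -1, (30, 70) -> 0, [70, 100] -> +1.
    rewards = np.where(levels >= 70, 1, np.where(levels > 30, 0, -1))
    # Cost criteria use the reversed ordering: low levels are good.
    return np.where(is_benefit, rewards, -rewards)
```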

Table 4 MCDM-DQN learning hyperparameters
Algorithm 1 MCDM-DQN performance evaluation algorithm

5.2 Experiments

This section describes various experiments conducted to assess the performance of the MCDM-DQN model. The SustainableIndicators environment was modified to optimize the state and action spaces for each specific MCDM problem. In each experiment, the MCDM-DQN model was trained according to the specifications described in Sect. 4.1. All DQNs used for the various objectives were trained independently, and the criteria weights were not utilized for scalarization during training. Four distinct scenarios were executed, each with different sets of criteria weights. In the base scenario, input weights W were utilized. Subsequent scenarios used different preference sets, which were randomly assigned, to evaluate the sensitivity of the rankings to variations in the criteria weights. A total of 1000 episodes were played for each scenario. The accumulated rewards earned for each action taken were recorded in each scenario. The method used to assess the performance of MCDM-DQN is described in Algorithm 1. The authors developed the necessary software to implement the solutions and perform the experiments presented in this study using multiple well-known and updated tools available for deep learning and RL implementations, such as Python 3.10.14, PyTorch, and OpenAI Gym environments running on Windows 11 for x64-based systems. The system used was an x64-based PC with an 11th-generation Intel Core i7 processor operating at 3.30 GHz, featuring 16 GB of RAM and 1 TB of storage.
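Consistent with the description of Algorithm 1, the evaluation loop can be sketched as follows: for each weight scenario, the trained per-objective networks are queried, their q-vectors are scalarized (see the scalarized_action sketch in Sect. 4), and the reward earned is credited to the executed action over 1000 episodes. This is our reading of the procedure, not the authors' verbatim code; in particular, summing the components of the vector reward is a simplification.

```python
import numpy as np
import torch

N_EPISODES = 1000

def evaluate(env, q_networks, weights, n_actions):
    """Accumulate per-action rewards over N_EPISODES for one weight scenario."""
    totals = np.zeros(n_actions)
    for _ in range(N_EPISODES):
        state, _ = env.reset()                       # MO-Gymnasium API
        done = False
        while not done:
            s = torch.as_tensor(state, dtype=torch.float32)
            with torch.no_grad():
                q_vectors = [qnet(s).numpy() for qnet in q_networks]
            action = scalarized_action(q_vectors, weights)  # Eqs. 12 to 14
            state, reward_vec, terminated, truncated, _ = env.step(action)
            totals[action] += float(np.sum(reward_vec))
            done = terminated or truncated
    return totals  # alternatives are ranked by their accumulated reward
```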

Table 5 Sustainable software engineering practices

5.2.1 Experiment 1: Assessing the Sustainability of a Software Development Company

In this experiment, we compared the performance of the MCDM-DQN with a benchmark established by [68]. That study employed fuzzy TOPSIS and AHP methods to assess sustainability practices in a software (SW) development company operating in Industry 5.0. The goal was to identify the most sustainable SW engineering practices while addressing various sustainability concerns: social, economic, and environmental. Four assessment criteria were established: \(C_1\) for environmental impact, \(C_2\) for social responsibility, \(C_3\) for resource efficiency, and \(C_4\) for economic viability. Next, five alternative SW engineering practices were identified (Table 5).

Fig. 3 Average episodic rewards obtained by MCDM-DQN during Experiment 1. The learning of each DQN is stable over time: (a) DQN 1, (b) DQN 2, (c) DQN 3, and (d) DQN 4

Fig. 4 Ranking of five actions that represent alternative sustainable SW engineering practices: (a) base scenario: \(W = \{0.25, 0.25, 0.25, 0.25\}\), (b) scenario 2: \(W = \{0.30,0.15,0.20,0.35\}\), (c) scenario 3: \(W = \{0.15,0.20,0.35,0.30\}\), and (d) scenario 4: \(W = \{0.20,0.35,0.30,0.15\}\). The scores on the y-axis represent the total reward value normalized by the log of y

The experiment was carried out as follows. First, the decision matrix and the weights of the four criteria presented in [68] were defuzzified and served as input to the MCDM-DQN model. The training of each DQN was consistently stable over time, as shown in Fig. 3. To indicate computational effort, the average training execution time for each DQN was recorded as follows: a) DQN 1: 1.28 min; b) DQN 2: 1.47 min; c) DQN 3: 1.24 min; and d) DQN 4: 1.26 min. The trained MCDM-DQN model was finally used to rank sustainable SW practices.

The experimental results are illustrated in Figs. 4 and 5. Figure 4 shows the ranking order of the alternatives based on the accumulated rewards, with \(A_1\) and \(A_3\) performing the best in the base scenario, followed by \(A_2\), \(A_5\), and \(A_4\). The dominant sustainability practices \(A_1\) and \(A_3\) remained in their positions in all scenarios, suggesting that they should be considered equally important given such differences in the criteria weights. The alternatives \(A_2\), \(A_5\), and \(A_4\) changed their ranking in different scenarios, indicating that they are more sensitive to fluctuating preferences. The rewards accumulated for each action in the base scenario are presented in Fig. 5. The total accumulated rewards determine the ranking of each alternative. The results obtained in this experiment are consistent with the findings in [68]. A comparative analysis of the ranking of SW engineering practices obtained in this experiment and the findings of [68] is presented in Table 6. As we can see, the MCDM-DQN performance could match that of fuzzy TOPSIS. Specifically, the performance differences between the TOPSIS and AHP methods reported in [68] suggest that alternative scores may vary depending on the MCDM method applied.

Fig. 5 The total accumulated rewards used to rank five sustainable SW engineering practices in the base scenario: (a) practice \(A_1\), (b) practice \(A_2\), (c) practice \(A_3\), (d) practice \(A_4\), (e) practice \(A_5\), and (f) total accumulated reward

Table 6 A comparative analysis of the ranking of sustainable SW engineering practices obtained in experiment 1 and the findings of [68]

5.2.2 Experiment 2: Evaluation of Maintenance Factors that Affect Sustainable Manufacturing

In this experiment, we compared the performance of the MCDM-DQN with a benchmark established by [6], which presents an empirical examination of the impact of the maintenance function on manufacturing sustainability in 58 companies. The most relevant maintenance factors were determined (Table 7). Then, F-AHP and TOPSIS methods were employed to evaluate the maintenance factors that influence sustainable manufacturing. The F-AHP method facilitated the participation of multiple decision-makers with different and conflicting criteria, promoting consensus and establishing the weights of these criteria [6]. Four assessment criteria were established: \(C_1\) for manufacturing cost, \(C_2\) for energy consumption, \(C_3\) for waste reduction, and \(C_4\) for operational safety.

Table 7 Key factors that influence sustainable maintenance
Fig. 6 Average episodic rewards obtained by MCDM-DQN during Experiment 2. The learning of each DQN is stable over time: (a) DQN 1, (b) DQN 2, (c) DQN 3, and (d) DQN 4

The experimental procedure was conducted as follows. First, the fuzzy decision matrix and the fuzzy weights for five factors, as detailed in [6], were defuzzified to serve as input for the MCDM-DQN model. The training of each DQN was consistently stable over time, as shown in Fig. 6. To indicate computational effort, the average training execution time for each DQN was as follows: a) DQN 1: 1.28 min; b) DQN 2: 1.47 min; c) DQN 3: 1.24 min; and d) DQN 4: 1.26 min. The trained MCDM-DQN model was finally used to rank the key maintenance factors.

The experimental results are presented in Figs. 7 and 8. Figure 7 shows the ranking order of the alternatives based on the accumulated rewards, with \(A_3\), \(A_4\), and \(A_5\) performing the best in the base scenario, followed by \(A_1\) and \(A_2\). In particular, the three most highly ranked alternatives are related to technical issues. Consequently, company managers must focus on these factors and implement measures in these areas to support sustainable manufacturing goals. Furthermore, the three dominant alternatives (\(A_3, A_4, A_5\)) remained in the top positions in the ranking in subsequent scenarios, suggesting that they should be considered equally important given such differences in experts’ preferences. The alternatives \(A_1\) and \(A_2\) changed their ranking in different scenarios. However, they continued to occupy the lowest ranks, suggesting that changing preferences had little impact on the overall ranking. The rewards accumulated for each action in the base scenario are presented in Fig. 8. The total accumulated rewards determine the ranking of each alternative. The results obtained in this experiment are consistent with the findings in [6], which concluded that alternatives \(A_3\), \(A_4\), and \(A_5\) should be considered equally crucial after a sensitivity analysis. A comparative analysis of the ranking of the key factors obtained in this experiment and the findings of [6] is presented in Table 8. As we can see, the MCDM-DQN performance could match that of fuzzy TOPSIS.

Fig. 7 Ranking of five actions that represent alternative maintenance factors: (a) base scenario \(W = \{0.32, 0.32, 0.19, 0.17\}\), (b) scenario 2 \(W = \{0.32, 0.19, 0.17, 0.32\}\), (c) scenario 3 \(W = \{0.19, 0.17, 0.32, 0.32\}\), and (d) scenario 4 \(W = \{0.17, 0.32, 0.32, 0.19\}\). The scores on the y-axis represent the total reward value normalized by the log of y

Fig. 8 The total accumulated rewards used to rank the five key maintenance factors in the base scenario: (a) factor \(A_1\), (b) factor \(A_2\), (c) factor \(A_3\), (d) factor \(A_4\), (e) factor \(A_5\), and (f) total accumulated reward

Table 8 A comparative analysis of the ranking of key maintenance factors obtained in experiment 2 and the findings of [6]

5.2.3 Experiment 3: Ranking Maintenance Sustainability Strategies

In this experiment, we compared the performance of the MCDM-DQN with a benchmark established by [36]. Their study used PROMETHEE and fuzzy TOPSIS methods to assess maintenance sustainability strategies in a cement plant and determine the most effective one. Four assessment criteria were established: \(C_1\) for environmental, \(C_2\) for social and safety, \(C_3\) for technical, and \(C_4\) for economic. Next, four alternative maintenance sustainability strategies were identified (Table 9).

The experiment was carried out as follows. First, the criteria weights were calculated by aggregating and averaging the sub-criteria scores from the various decision-makers presented in [36]. Then, the decision matrix was created from the crisp values of the maintenance strategies presented in [68]. The calculated weights and the decision matrix were used as inputs for the MCDM-DQN model. The training of each DQN was consistently stable over time, as shown in Fig. 9. To indicate computational effort, the average training execution time for each DQN was as follows: (a) DQN 1: 1.23 min; (b) DQN 2: 1.37 min; (c) DQN 3: 1.26 min; and (d) DQN 4: 1.37 min. Finally, the trained MCDM-DQN model was used to rank the maintenance sustainability strategies.
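A minimal sketch of this aggregation step follows; the sub-criterion scores below are hypothetical stand-ins for the decision-maker inputs reported in [36]:

import numpy as np

# Hypothetical sub-criterion scores from three decision-makers for each
# criterion C1..C4 (one inner list per decision-maker); illustrative only.
scores = {
    "C1": [[6, 7], [7, 6], [6, 6]],
    "C2": [[7, 8], [8, 8], [7, 9]],
    "C3": [[8, 9], [9, 8], [9, 9]],
    "C4": [[9, 8], [8, 9], [9, 9]],
}

# Average the sub-criterion scores per decision-maker, then across
# decision-makers, and normalize so the criteria weights sum to 1.
raw = {c: np.mean([np.mean(dm) for dm in dms]) for c, dms in scores.items()}
total = sum(raw.values())
weights = {c: round(v / total, 3) for c, v in raw.items()}
print(weights)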

Table 9 Maintenance sustainability strategies
Fig. 9 Average episodic rewards obtained by MCDM-DQN during Experiment 3. The learning of each DQN is stable over time: (a) DQN 1, (b) DQN 2, (c) DQN 3, and (d) DQN 4

The experimental results are illustrated in Figs. 10 and 11. Figure 10 shows the ranking order of the alternatives based on the accumulated rewards: \(A_2\) and \(A_1\) perform best in the base scenario, followed by \(A_4\) and \(A_3\). All strategies maintained their positions across all scenarios, indicating that changing preferences did not significantly affect the rankings. The rewards accumulated for each action in the base scenario are presented in Fig. 11; the total accumulated rewards determine the ranking of each alternative. These results are consistent with the findings in [36], which highlighted variations in the classification of maintenance strategies between the PROMETHEE and fuzzy TOPSIS methods: whereas PROMETHEE identified \(A_2\) as the best and \(A_3\) as the worst strategy, fuzzy TOPSIS ranked \(A_1\) first, with \(A_3\) and \(A_4\) as the least favored strategies. A comparative analysis of the ranking of sustainable maintenance strategies obtained in this experiment against the findings of [36] is presented in Table 10. We observe that the MCDM-DQN performance matches that of PROMETHEE.
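The stability check across scenarios can be sketched as a simple weight sweep. The per-objective reward totals below are illustrative values chosen only so that the reported base ranking \(A_2 > A_1 > A_4 > A_3\) is reproduced; they are not the experimental data:

import numpy as np

# Hypothetical per-objective accumulated rewards for strategies A1..A4
# (rows) over criteria C1..C4 (columns); illustrative values only.
rewards = np.array([
    [0.8, 0.7, 0.6, 0.9],  # A1
    [0.9, 0.8, 0.7, 0.8],  # A2
    [0.4, 0.5, 0.5, 0.4],  # A3
    [0.6, 0.5, 0.6, 0.5],  # A4
])

scenarios = {
    "base":       [0.24, 0.25, 0.25, 0.25],
    "scenario 2": [0.30, 0.15, 0.20, 0.35],
    "scenario 3": [0.15, 0.20, 0.35, 0.30],
    "scenario 4": [0.20, 0.35, 0.30, 0.15],
}

# A ranking that is identical under every weight vector indicates that
# changing preferences does not affect the ordering of the strategies.
for name, w in scenarios.items():
    order = np.argsort(rewards @ np.array(w))[::-1]
    print(f"{name}: " + " > ".join(f"A{i + 1}" for i in order))

With these placeholder values, all four scenarios print A2 > A1 > A4 > A3, mirroring the stability observed in Fig. 10.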

Fig. 10 Ranking of four actions that represent alternative sustainable maintenance strategies: (a) base scenario \(W = \{0.24, 0.25, 0.25, 0.25\}\), (b) scenario 2 \(W = \{0.30, 0.15, 0.20, 0.35\}\), (c) scenario 3 \(W = \{0.15, 0.20, 0.35, 0.30\}\), and (d) scenario 4 \(W = \{0.20, 0.35, 0.30, 0.15\}\). The scores on the y-axis represent the total reward value on a logarithmic scale

6 Discussion

In this research, we propose a MORL model, called MCDM-DQN, to assist decision-makers in evaluating maintenance factors that influence manufacturing sustainability. The model has demonstrated its ability to rank the key factors influencing manufacturing sustainability across various industry segments while considering the perspectives of decision-makers. This research also proposes a human-centered design that places decision-makers at the core of the decision-making process: the solution combines traditional MCDM techniques with AI, fostering human–machine collaboration, and empowers decision-makers to input data that reflects the views and perceptions of stakeholders while evaluating and validating the model’s output.

Fig. 11 The total accumulated rewards for ranking sustainable maintenance strategies in the base scenario: (a) strategy \(A_1\), (b) strategy \(A_2\), (c) strategy \(A_3\), (d) strategy \(A_4\), and (e) total accumulated reward

Table 10 A comparative analysis of the ranking of sustainable maintenance strategies obtained in Experiment 3 and the findings of [36]

The performance of our solution was compared with several benchmarks, and the experimental results indicated that MCDM-DQN can match the performance of traditional MCDM methods such as TOPSIS and PROMETHEE. In addition, it offers enhanced capabilities that align with the principles of Industry 5.0. Our solution provides real-time feedback and employs an a posteriori approach, in which decision-makers’ preferences are not used for scalarization during the learning phase. Instead, the model learns a coverage set of policies, enabling a quick response when more information becomes available. The advantages of this approach are twofold: it reduces the uncertainty surrounding user preferences, and once a coverage set has been learned, decision-makers can adjust their selected solution to reflect their updated preferences [78]. Hence, stakeholders are encouraged to manage trade-offs between objectives. These capabilities improve decision-making in the context of sustainable maintenance in manufacturing, differentiating MCDM-DQN from traditional MCDM approaches. Our model has initially been tested in a simulated environment to validate its core mechanisms and effectiveness. Although it does not yet handle real-world uncertainty, disruptions, or maintenance strategies that vary across industries, it provides a foundation for decision-making in controlled settings, and its adaptive learning and multi-objective optimization capabilities enable scenario analysis whose insights can be refined through industry-specific adjustments.
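A minimal sketch of the a posteriori selection step is given below. The coverage set is a hypothetical stand-in for the vector-valued returns the model would learn, and linear scalarization is one common utility choice; neither is prescribed by the paper:

import numpy as np

# Hypothetical coverage set: each candidate policy is stored with its
# expected vector return over the four objectives (illustrative values).
coverage_set = {
    "pi_1": np.array([0.9, 0.8, 0.3, 0.3]),
    "pi_2": np.array([0.3, 0.3, 0.9, 0.8]),
    "pi_3": np.array([0.6, 0.6, 0.6, 0.6]),
}

def select_policy(weights):
    # Preferences enter only here, after training, so decision-makers can
    # re-run the selection with updated weights without relearning.
    utilities = {name: float(np.asarray(weights) @ v)
                 for name, v in coverage_set.items()}
    return max(utilities, key=utilities.get)

print(select_policy([0.32, 0.32, 0.19, 0.17]))  # initial preferences -> pi_1
print(select_policy([0.19, 0.17, 0.32, 0.32]))  # updated preferences -> pi_2

Because the coverage set is computed once, a change in preferences only re-runs the cheap selection step, which is what enables the real-time feedback described above.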

6.1 Practical Implications

Industry 5.0 signifies a transformation in manufacturing, emphasizing sustainability, resilience, and commitment to human values. This progressive approach aims to integrate advanced automation with human experience. In this context, sustainable maintenance activities are crucial to ensure optimal performance and longevity of industrial machinery and equipment. They also contribute to reducing pollution and waste while improving the production capacity of the manufacturing sectors. Integrating these strategies into company operations is essential for achieving short-term success and fostering a more sustainable and inclusive future for everyone. To the authors’ knowledge, no systematic methodology exists in the literature that addresses the complex dynamics of sustainability in maintenance practices for Industry 5.0. Our research aims to fill this gap by providing a cutting-edge framework that integrates a decision-making process with AI to effectively address the problem of ranking maintenance factors based on their relative importance from the perspectives of various maintenance stakeholders.

This proposed strategy assists decision-makers in navigating the complexities of sustainability assessment within the rapidly evolving Industry 5.0 landscape. It enables individuals to interact with innovative technology in industrial environments, thereby enhancing their understanding of how maintenance factors influence manufacturing sustainability. It also allows decision-makers to prioritize various maintenance factors according to their organization’s operational context. However, adopting a human-centric approach within an organization often requires significant changes to prioritize stakeholder needs. Furthermore, incorporating human-centered design into current product development structures may require reengineering processes, as well as the acquisition of crucial skills and training for successful implementation. Although our MCDM-DQN model has only been tested in a simulated environment to validate its performance and effectiveness, the experiments conducted have shown promising results, indicating that the MCDM-DQN can match the performance of traditional MCDM methods while providing real-time feedback and enabling stakeholders to quickly manage trade-offs. This is crucial for empowering decision-makers in Industry 5.0 to prioritize sustainable practices in their choices, and it strengthens the practical relevance of this research to sustainable maintenance.

7 Conclusions and Future Work

This research presents a MORL model called MCDM-DQN to assist decision-makers in evaluating key maintenance factors that impact manufacturing sustainability. The model demonstrated its ability to rank the relevant factors across various industry segments, matching the performance of traditional MCDM techniques while providing the added advantage of real-time feedback, which allows decision-makers to adjust their selected solution to reflect updated preferences. To answer the research questions, this study proposes a decision support framework that combines human-centered design with MCDM-DQN and places decision-makers at the heart of the decision-making process. By promoting collaboration among stakeholders and integrating their feedback with the MCDM-DQN, the framework improves prioritization according to the organization’s operational context. The proposed strategy guides decision-makers in navigating the complexities of sustainability evaluation in the fast-evolving landscape of Industry 5.0. It allows decision-makers to engage with new technologies in industrial environments, thereby deepening their understanding of how maintenance factors influence manufacturing sustainability.

However, despite the relevance and innovative nature of the proposed approach, it presents some limitations that should be considered in the continuation of this work. First, our model has so far been evaluated only in a simulated environment to validate its core mechanisms and effectiveness; its ability to address real-world uncertainty, disruptions, and varying maintenance strategies in different industries has yet to be tested. Second, since the proposed approach has not been evaluated in manufacturing settings, we cannot guarantee its scalability and real-time adaptation in large-scale production environments. Future studies should address these issues to strengthen the evaluation of the proposed framework, and these limitations also present opportunities for investigating how sustainable maintenance practices impact manufacturing sustainability within the context of Industry 5.0. Furthermore, the proposed MCDM-DQN model should be validated through extensive industrial case studies to improve its robustness and generalizability, and its capability to incorporate domain-specific constraints and industry feedback should be assessed to ensure effectiveness in dynamic and complex manufacturing environments. Moreover, additional features to bolster the model’s robustness should be considered, such as an interactive human-in-the-loop methodology that identifies relevant policies early in the training process.

8 Nomenclature

\(D_m\): Decision matrix
\(W\): Set of weights
AHP: Analytic hierarchy process
AI: Artificial intelligence
API: Application programming interface
DNN: Deep neural network
DQN: Deep Q-network
DSS: Decision support system
DST: Deep-sea treasure
DT: Digital twins
ELECTRE: Elimination and choice translating reality
F-AHP: Fuzzy analytic hierarchy process
HITL: Human-in-the-loop
MAS: Multi-agent system
MC: Mountain car
MCDM: Multi-criteria decision-making
MCDM-DQN: Multi-criteria decision-making with deep Q-networks
MDP: Markov decision process
MOMDP: Multi-objective Markov decision process
MORL: Multi-objective reinforcement learning
PROMETHEE: Preference ranking organization method for enrichment evaluation
RL: Reinforcement learning
SW: Software
TOPSIS: Technique for order of preference by similarity to ideal solution