Introduction

Many real-world problems are constrained multiobjective optimization problems (CMOPs), which involve conflicting objectives subject to various constraints [1,2,3,4]. A general CMOP can be expressed as follows:

$$\begin{aligned} \text {min}\; {\textbf{F}}({\textbf{x}})=(f_1({\textbf{x}}),f_2({\textbf{x}}),\dots ,f_M({\textbf{x}})) \end{aligned}$$
(1)

s.t.

$$\begin{aligned} {\left\{ \begin{array}{ll} &{}\textrm{g}_i({\textbf{x}})\le 0,i=1,\dots ,s \\ &{}h_i({\textbf{x}})=0,i=s+1,\dots ,t\\ &{}{\textbf{x}}\in \Omega , \end{array}\right. } \end{aligned}$$
(2)

where \({\textbf{F}}({\textbf{x}})\) is composed of M conflicting objective functions; \({\textbf{x}}=(x_1,x_2,\dots ,x_D)\) is a solution with D dimensions; \(\Omega \subseteq {\mathbb {R}}^D\) denotes the decision space; \(\textrm{g}_i({\textbf{x}})\) and \(h_i({\textbf{x}})\) denote the s inequality constraints and the \(t-s\) equality constraints, respectively; and t is the total number of constraints.

To quantify the degree to which \({\textbf{x}}\) violates the ith constraint (denoted as \(CV_i({\textbf{x}})\)), the following formulation is used:

$$\begin{aligned} CV_i({\textbf{x}})={\left\{ \begin{array}{ll} &{} \text {max}(0,\textrm{g}_i({\textbf{x}})),\; i=1,\dots ,s \\ &{} \text {max}(0,\left| h_i({\textbf{x}}) \right| -\delta ),\; i=s+1,\dots ,t, \end{array}\right. } \end{aligned}$$
(3)

where \(\delta \) is a very small positive relaxation parameter for the constraint boundary (e.g., 1e-4), which turns the equality constraints \(h_i({\textbf{x}})=0\) into inequality constraints. The overall constraint violation of \({\textbf{x}}\) (denoted as CV) is formulated as:

$$\begin{aligned} CV({\textbf{x}})=\sum _{i=1}^{t} CV_i({\textbf{x}}), \end{aligned}$$
(4)

where \(CV({\textbf{x}})=0\) means that \({\textbf{x}}\) is a feasible solution; otherwise, it is an infeasible solution. Given two feasible solutions \({\textbf{x}}_1\) and \({\textbf{x}}_2\), if \({\textbf{F}}({\textbf{x}}_1)\) is not worse than \({\textbf{F}}({\textbf{x}}_2)\) in any objective and is strictly better in at least one objective, \({\textbf{x}}_1\) is said to dominate \({\textbf{x}}_2\) \(({\textbf{x}}_1 \prec {\textbf{x}}_2)\). A solution is deemed Pareto-optimal if no other feasible solution dominates it. CMOPs aim to find a set of Pareto-optimal solutions that satisfy all constraints. In the decision space, the set of all feasible Pareto-optimal solutions is the Pareto-optimal set (PS). The mapping of the PS onto the objective space forms the constrained Pareto front (CPF). Similarly, for unconstrained multiobjective optimization problems (MOPs), the target is the unconstrained Pareto front (UPF) [5].
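The definitions in Eqs. 3 and 4 and the dominance relation can be illustrated with a minimal sketch. The function names and the toy constraints below are assumptions made for illustration only; they do not come from any particular library or from the benchmark problems used later.

```python
def constraint_violations(x, ineq, eq, delta=1e-4):
    """Per-constraint violations CV_i(x) in the sense of Eq. 3.

    ineq: list of functions g_i, feasible when g_i(x) <= 0
    eq:   list of functions h_i, feasible when |h_i(x)| <= delta
    """
    cv = [max(0.0, g(x)) for g in ineq]
    cv += [max(0.0, abs(h(x)) - delta) for h in eq]
    return cv


def overall_violation(x, ineq, eq, delta=1e-4):
    """Overall constraint violation CV(x) in the sense of Eq. 4."""
    return sum(constraint_violations(x, ineq, eq, delta))


def dominates(f1, f2):
    """True if objective vector f1 Pareto-dominates f2 (minimization)."""
    not_worse = all(a <= b for a, b in zip(f1, f2))
    strictly_better = any(a < b for a, b in zip(f1, f2))
    return not_worse and strictly_better


# Hypothetical toy constraints, used only to exercise the functions.
toy_ineq = [lambda x: 1.0 - x[0] - x[1]]   # g(x) <= 0
toy_eq = [lambda x: x[0] - x[1]]           # h(x) = 0

print(overall_violation((0.5, 0.5), toy_ineq, toy_eq))  # feasible point
print(overall_violation((0.2, 0.3), toy_ineq, toy_eq))  # infeasible point
print(dominates((1.0, 2.0), (1.0, 3.0)))
```

A solution is treated as feasible exactly when `overall_violation` returns zero, which matches the feasibility condition stated above.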

In contrast to unconstrained MOPs, CMOPs pose a greater challenge because conflicting objectives and constraints must be handled simultaneously [6, 7]. Many constrained multiobjective evolutionary algorithms (CMOEAs) have been developed to address this issue by employing diverse constraint-handling techniques (CHTs). Current CHTs can be divided into five categories: (1) penalty function methods [8, 9]; (2) separation of constraints and objectives [10, 11]; (3) multiobjective methods [12, 13]; (4) hybrid methods [14, 15]; and (5) multi-stage and multi-population (MSMP) methods [16, 17]. Although the methods employed in state-of-the-art CMOEAs have demonstrated high performance on certain CMOPs, they still have limitations on problems with complex infeasible regions and small discrete feasible regions. Unfortunately, many real-world problems exhibit such characteristics, for example the synchronous optimal pulse-width modulation of 3-level inverters [18], which poses challenges to existing CMOEAs. More specifically, penalty function methods and the separation of constraints and objectives require careful tuning of related parameters. In multiobjective methods, designing the additional objective is challenging. Hybrid methods demand differentiability of the problem [19]. MSMP-based CMOEAs overcome the challenges of directly solving CMOPs by extracting valuable information from infeasible solutions, which facilitates collaboration between populations and stages [20]. However, they still have difficulty leveraging infeasible solutions effectively.

Inspired by the success of MSMP, a novel tri-stage framework with a reward-switching mechanism (TSRSM) is proposed for CMOPs. The three stages of TSRSM employ distinct strategies to leverage infeasible solutions. The novel features of TSRSM are as follows:

  1. The proposed TSRSM framework comprises three stages: the push stage, the pull stage, and the repush stage. Each stage employs two cooperative populations, namely \({\text {Pop}}_1\) and \({\text {Pop}}_2\). The role of \({\text {Pop}}_2\) varies across stages. In the push stage, \({\text {Pop}}_2\) aims to converge to the UPF and guide \({\text {Pop}}_1\) through infeasible regions. Subsequently, in the pull stage, \({\text {Pop}}_2\) employs the constraint relaxation technique to enhance feasibility. Finally, in the repush stage, \({\text {Pop}}_2\) returns to the UPF using knowledge transfer and shares its information to guide \({\text {Pop}}_1\). The novel characteristic of this approach is that \({\text {Pop}}_2\) alternates between the UPF and the CPF, which is more effective than the single-direction movement of the auxiliary population in existing CMOEAs (e.g., CCMO), as evidenced by the experimental results.

  2. A novel reward-switching mechanism (RSM) is devised to decide when to switch stages by evaluating the convergence and diversity levels exhibited by the population. The distinct characteristic of RSM is that it takes the convergence and diversity of the population into account simultaneously, which makes its stage switching more accurate than that of other switching mechanisms.

To demonstrate the performance of TSRSM, 9 state-of-the-art CMOEAs were selected for comparison on three benchmark test sets and 30 real-world CMOPs [21]. The results reveal that the proposed method outperforms the other CMOEAs on both the benchmark problems and the real-world CMOPs. Additionally, TSRSM obtains the best performance on 10 real-world problems, including the synchronous optimal pulse-width modulation of 3-level inverters problem, the multi-product batch plant problem [22], the heat exchanger network design problem [23], and others. This is the highest number of best results among all compared CMOEAs.

It should be noted that while many research works are based on multi-stage approaches [24, 25], they primarily focus on utilizing different tasks in each stage, rather than emphasizing the optimization problems themselves. In the TSRSM, the current stage can continue to evolve based on its performance in the problem at hand. This means that if the pull stage performs well in the problem, there may be no need for a repush stage.

The remainder of this paper is organized as follows. In “Related works and motivation”, we review the existing MSMP and explain our motivations. In “Proposed method”, the proposed TSRSM is introduced. “Experimental results” shows the experimental setting and results. Finally, “Conclusions and future work” presents the conclusions and future work.

Related works and motivation

This section reviews the existing CMOEAs related to MSMP, since this paper focuses on the multi-stage framework.

CMOEAs based on two stages and two populations

The methods in this category balance objectives and constraints through two stages or two populations.

As for two-population algorithms, one representative is CCMO [26], a coevolutionary framework featuring two weak cooperative populations. One population is dedicated to solving the original CMOP with the objective of finding the CPF, whereas the other population focuses on discovering the UPF. Another two-population algorithm is cDPEA [27], in which one population is designed to preserve competitive infeasible solutions and the other adopts a feasibility-oriented approach to handle infeasible solutions. Furthermore, cDPEA implements a novel adaptive fitness function to regulate the trade-off between convergence and diversity.

As for two-stage algorithms, one representative is PPS-MOEA/D [28], which introduced a push–pull searching strategy. The push stage is mainly focused on directing the population toward the UPF, while the pull stage is responsible for attracting the population toward the CPF. The switching mechanism employs the gradient of the maximum of the nadir and the minimum of ideal points. DD-CMOEA [5] employed this switching mechanism in the exploration and exploitation stage with two populations. The primary objective of the exploration stage is to search for informative infeasible solutions. In contrast, the exploitation stage leverages infeasible solutions to explore nearby feasible solutions. CMOEA-MS [29] is another two-stage algorithm that consists of a first stage for identifying feasible regions and a second stage for spreading along feasible boundaries. Moreover, CMOEA-MS used fitness evaluation strategies to adaptively balance objectives and constraints in two stages. Another strategy was developed by TSTI [30], which employed different emphases on the three indicators (namely convergence, diversity, and feasibility) in two stages. The first stage is to obtain solutions with good distribution and to prevent the population from falling into local optima. The second stage is to quickly converge to the CPF. DATEA [31] used weak coevolution of the dual population to consider constraints in the first stage. Then, a feasibility-oriented approach is employed to guide a single population in spreading across the feasible regions discovered in the first stage. URCMO [32] utilized the knowledge learned from the learning stage about the relationship between the UPF and the CPF to guide the evolving strategies in the evolving stage.

Inspired by the success of evolutionary multitask (EMT) in other fields, such as high-dimensional classification feature selection problems, some researchers have attempted to develop EMT to solve CMOPs. Qiao et al. [33] first introduced EMT [34] into CMOEAs (named EMCMO), which includes two tasks: the first task is designed to solve the original CMOP, and the other is for the unconstrained MOP. Furthermore, a transfer strategy was devised to determine whether to transfer parent or offspring sets into the environmental selection. A novel EMT, named MTCMO [35], was subsequently developed, which employs a dynamic auxiliary task and leverages an improved \(\epsilon \)-constraint method to effectively tackle complex CMOPs. Furthermore, a tri-task framework known as CMOEMT [36] was introduced. Three tasks are designed for the original CMOP, the unconstrained MOP, and the relaxed CMOP, respectively. The evolutionary process can be broken down into two distinct stages: the evolving stage and the transfer stage. During the evolving stage, three specific tasks evolve independently. Conversely, the transfer stage effectively transmits relevant information among the three tasks.

CMOEAs based on three stages and three populations

The methods in this category attempt to balance objectives and constraints at a finer granularity.

TriP [37] is a representative three-population algorithm. Two populations evolve using a weak coevolutionary framework to handle the original CMOP and the unconstrained MOP separately, while the third population independently addresses the relaxed CMOP. The three populations of TriP and CMOEMT serve the same purposes, but they exchange information in different ways. C3M [24] is a representative three-stage algorithm. In the early stage, C3M sets aside the typical consideration of feasibility to enable a more thorough exploration of the objective space. In the middle stage, the algorithm focuses on individual constraints, selecting those of the highest priority to explore the objective space further. In the last stage, feasibility is fully accounted for to enhance the quality of the solutions achieved in the previous two stages. Another three-stage algorithm, TSCSO [25], introduced a tri-stage competitive swarm optimizer. The first stage focuses on achieving global convergence to the UPF, the second stage aims to enhance the diversity of the population and explore more feasible regions, and the third stage searches for the feasible regions omitted in the previous stage.

Motivations

The aforementioned studies share a common objective of addressing CMOPs by utilizing distinct populations or stages to handle the CMOP, the unconstrained MOP, and the relaxed CMOP. This is because MSMP-based evolutionary algorithms can circumvent the challenges encountered in directly solving the original CMOPs by employing a well-designed staged approach [19]. First, the stage of solving the unconstrained MOP helps to find promising solutions by ignoring constraints. Second, the stage of solving the relaxed CMOP is beneficial to expanding the set of feasible solutions. Third, the stage of solving the original CMOP allows the population to converge further to the CPF. However, if any of these stages is given inadequate weight in the evolutionary process, specific problems become hard to solve. For example, if the algorithm neglects the relaxed CMOP, it struggles to solve problems characterized by large infeasible regions and small discrete feasible regions. Hence, designing an effective framework and switching mechanism is crucial for achieving good results.

The existing frameworks still have some weaknesses. CCMO utilized a coevolutionary framework with two weak cooperative populations, which search for the CPF and the UPF, respectively. However, continuing to search for the UPF in the later stages wastes significant resources, so the search for the CPF may be ineffective when the UPF and CPF are located far apart. TriP used a tri-population-based coevolutionary framework, which solves the CMOP, the unconstrained MOP, and the relaxed CMOP, respectively. The third population uses the \(\epsilon \)-constrained technique of PPS-MOEA/D, which easily falls into local optima on problems such as MW1, MW2, and MW10 [38]. CMOEMT encounters a similar challenge to TriP, as it also utilizes the \(\epsilon \)-constrained technique independently during the initial stage. EMCMO employed an EMT framework with knowledge transfer, which has been demonstrated to achieve high performance on the MW [38] problems. MTCMO improved the EMT framework and knowledge transfer, and also performs well on MW. However, both EMCMO and MTCMO have limitations in dealing with problems that have large infeasible regions, such as LIR-CMOP [39], because they primarily prioritize feasibility.

Switching mechanisms are designed to achieve a balance among the distinct stages that serve different tasks. However, the existing switching mechanisms still encounter difficulties due to their inaccurate judgment of CMOPs with diverse characteristics as illustrated in Table 1. Consequently, the efficiency and versatility of these existing switching mechanisms remain insufficient.

Table 1 Switching mechanism of existing algorithm

To address the aforementioned challenges, we propose a novel TSRSM framework, which is described in “Proposed method”.

Proposed method

The procedure of TSRSM

As presented in Fig. 1, the proposed TSRSM is a cooperative coevolutionary framework with two populations, namely \({\text {Pop}}_1\) and \({\text {Pop}}_2\), and three stages, namely the push stage, the pull stage, and the repush stage. Figure 2 illustrates the distribution of the two populations across the three stages. The name tri-stage originates from the change in position of \({\text {Pop}}_2\). RSM governs the stage transitions. The pseudo-code of TSRSM is shown in Algorithm 1. At the beginning of TSRSM, \({\text {Pop}}_1\) and \({\text {Pop}}_2\) are initialized randomly with size N. Then, \(\alpha \) is initialized to 20, as described in detail in “Reward-switching mechanism”. RSM is applied to calculate \(\alpha \) and \(\beta \) by Algorithm 2, which determines the evolutionary stage. In the push stage, \({\text {Pop}}_1\) and \({\text {Pop}}_2\) are evolved with the weak coevolutionary framework. \({\text {Pop}}_2\) is designed to deal with the unconstrained MOP and aims to quickly converge to the UPF. In the pull stage, \({\text {Pop}}_2\) evolves independently using MOEA/D-Epsilon, while \({\text {Pop}}_1\) uses information from \({\text {Pop}}_2\) to evolve. In the repush stage, \({\text {Pop}}_1\) and \({\text {Pop}}_2\) utilize knowledge transfer to select a parent set or an offspring set from the other population in their respective environmental selection, which differs from the push stage.

Fig. 1 The flowchart of TSRSM

Fig. 2 The distribution of the two populations in three stages

It should be noted that the CDP method and the fast nondominated sorting method are utilized for environmental selection in both populations. Besides, the fitness assignment method and the truncation strategy proposed in SPEA2 [40] are used in TSRSM. These methods are used because they have shown promising performance in CCMO [26], TriP [37], MTCMO [35], etc. Naturally, \({\text {Pop}}_2\), which is evolved by MOEA/D-Epsilon in the pull stage, does not employ these methods.
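The text does not spell out the CDP. Assuming it refers to the commonly used constrained dominance principle (feasible solutions beat infeasible ones, infeasible solutions are compared by overall constraint violation, and feasible solutions are compared by Pareto dominance), a pairwise comparison could be sketched as follows. The function name `cdp_better` is hypothetical, and the sketch is not taken from the TSRSM implementation.

```python
def cdp_better(f1, cv1, f2, cv2):
    """Return True if solution 1 is preferred to solution 2 under the
    constrained dominance principle (minimization, cv >= 0).

    f1, f2:   objective vectors
    cv1, cv2: overall constraint violations CV(x) as in Eq. 4
    """
    if cv1 == 0 and cv2 == 0:
        # Both feasible: fall back to ordinary Pareto dominance.
        not_worse = all(a <= b for a, b in zip(f1, f2))
        strictly_better = any(a < b for a, b in zip(f1, f2))
        return not_worse and strictly_better
    if cv1 == 0 or cv2 == 0:
        # Exactly one is feasible: the feasible one wins.
        return cv1 == 0
    # Both infeasible: the smaller violation wins.
    return cv1 < cv2


print(cdp_better((3.0, 3.0), 0.0, (1.0, 1.0), 0.2))  # feasible vs infeasible
print(cdp_better((1.0, 1.0), 0.5, (2.0, 2.0), 0.1))  # both infeasible
```

Under this rule a population that only considers objectives, such as \({\text {Pop}}_2\) in the push stage, corresponds to calling the comparison with both violations set to zero.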

Algorithm 1 Procedure of TSRSM

Reward-switching mechanism

As previously discussed, determining the transition between stages is crucial, as the evolutionary process tends to decelerate in later phases, making fixed criteria obsolete. Hence, RSM dynamically adjusts the number of evolutionary generations in the first two stages based on the performance of the respective population.

To assess the performance of the population, RSM uses two indicators: convergence and diversity. The recommended approach for dispensing rewards is as follows:

$$\begin{aligned} \text {MG}_k=\text {max}(Rc_k,Rd_k)\ge \lambda , \end{aligned}$$
(5)

where \(\text {MG}_k\) is the larger of the rates of change of convergence and diversity observed over the last gr generations up to the kth generation, and \(\lambda \) represents the threshold value for the rate of change. If \(\text {MG}_k\ge \lambda \), the current stage is extended for an additional gr generations to facilitate further evolutionary progress. Therefore, \(\text {MG}_k\) also indicates the performance of the population over the span of gr generations. The rates of change of the convergence and diversity are formulated in Eqs. 6 and 7, respectively.

$$\begin{aligned} Rc_k= & {} \frac{ \left| sc_k-sc_{k-gr} \right| }{\text {max}(sc_{k-gr},\delta )}, \end{aligned}$$
(6)
$$\begin{aligned} Rd_k= & {} \frac{ \left| sd_k-sd_{k-gr} \right| }{\text {max}(sd_{k-gr},\delta )}, \end{aligned}$$
(7)
$$\begin{aligned} sc_k= & {} \sum _{i=1}^{N} Ic_i^k, \end{aligned}$$
(8)
$$\begin{aligned} sd_k= & {} \sum _{i=1}^{N} Id_i^k, \end{aligned}$$
(9)

where \(sc_k\) and \(sd_k\) indicate the sum of \(Ic_i^k\) and \(Id_i^k\) at the kth generation, respectively, for the corresponding population, as indicated in Eqs. 8 and 9; \(Ic_i^k\) and \(Id_i^k\) represent the convergence and diversity of the ith individual at the kth generation, respectively; gr is the number of reward generations; N represents the population size; and \(\delta \) is a small parameter (e.g., 1e-6). The calculation of \(Ic_i^k\) and \(Id_i^k\) is the same as that in TSTI [30] and is omitted here for reasons of space. It should be noted that \(sc_k\) and \(sd_k\) are calculated using \({\text {Pop}}_1\) in the push stage. The value of gr differs between stages as follows.

$$\begin{aligned} gr_i={\left\{ \begin{array}{ll} &{} 20,\; i=1 \\ &{} 200,\; i=2, \end{array}\right. } \end{aligned}$$
(10)

where \(i\in \{1,2\}\) denotes the push stage and the pull stage, respectively. That is, gr is the number of rewarded generations, and the subscript i indicates whether the reward is granted in the push stage or the pull stage. In the push stage, 20 generations are rewarded at a time, whereas in the pull stage each reward grants 200 generations. This is because, over time, populations face increasing difficulty in finding superior solutions during the evolution process, which is discussed in “Experimental results”.

Algorithm 2 gives the procedure of RSM. Firstly, \(\text {MG}_k\) is calculated by Eq. 5. Then, gr is determined according to the current stage. When \(\text {MG}_k\ge \lambda \), \(\alpha \) and \(\beta \) are updated accordingly. In the push stage, when \(\text {MG}_k<\lambda \), \(\beta \) is initialized. It should be noted that \(\text {MG}_k\) is only calculated when \(k=\alpha \) or \(k=\beta \) to reduce computational time.
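The reward test of Eqs. 5 to 7 can be sketched in a few lines. The sketch assumes that the per-generation sums \(sc_k\) and \(sd_k\) are already available as lists indexed by generation; how \(Ic_i^k\) and \(Id_i^k\) are obtained follows TSTI [30] and is not reproduced here. The function names are hypothetical, and the updates of \(\alpha \) and \(\beta \) in Algorithm 2 are not modeled.

```python
def rate_of_change(current, previous, delta=1e-6):
    """Relative change |current - previous| / max(previous, delta), Eqs. 6-7."""
    return abs(current - previous) / max(previous, delta)


def max_gain(sc, sd, k, gr, delta=1e-6):
    """MG_k = max(Rc_k, Rd_k), computed over the last gr generations."""
    if k < gr:
        raise ValueError("need at least gr generations of history")
    rc = rate_of_change(sc[k], sc[k - gr], delta)
    rd = rate_of_change(sd[k], sd[k - gr], delta)
    return max(rc, rd)


def extend_stage(sc, sd, k, gr, lam=1e-2, delta=1e-6):
    """True if the current stage is rewarded with another gr generations."""
    return max_gain(sc, sd, k, gr, delta) >= lam


# Made-up history: convergence sum improves by 5% over 20 generations,
# diversity sum stays flat.
sc_history = [10.0] * 20 + [9.5]
sd_history = [4.0] * 21
print(extend_stage(sc_history, sd_history, k=20, gr=20))
```

With the threshold \(\lambda \) set to 1e-2 as in the experiments, the made-up history above would earn the stage another gr generations, because one of the two indicators still changes by more than 1%.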

Algorithm 2 Procedure of RSM

The push stage

The purpose of the push stage is to quickly converge to the UPF. The procedure of the push stage is presented in Algorithm 3. \({\text {Pop}}_1\) and \({\text {Pop}}_2\) are evaluated by the fitness assignment method. \({\text {Pop}}_1\) considers both objectives and constraints, while \({\text {Pop}}_2\) only considers objectives. Then, \({\text {Pop}}_1\) and \({\text {Pop}}_2\) select their mating pools by binary tournament selection. Simulated binary crossover (SBX) [41] and polynomial mutation (PM) [42] are employed to generate N/2 offspring for each population. Then, the two offspring sets are added to \({\text {Pop}}_1\) and \({\text {Pop}}_2\), respectively, and the combined sets are denoted as \(\text {CP}_1\) and \(\text {CP}_2\). Next, \(\text {CP}_1\) and \(\text {CP}_2\) are evaluated by the original CMOP and the unconstrained MOP, respectively. Finally, \({\text {Pop}}_1\) and \({\text {Pop}}_2\) are updated by environmental selection.

Algorithm 3 Procedure of the push stage

The pull stage

The purpose of the pull stage is to explore and expand the set of feasible solutions to address CMOPs characterized by complex feasible regions. Algorithm 4 outlines the procedure of the pull stage. \({\text {Pop}}_1\) evolves in a manner similar to its evolution in the push stage, with the difference that \(\text {CP}_1\) consists of the population sets \({\text {Pop}}_1\), \({\text {Pop}}_2\), and \({\text {Off}}_1\). The reason why \({\text {Off}}_2\) is not used in \(\text {CP}_1\) is discussed in “Experimental results”. The evolution of \({\text {Pop}}_2\) is accomplished through the utilization of MOEA/D-Epsilon, following the same approach as the second stage of PPS-MOEA/D [28]. \(\epsilon \) is updated by Eq. 11.

$$\begin{aligned} \epsilon (v)=\left\{ \begin{matrix} (1-\gamma )\epsilon (v-1),&{} \text {if }\; rf_v< \varphi \\ \epsilon (0)(1-\frac{v}{T_c})^{sp},&{} \text {if }\; rf_v\ge \varphi , \end{matrix}\right. \end{aligned}$$
(11)

where v is the generation counter from the beginning of the pull stage; \(rf_v\) is the feasible ratio of \({\text {Pop}}_2\); sp controls the speed at which the relaxation of constraints is reduced; \(\gamma \) and \(\varphi \) are control parameters; \(\epsilon (0)\) is the maximal CV of \({\text {Pop}}_2\) at the end of the push stage; and \(T_c\) indicates the total number of generations excluding the push stage. The number of generations in the pull stage is not used for \(T_c\) because it is unknown at the beginning of the stage. The influence on evolution is minimal, as the values of \(\epsilon \) are nearly equal in the later phases.
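The update rule in Eq. 11 can be written as a short function. This is a minimal sketch: the function name is hypothetical, the numeric values in the example are placeholders rather than the settings used in the experiments, and the clamp at zero for \(v>T_c\) is a defensive assumption that is not stated in the text.

```python
def update_epsilon(eps_prev, eps0, v, t_c, rf, gamma, phi, sp):
    """Relaxation level epsilon(v) following Eq. 11.

    eps_prev: epsilon(v - 1)
    eps0:     epsilon(0), the maximal CV of Pop2 at the end of the push stage
    v:        generation counter since the start of the pull stage
    t_c:      total number of generations excluding the push stage
    rf:       feasible ratio of Pop2 at generation v
    gamma, phi, sp: control parameters
    """
    if rf < phi:
        # Few feasible solutions: shrink epsilon geometrically.
        return (1.0 - gamma) * eps_prev
    # Enough feasible solutions: follow the polynomial schedule.
    # Clamping at zero is an assumption to keep the base non-negative.
    remaining = max(0.0, 1.0 - v / t_c)
    return eps0 * remaining ** sp


# Placeholder values for illustration only.
low_rf = update_epsilon(eps_prev=1.0, eps0=2.0, v=10, t_c=100,
                        rf=0.5, gamma=0.1, phi=0.95, sp=2)
high_rf = update_epsilon(eps_prev=1.0, eps0=2.0, v=10, t_c=100,
                         rf=1.0, gamma=0.1, phi=0.95, sp=2)
print(low_rf, high_rf)
```

Because the second branch depends only on \(v/T_c\), using the total remaining budget for \(T_c\) instead of the (unknown) length of the pull stage only changes how quickly the schedule decays.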

Algorithm 4 Procedure of the pull stage

The repush stage

Unlike the push stage, the repush stage employs knowledge transfer [33] in environmental selection. The purpose of knowledge transfer is to select the more effective population to take part in environmental selection. As presented in Algorithm 5, Lines 8–13 constitute the procedure of knowledge transfer. The effectiveness of the populations relies on the successful transfer rates of parent sets and offspring sets (Rp and Ro), which are calculated by Eqs. 12 and 13:

$$\begin{aligned} Rp= & {} \frac{ {\text {num}}\_p}{N}, \end{aligned}$$
(12)
$$\begin{aligned} Ro= & {} \frac{ {\text {num}}\_{\text {off}}}{\frac{N}{2}}, \end{aligned}$$
(13)

where \(Rp\in [0,1]\), \(Ro\in [0,1]\), and \({\text {num}}\_p\) and \({\text {num}}\_{\text {off}}\) are, respectively, the numbers of parent and offspring individuals among the N best individuals, which are selected from \(TCP_1\) and \(TCP_2\) by environmental selection. If \(Rp>Ro\), the parent sets are more effective in the final environmental selection; in this case, to maintain diversity, N/2 individuals are selected randomly from the parent sets for \({\text {Trans}}P\). If \(Rp<Ro\), the offspring sets are selected for \({\text {Trans}}P\). Finally, the new populations are obtained by environmental selection.
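The transfer decision based on Eqs. 12 and 13 can be illustrated as follows. The sketch assumes that each of the N selected individuals carries a tag recording whether it came from a parent set or an offspring set; the tag strings and function names are hypothetical. The text does not say what happens when \(Rp=Ro\), so the sketch arbitrarily falls back to the offspring set in that case.

```python
import random


def transfer_rates(selected_origins, n):
    """Success rates Rp and Ro of Eqs. 12-13.

    selected_origins: origin tag ("parent" or "offspring") of each of the
                      N best individuals chosen by environmental selection
    n:                population size N
    """
    num_p = sum(1 for origin in selected_origins if origin == "parent")
    num_off = sum(1 for origin in selected_origins if origin == "offspring")
    return num_p / n, num_off / (n / 2)


def choose_transfer_set(parents, offspring, rp, ro, rng):
    """Pick the set to transfer (TransP)."""
    if rp > ro:
        # Parents are more effective: transfer a random half of them.
        return rng.sample(parents, len(parents) // 2)
    # Offspring are more effective (ties also land here, by assumption).
    return list(offspring)


origins = ["parent"] * 6 + ["offspring"] * 2       # made-up selection result
rp, ro = transfer_rates(origins, n=8)
parents = [f"p{i}" for i in range(8)]
offspring = [f"o{i}" for i in range(4)]
trans_p = choose_transfer_set(parents, offspring, rp, ro, random.Random(0))
print(rp, ro, len(trans_p))
```

Dividing \({\text {num}}\_{\text {off}}\) by N/2 rather than N normalizes for the fact that each population produces only N/2 offspring, so both rates lie in [0, 1].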

Algorithm 5 Procedure of the repush stage

Computational complexity

The proposed TSRSM mainly consists of genetic operators, mating selection, and population updating, except for \({\text {Pop}}_2\) in the pull stage. Since \({\text {Pop}}_2\) is evolved by the MOEA/D-Epsilon method of PPS-MOEA/D in the pull stage, its complexity is \(O(M\cdot N^2)\). Moreover, population updating uses SPEA2, so its complexity is \(O(M\cdot N^3)\). The complexities of genetic operators and mating selection are, respectively, \(O(N\cdot D)\) and O(N). In summary, the worst-case computational complexity of TSRSM is \(O(M\cdot N^3)\). In addition, the reward-switching mechanism and the knowledge transfer method consume some extra computational time.

Experimental results

To evaluate the performance of the proposed TSRSM in solving CMOPs, a series of experiments are conducted on PlatEMO [43].

Experimental settings

Test functions

To demonstrate the performance of TSRSM, three benchmark test suites (MW [38], LIRCMOP [39], and constrained DTLZ [44, 45]) are adopted. Their details are given in Table 2.

Table 2 Description of three benchmark test suites

Compared algorithms

To verify the effectiveness of the proposed TSRSM, 9 state-of-the-art algorithms are selected as follows:

  1. Two-population based: CCMO [26] and cDPEA [27].

  2. Two-stage based: CMOEA-MS [29], DSPCMDE [16], and TSTI [30].

  3. EMT-based: MTCMO [35] and CMOEMT [36].

  4. Three-population based: TriP [37].

  5. Three-stage based: C3M [24].

In algorithms that use GA as an operator, the parameters of SBX and PM are set as follows:

  1. Crossover probability of SBX: 1

  2. Mutation probability of PM: 1/D

In algorithms that use DE as an operator, the parameters CR and F are 1 and 0.5, respectively.

The population size N is set to 91, taking into account the influence of the weight vector set on the actual population size in PlatEMO [46]. This consideration serves to render the experimental results more reliable and consistent [36]. The maximum number of function evaluations \(E_{\text {max}}\) is set to 100,000 for all test functions. Moreover, each algorithm is run independently 30 times on each test function. The parameter settings of all compared methods are the same as suggested in their original literature. TSRSM uses the same parameters for constraint relaxation as PPS-MOEA/D. Specifically, the parameters \(\lambda \), \(gr_1\), and \(gr_2\) in TSRSM are set to 1e-2, 20, and 200, respectively.

Performance indicators

To measure the performance of each algorithm, inverted generational distance based on modified distance calculation (IGD+) [47], hypervolume (HV) [48], and feasible rate (FR) [49] were adopted as indicators. IGD+ measures the average distance between the true PF and the nearest individual, reflecting the algorithm’s convergence. A smaller IGD+ value indicates better algorithm performance. HV calculates the volume enclosed by the obtained solutions and the predefined reference point, serving as an indicator of both convergence and diversity. A higher HV value corresponds to better algorithm performance. The FR represents the ratio of runs in which the method successfully discovers feasible solutions in the final generation. A higher FR value indicates a stronger ability to find feasible regions. Moreover, the Wilcoxon test at the 0.05 level is performed to estimate the difference between the two algorithms. "+", "-" and "\(\approx \)" indicate that the compared algorithm is significantly better than, significantly worse than, or statistically similar to TSRSM.
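For concreteness, IGD+ for a minimization problem can be computed as sketched below, using the standard modified distance in which only the components where a solution is worse than the reference point contribute. The function name and the toy fronts are illustrative assumptions; the experiments in this paper rely on the PlatEMO implementation, not on this sketch.

```python
import math


def igd_plus(reference_front, solutions):
    """IGD+ for minimization.

    reference_front: points sampled from the true Pareto front
    solutions:       objective vectors obtained by the algorithm
    """
    total = 0.0
    for z in reference_front:
        # Modified distance: only the amount by which a solution is
        # worse than the reference point z in each objective counts.
        total += min(
            math.sqrt(sum(max(a_i - z_i, 0.0) ** 2 for a_i, z_i in zip(a, z)))
            for a in solutions
        )
    return total / len(reference_front)


reference = [(0.0, 1.0), (1.0, 0.0)]                  # toy reference front
print(igd_plus(reference, [(0.0, 1.0), (1.0, 0.0)]))  # solutions on the front
print(igd_plus(reference, [(0.5, 1.0), (1.0, 0.5)]))  # solutions shifted away
```

A value of zero means that every reference point is matched or weakly dominated by some obtained solution, which is why smaller values are better.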

Comparison with peer algorithms

In this part, the proposed TSRSM is compared with the 9 state-of-the-art CMOEAs on the three benchmark suites. The results of IGD+ and HV are reported in Tables 4 and 5, where bold entries mark the best result among the ten algorithms. To clearly analyze the results, the Wilcoxon test is conducted and the corresponding results are presented in the tables. The experimental results using the IGD+ metric demonstrate that TSRSM achieves superior performance compared to CCMO, cDPEA, CMOEA-MS, DSPCMDE, TSTI, MTCMO, CMOEMT, TriP, and C3M on 21, 22, 31, 36, 30, 26, 24, 24, and 35 test functions, respectively. Similarly, according to the HV metric, TSRSM outperforms the other 9 compared algorithms on 22, 22, 32, 35, 29, 26, 23, 25, and 35 test functions, respectively. This confirms the effectiveness of TSRSM.

Among the compared algorithms, CMOEMT outperforms TSRSM on 6 test functions in terms of IGD+. CMOEMT uses three populations to solve the CMOP, the unconstrained MOP, and the relaxed CMOP, respectively, which is similar to TriP. Moreover, CMOEMT and TriP show similar performance on LIRCMOP5, LIRCMOP6, LIRCMOP10, and LIRCMOP11, surpassing TSRSM. The common characteristic of these functions is an interspersed distribution of feasible and infeasible regions. Consistently searching for the UPF may help identify a wide range of the CPF for this type of problem. Moreover, in the earlier phases, the three populations of CMOEMT evolve independently, with one of them utilizing PPS-MOEA/D (\(\epsilon \)-constrained technique), which is effective for LIRCMOP. The same holds for TriP, as it also utilizes PPS-MOEA/D independently. However, on the MW and CDTLZ benchmark test sets, PPS can easily converge to a local optimum in the early phases. If PPS-MOEA/D is not effective for the current problem, computational time is wasted. TSRSM employs PPS-MOEA/D only at the pull stage, which mitigates the risk of local optima because \({\text {Pop}}_2\) has already attained the UPF during the push stage. Furthermore, if the expected performance is not achieved in the pull stage, RSM switches to the next stage. Therefore, TSRSM performs significantly better on the MW and CDTLZ problems, and slightly better on the LIRCMOP problems.

CCMO, cDPEA, and MTCMO all use two cooperative populations. The distinctive feature of MTCMO is that it enables knowledge transfer between two populations. Therefore, MTCMO is more competitive than CCMO and cDPEA. Furthermore, MTCMO shows better performance than TSRSM on MW5, MW11, MW12, and DC1_DTLZ3, which have the same feature: disconnected CPF. This may be because the knowledge transfer and improved \(\epsilon \)-constraint method help MTCMO find more promising solutions at the edge of CPF for this type of problem.

DSPCMDE, TSTI, C3M, and CMOEA-MS are all multi-stage-based methods. However, each of them outperforms TSRSM on fewer than 3 functions. This demonstrates the superiority of the three-stage framework in TSRSM.

Table 6 displays the FR results, demonstrating that CCMO, CMOEMT, TriP, and TSRSM achieved 100% feasible rates across the three benchmark test sets. These algorithms share a common characteristic, which is the utilization of a weak coevolutionary framework. The weak coevolutionary framework employs two populations: one for the unconstrained MOP and the other for the CMOP, facilitating the population’s traversal of infeasible regions. CMOEA-MS, cDPEA, DSPCMDE, TSTI, MTCMO, and C3M exhibited lower performance than TSRSM on 3, 6, 9, 1, 1, and 7 test functions, respectively.

For a visual comparison, Fig. 3 shows the solutions with the median IGD+ value among 30 runs obtained by TSRSM and the 9 compared algorithms on MW10 and C1_DTLZ3. For MW10, which has small and discontinuous feasible regions, TSRSM outperforms the other compared algorithms in terms of both diversity and convergence. DSPCMDE and C3M have difficulty escaping local optima. The other algorithms are capable of finding the CPF, but they tend to exhibit limited diversity on it. TriP achieves a similar level of convergence on the CPF. However, because TriP independently employs the \(\epsilon \)-constrained technique, which is susceptible to getting stuck in local optima for this problem type, it shows inferior diversity compared to TSRSM. For C1_DTLZ3, which is multimodal, DSPCMDE, TSTI, MTCMO, and C3M fail to find feasible solutions with optimal objectives. CCMO, cDPEA, CMOEA_MS, CMOEMT, and TriP can find feasible solutions. However, they may also generate solutions that lie far outside the feasible region. For C1_DTLZ3, the CPF coincides with the UPF, so searching for the UPF is advantageous when addressing this problem. TSRSM is able to transition to the repush stage early in the run, leveraging knowledge transfer to drive evolution effectively. Therefore, TSRSM obtains better diversity and convergence than the other algorithms.

Fig. 3 Solutions with the median IGD+ value among 30 runs obtained by TSRSM and the 9 compared algorithms on MW10 and C1_DTLZ3. The red points represent solutions in \({\text {Pop}}_1\). The gray region is the feasible region

To further illustrate the performance of TSRSM, the average rankings of TSRSM and the 9 state-of-the-art CMOEAs, obtained by applying the Friedman test at the 0.05 significance level, are shown in Fig. 4. The lower the ranking, the better the performance. TSRSM obtains the lowest average ranking, with 2.50 in HV and 2.58 in IGD+. Moreover, the significance of the differences is verified by applying post hoc statistical procedures [50]. The adjusted p-values of the Friedman test with the Holm and Hochberg procedures are presented in Table 3. These results confirm that TSRSM outperforms the other state-of-the-art algorithms under the Friedman test.
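The ranking step can be sketched in Python as follows. The code computes the per-problem average ranks that underlie the Friedman test and applies the Holm step-down adjustment to a list of unadjusted p-values. It is only an illustration: the Friedman statistic itself and the Hochberg procedure are omitted, the function names are hypothetical, and the input is assumed to be a table with one row per test problem and one column per algorithm, where lower values are better (as for IGD+; HV values would be negated first).

```python
from typing import List, Sequence


def average_ranks(scores: Sequence[Sequence[float]]) -> List[float]:
    """Average rank of each algorithm (column) over all problems (rows).

    Lower scores receive lower (better) ranks; tied scores share the
    mean of the ranks they occupy.
    """
    n_alg = len(scores[0])
    totals = [0.0] * n_alg
    for row in scores:
        order = sorted(range(n_alg), key=lambda j: row[j])
        i = 0
        while i < n_alg:
            k = i
            # Extend k to the end of the current group of tied scores.
            while k + 1 < n_alg and row[order[k + 1]] == row[order[i]]:
                k += 1
            mean_rank = (i + k) / 2 + 1  # ranks are 1-based
            for t in range(i, k + 1):
                totals[order[t]] += mean_rank
            i = k + 1
    return [total / len(scores) for total in totals]


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    adjusted = [0.0] * m
    running_max = 0.0
    for pos, j in enumerate(order):
        candidate = min(1.0, (m - pos) * p_values[j])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[j] = running_max
    return adjusted
```

In a comparison against a control algorithm, each entry of `p_values` would correspond to one pairwise comparison, and an adjusted p-value below 0.05 would indicate a significant difference at the level used here.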

Fig. 4 Average ranking of all 10 algorithms on the three benchmark test sets by the Friedman test

Table 3 The adjusted p-values of IGD+ and HV obtained through Friedman test of TSRSM and 9 state-of-the-art algorithms

The exceptional performance of TSRSM can be attributed to the following factors:

  1. For CDTLZ and MW, which have simple feasible regions, the push and repush stages of TSRSM play a very important role.

  2. For LIRCMOP, with complex feasible regions and large infeasible regions, the \(\epsilon \)-constrained technique performs well. TriP, CMOEMT, and the proposed TSRSM all use the \(\epsilon \)-constrained technique. However, TSRSM is still better than TriP and CMOEMT, owing to the cooperation between the two populations and because RSM rewards the pull stage with more generations.

  3. In summary, TSRSM obtains the most competitive results on the three benchmark sets, which include both complex and simple feasible regions. The three stages and two populations in TSRSM carry out their own duties and cooperate with each other. Moreover, the RSM plays a crucial role in determining when to switch stages.

Comparison on real-world CMOPs

To provide a more comprehensive evaluation of TSRSM, experiments are conducted on real-world CMOPs in this part. The first 30 instances of RWMOPs [21] are selected. The HV results obtained by TSRSM and the other 9 peer algorithms are listed in Table 7. In the table, NaN indicates that the algorithm failed to find a feasible solution in any of the 30 runs. Despite the unique characteristics and challenges posed by different real-world CMOPs, TSRSM outperforms CCMO, cDPEA, CMOEA_MS, DSPCMDE, TSTI, MTCMO, CMOEMT, TriP, and C3M on 12, 13, 19, 20, 14, 11, 15, 17, and 15 real-world CMOPs, respectively. Although CMOEMT and TriP outperform TSRSM in HV on 8 and 5 benchmark test functions, respectively, they outperform TSRSM on only one or two real-world problems. Furthermore, TSRSM achieves the best result on 10 real-world problems, such as the synchronous optimal pulse-width modulation of 3-level inverters problem, the multi-product batch plant problem, the heat exchanger network design problem, and others. This is the highest number of best results among the compared algorithms. Therefore, the effectiveness of the proposed TSRSM in solving real-world problems is demonstrated.

Discussions about TSRSM

Investigation into the search behavior

We investigate the search behavior of TSRSM across different test problems. Two representative test functions, LIRCMOP1 and MW9, have been selected because they exhibit distinct sizes of feasible regions and varying degrees of overlap between the CPF and UPF. The results of IGD+, HV, RFS, and the distributions of solutions for MW9 and LIRCMOP1 are presented in Fig. 5, obtained from a single run of TSRSM. This run is the one that attains the median IGD+ value among the thirty runs.
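Selecting the run with the median IGD+ value can be sketched as follows. The sketch uses the standard IGD+ definition for minimization, in which only the objectives where a solution is worse than a reference point contribute to the distance. The function names are hypothetical, and returning the lower of the two middle runs when the number of runs is even (as with thirty runs) is an assumption, since the tie-breaking rule is not stated here.

```python
import math
from typing import Sequence

Point = Sequence[float]


def igd_plus(solutions: Sequence[Point], reference: Sequence[Point]) -> float:
    """IGD+ for minimization problems (lower is better).

    For each reference point z, take the smallest modified distance
    d+(a, z) = sqrt(sum_i max(a_i - z_i, 0)^2) over the solutions a,
    then average over all reference points.
    """
    def d_plus(a: Point, z: Point) -> float:
        return math.sqrt(sum(max(ai - zi, 0.0) ** 2 for ai, zi in zip(a, z)))

    total = sum(min(d_plus(a, z) for a in solutions) for z in reference)
    return total / len(reference)


def median_run_index(values: Sequence[float]) -> int:
    """Index of the run with the median indicator value.

    For an even number of runs, the lower of the two middle runs is
    returned (an assumption made for this sketch).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    return order[(len(values) - 1) // 2]
```

With thirty IGD+ values, `median_run_index` returns the run holding the 15th smallest value, and that run's populations would be the ones plotted.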

1. The effect of using the reward-switching mechanism: Fig. 5a, e depict the IGD+ and HV results against the number of function evaluations, along with the RFS in \({\text {Pop}}_1\) and \({\text {Pop}}_2\). In Fig. 5a, the first switching point refers to the transition from the push stage to the pull stage, while the second switching point corresponds to the transition from the pull stage to the repush stage. Figure 5e shows a single switching point, namely the transition from the push stage to the pull stage. From Fig. 5a, e, the following observations can be made:

  1. The RSM transitions to the next stage when the evolution becomes significantly slow or stagnant.

  2. If a particular stage consistently performs well on the problem, there is no need to switch to another stage, as with LIRCMOP1 in Fig. 5e.
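The two observations above can be illustrated with a minimal Python sketch of a stagnation-triggered stage controller. It is not the RSM defined in the paper: the class name, the `progress_rate` input, and the interpretation of \(gr_1\) and \(gr_2\) as generations guaranteed to the push and pull stages before stagnation is checked are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

STAGES = ("push", "pull", "repush")


@dataclass
class StageSwitcher:
    """Hypothetical controller: advance a stage once progress stagnates."""
    lam: float = 1e-2                    # stagnation threshold (lambda)
    rewards: Tuple[int, int] = (20, 200)  # assumed guaranteed generations
    stage_idx: int = 0
    gens_in_stage: int = 0

    @property
    def stage(self) -> str:
        return STAGES[self.stage_idx]

    def update(self, progress_rate: float) -> str:
        """Record one generation and return the stage for the next one.

        progress_rate is any nonnegative measure of how much the
        population changed recently; small values mean stagnation.
        """
        self.gens_in_stage += 1
        is_last_stage = self.stage_idx == len(STAGES) - 1
        if not is_last_stage:
            rewarded = self.gens_in_stage >= self.rewards[self.stage_idx]
            if rewarded and progress_rate < self.lam:
                self.stage_idx += 1
                self.gens_in_stage = 0
        return self.stage
```

A stage that keeps making progress is never left, which mirrors the second observation; once the final stage is reached, the controller stays there.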

Fig. 5 The IGD+, HV, and RFS results of TSRSM on MW9 and LIRCMOP1 are shown in a and e, respectively. b–d present the distributions of solutions on MW9 at the first and second switching points and at the final evaluation, respectively. f, g show the distributions of solutions on LIRCMOP1 at the first switching point and the second switching point. h shows the distributions of solutions on LIRCMOP1 at the final evaluation for TSRSM without the pull stage (TSRSM-PR)

To further showcase the competitiveness of the RSM, two other popular switching mechanisms are employed in TSRSM for comparison. For the method based on the number of evaluations, \({\text {Pop}}_2\) can reach the UPF for most problems when \(\alpha =0.2\) [33]. Therefore, to simplify the experiment, \(\alpha \) and \(\beta \) are set to 0.2 and 0.4, respectively, in TSRSM1. The switching mechanism used in TSRSM2 is the gradient of the maximum of the nadir and the minimum of ideal points [5, 28, 37], which is employed to evaluate the performance of the population. The RFS method is not applicable to TSRSM since it is specifically designed for dual-stage methods. Furthermore, the switching mechanism utilized in C3M is not applicable because it gradually increases the number of constraints during the middle stage and activates the final stage when the number of constraints reaches a threshold. Therefore, only two switching mechanisms are selected for comparison with RSM. Nevertheless, these two switching mechanisms are widely used in many algorithms, as discussed in “Related works and motivation”.
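The ideal/nadir-based criterion used in TSRSM2 can be illustrated with the following Python sketch, written in the spirit of the mechanisms in [5, 28, 37]; the exact formula in those works may differ. The function names, the threshold value, and the approximation of the nadir point by the per-objective maximum over the whole population are assumptions.

```python
from typing import Sequence, Tuple

Point = Sequence[float]


def ideal_and_nadir(objs: Sequence[Point]) -> Tuple[tuple, tuple]:
    """Per-objective minimum (ideal) and maximum (approximate nadir)."""
    m = len(objs[0])
    ideal = tuple(min(f[i] for f in objs) for i in range(m))
    nadir = tuple(max(f[i] for f in objs) for i in range(m))
    return ideal, nadir


def max_rate_of_change(previous: Point, current: Point,
                       eps: float = 1e-6) -> float:
    """Largest relative change over all objectives.

    eps guards against division by zero when a coordinate is near zero.
    """
    return max(abs(c - p) / max(abs(p), eps)
               for p, c in zip(previous, current))


def should_switch(objs_then: Sequence[Point], objs_now: Sequence[Point],
                  threshold: float = 1e-3) -> bool:
    """True when neither the ideal nor the nadir point moved noticeably
    between two snapshots of the population's objective vectors."""
    ideal_then, nadir_then = ideal_and_nadir(objs_then)
    ideal_now, nadir_now = ideal_and_nadir(objs_now)
    rate = max(max_rate_of_change(ideal_then, ideal_now),
               max_rate_of_change(nadir_then, nadir_now))
    return rate <= threshold
```

Here `objs_then` and `objs_now` would be snapshots taken a fixed number of generations apart; the gap between snapshots is another parameter that this sketch leaves to the caller.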

The detailed IGD+ and HV results of these three algorithms are listed in Table 8 and Table 9. The comparison shows that TSRSM outperforms TSRSM1 and TSRSM2 on 11 and 9 test functions, respectively, according to the IGD+ results. Similarly, TSRSM achieves better results on HV.

2. The effect of using two populations: To investigate the roles of the two populations, the distributions of solutions are monitored at the switching points and the final evaluation in Fig. 5b–d and f–h, in addition to the RFS in the different populations (Fig. 5a, e). From Fig. 5, we have the following observations:

  1. \({\text {Pop}}_1\) mainly serves to store feasible solutions, while \({\text {Pop}}_2\) is able to preserve both infeasible and feasible solutions simultaneously. That is, \({\text {Pop}}_1\) aims to search for the CPF, thus focusing on the feasible regions. \({\text {Pop}}_2\) aims to converge to the UPF in the push stage and the repush stage, and toward the CPF in the pull stage. Furthermore, the RFS in \({\text {Pop}}_2\) varies depending on the geometry of the feasible and infeasible regions. As shown in Fig. 5e, the RFS fluctuates significantly in the pull stage for LIRCMOP1 due to its small feasible regions. In contrast, as shown in Fig. 5a, the RFS for MW9 fluctuates only slightly in the pull stage given its larger feasible regions.

  2. In the push stage, \({\text {Pop}}_2\) converges to the UPF, which has better objective values than \({\text {Pop}}_1\), as presented in Fig. 5b and Fig. 5f. Therefore, \({\text {Pop}}_2\) can help \({\text {Pop}}_1\) cross infeasible regions.

  3. In the pull stage, \({\text {Pop}}_2\) utilizes the \(\epsilon \)-constraint method [28], resulting in an initial increase in RFS. Once the RFS exceeds a threshold value \(\varphi \) (e.g., 0.95), the constraint is relaxed, leading to oscillations in the RFS of \({\text {Pop}}_2\). In other words, when the RFS exceeds \(\varphi \), the weights of constraints are decreased to preserve population diversity. Conversely, if the RFS is below the threshold, the weights of constraints are increased to guide the population toward the CPF.

  4. In the repush stage, \({\text {Pop}}_2\) undergoes knowledge transfer with \({\text {Pop}}_1\) while moving toward the UPF once more, thereby helping \({\text {Pop}}_1\) expand into unexplored parts of the CPF.
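The oscillation described for the pull stage can be illustrated with a small Python sketch of a threshold-driven \(\epsilon \) update. It assumes that RFS denotes the ratio of feasible solutions in a population, and the multiplicative update with step `tau` is an illustrative rule rather than the exact \(\epsilon \)-constraint method of [28]; all names are hypothetical.

```python
from typing import Sequence


def ratio_of_feasible(cvs: Sequence[float]) -> float:
    """Share of solutions with zero overall constraint violation (CV)."""
    return sum(1 for cv in cvs if cv <= 0.0) / len(cvs)


def update_epsilon(eps: float, rfs: float, max_cv: float,
                   phi: float = 0.95, tau: float = 0.1) -> float:
    """Return the epsilon level for the next generation.

    A solution counts as epsilon-feasible when its CV is at most eps.
    When the feasible ratio exceeds phi, eps is relaxed to slightly
    above the largest CV seen, which admits infeasible solutions and
    preserves diversity. Otherwise eps is tightened, which pushes the
    population toward the feasible region.
    """
    if rfs > phi:
        return (1.0 + tau) * max_cv
    return (1.0 - tau) * eps
```

Alternating between the two branches is what would produce the oscillating RFS: tightening raises the feasible ratio until it crosses `phi`, after which relaxation lowers it again.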

3. The effect of using the tri-stage framework: The proposed TSRSM has three stages: the push stage, the pull stage, and the repush stage. Therefore, three variants are designed for the algorithm. First, we examine the order of the pull stage and the repush stage. The variant in which the repush stage precedes the pull stage is named TSRSM-PRP. Then, two further variants of TSRSM are considered: TSRSM-PR, which excludes the pull stage, and TSRSM-PP, which excludes the repush stage.

The detailed IGD+ and HV results of these three variants are reported in Table 10 and Table 11. TSRSM outperforms TSRSM-PR, TSRSM-PP, and TSRSM-PRP in terms of IGD+ and HV. TSRSM-PRP performs worse than TSRSM-PR, TSRSM-PP, and TSRSM, indicating that the pull stage should precede the repush stage. TSRSM-PP performs better on LIRCMOP than the other methods. This implies that the \(\epsilon \)-constrained technique in the pull stage is more effective in passing through infeasible regions and solving problems with small feasible regions. As presented in Fig. 5h, TSRSM-PR obtains poor diversity on the CPF, while TSRSM-PP gets a well-distributed set of solutions in Fig. 5g. However, the \(\epsilon \)-constrained technique is prone to falling into local optima. Figure 6 illustrates the performance of TSRSM-PP and TSRSM on MW13. MW13 has slender and narrow feasible regions and is prone to local optima. TSRSM-PP displays a less favorable distribution on the CPF than TSRSM. This discrepancy arises because TSRSM can switch to the repush stage, which increases the likelihood of escaping local optima. Therefore, the tri-stage design of TSRSM is the most effective.

Investigation into the main strategies

In this section, four variants are designed to verify the effectiveness of the proposed strategies in TSRSM. The first variant (TSRSM-V1) verifies the effectiveness of using \({\text {Pop}}_1\) to calculate \(sc_k\) and \(sd_k\); it therefore employs \({\text {Pop}}_2\) to calculate \(sc_k\) and \(sd_k\) instead. The second variant (TSRSM-V2) employs 20 generations to reward the two preceding stages, while the third variant (TSRSM-V3) utilizes 200 generations, thus verifying the effectiveness of varying the number of rewarded generations in RSM. The fourth variant (TSRSM-V4) lets the offspring \({\text {Off}}_2\), rather than \({\text {Pop}}_2\), join \(\text {CP}_1\), which verifies the effectiveness of the cooperation of the two populations in the pull stage.

Table 12 shows the performance of TSRSM and the four variants on the LIRCMOP problems. TSRSM is significantly better than TSRSM-V1, TSRSM-V2, TSRSM-V3, and TSRSM-V4 on 4, 9, 8, and 11 problems, respectively. Moreover, TSRSM obtains the best average value on 7 LIRCMOP problems. Hence, the effectiveness of the proposed strategies in TSRSM is verified.

Parameter analysis of TSRSM

The proposed TSRSM contains three parameters, \(\lambda \), \(gr_1\), and \(gr_2\), to adjust the RSM. The setting of \(gr_1\) and \(gr_2\) depends on \(\lambda \). In other words, the appropriate values of \(gr_1\) and \(gr_2\) vary when \(\lambda \) takes different values. Although numerous experiments were conducted, due to space limitations we only present the effects of \(gr_1\) and \(gr_2\) when \(\lambda =10^{-2}\). The experimental settings are as follows.

  1. TSRSM3: \(gr_1=0\) and \(gr_2=200\).

  2. TSRSM4: \(gr_1=40\) and \(gr_2=200\).

  3. TSRSM5: \(gr_1=60\) and \(gr_2=200\).

  4. TSRSM6: \(gr_1=80\) and \(gr_2=200\).

  5. TSRSM7: \(gr_1=20\) and \(gr_2=0\).

  6. TSRSM8: \(gr_1=20\) and \(gr_2=400\).

  7. TSRSM9: \(gr_1=20\) and \(gr_2=600\).

  8. TSRSM10: \(gr_1=20\) and \(gr_2=800\).

The comparison results in terms of IGD+ and HV are presented in Tables 13, 14, 15 and 16. When \(gr_1=0\) or \(gr_2=0\), TSRSM obtains very poor results, indicating that both the push stage and the pull stage are crucial for TSRSM. TSRSM4 (with \(gr_1=40\) and \(gr_2=200\)) and TSRSM5 (with \(gr_1=60\) and \(gr_2=200\)) exhibit comparable performance to TSRSM (with \(gr_1=20\) and \(gr_2=200\)), suggesting that varying \(gr_1\) from 20 to 60 has negligible impact on the performance of TSRSM. As \(gr_1\) and \(gr_2\) increase, the number of generations left in TSRSM becomes smaller, which means that TSRSM has less chance to make adjustments. In other words, the larger \(gr_1\) and \(gr_2\) are, the worse the performance becomes. Moreover, among the 9 variants of TSRSM with different values for \(gr_1\) and \(gr_2\), TSRSM with \(gr_1=20\) and \(gr_2=200\) demonstrates superior performance. Therefore, \(gr_1=20\) and \(gr_2=200\) are used in this paper.

Conclusions and future work

This paper proposes a tri-stage framework with a reward-switching mechanism, named TSRSM, for CMOPs. The three stages are the push, pull, and repush stages. Each stage involves two coevolutionary populations, namely \({\text {Pop}}_1\) and \({\text {Pop}}_2\). Of the two populations, \({\text {Pop}}_1\) is dedicated to converging toward the CPF over the three stages, whereas \({\text {Pop}}_2\) is designed to go back and forth between the UPF and the CPF. Moreover, RSM is applied to determine when to switch stages according to the maximum rate of change between the convergence and diversity in the population. The experimental results on three benchmark test sets and 30 real-world CMOPs demonstrate that TSRSM outperforms 9 state-of-the-art peer CMOEAs. Furthermore, TSRSM demonstrates competitive performance in addressing problems with complex infeasible regions and small discrete feasible regions, such as the synchronous optimal pulse-width modulation of 3-level inverters problem.

However, the current TSRSM still has certain limitations, which can be addressed through the following potential approaches.

  1. TSRSM still performs poorly on some test functions, such as LIRCMOP5-6 and LIRCMOP10-11. TSRSM-PP and TSRSM4, which devote a large proportion of the run to the pull stage, perform better on these test functions, which means that a better RSM needs to be designed to improve the accuracy of switching. The integration of RSM with machine learning techniques, which can learn the characteristics of various stages and problems, shows potential in determining optimal switching strategies.

  2. TSRSM requires setting three parameters that significantly impact the algorithm’s performance. To simplify the application and enhance efficiency, advanced techniques can be employed to minimize the number of required parameters. Furthermore, utilizing the concept of iterative learning control may offer a feasible approach to dynamically tune the parameters of TSRSM [51, 52].

The following research directions are suggested for future study.

  1. Further research is warranted for fuzzy-based TSRSM to address the challenges presented by real-world multiobjective optimization problems with uncertain parameters and uncertain semantic representations [53,54,55].

  2. It is worth considering the exploration of adaptive selection of different evolutionary operators in TSRSM.