This paper solves the job-shop scheduling problem (JSP) considering job transport, with the aim of minimizing the maximum makespan, the tardiness, and the energy consumption. In the first stage, the improved fast elitist non-dominated sorting genetic algorithm II (INSGA-II) was combined with the N5 neighborhood structure and a local search strategy based on the non-dominated relationship to generate new neighborhood solutions by exchanging the operations on the critical paths. In the second stage, an ant colony algorithm based on reinforcement learning (RLACA) was designed to optimize the job transport task, abstracting the task into polar coordinates for further optimization. The proposed two-stage algorithm was tested on small-, medium-, and large-scale examples. The results show that our algorithm is superior to other algorithms in solving similar problems.
Job-shop scheduling problem (JSP), multiple objectives, job transport, two-stage optimization, improved fast elitist non-dominated sorting genetic algorithm II (INSGA-II)
Job-shop scheduling, a key link in the intelligent manufacturing system, involves multiple manufacturing resources, such as processing and transport resources. Facing the depletion of global non-renewable energies and the deterioration of the natural environment, it is of great significance to explore the job-shop scheduling problem (JSP) that considers energy consumption [1-4].
Heuristic methods have been widely used to solve multi-objective scheduling problems [5]. One of the most successful algorithms in solving such problems is the genetic algorithm (GA) [6-8]. So far, many improved versions of GA, namely, the non-dominated sorting genetic algorithm (NSGA) and the fast elitist NSGA (NSGA-II), have been developed and effectively applied to multi-objective JSPs [9-13].
Lan et al. [14] introduced an improved GA to solve the complex multi-objective problem of flexible job-shop scheduling, and demonstrated the effectiveness of the algorithm through experiments. Considering the sequence of task points and its impact on trajectory distance, Baizid et al. [15] proposed a GA-based method to optimize the sequence of visiting task points, and experimentally proved the validity of the method. Baizid et al. [16] put forward a novel scheduling strategy to determine the shortest path between continuous operating points of job-shop transport devices, which indirectly saves resources. Desirable experimental results have also been achieved by solving multi-objective optimization problems with algorithms like the gray wolf optimizer [17-20].
To sum up, most studies have only obtained the Pareto solutions of the problem. Few have optimized the job transport task based on these solutions. Drawing on the features of the specific scheduling problem, this paper improves the NSGA-II and combines it with Q-learning optimization to solve a multi-objective JSP. In the first stage, three objectives, namely, makespan, tardiness, and energy consumption, were optimized to obtain the corresponding Pareto optimal solutions. In the second stage, the job transport task was further optimized to minimize the number of transport robots and their transport paths. Experimental results demonstrate the feasibility and effectiveness of the proposed algorithm. The research results provide theoretical and technical support for JSPs in similar scenarios.
Our research focuses on a JSP with multiple transport robots. Let J={J_{1}, J_{2}, …, J_{n}} be the set of n jobs, and M={M_{1}, M_{2}, …, M_{m}} be the set of m machines. Each job has m operations that must be completed on the m machines. In addition, the operation of job i on machine j is denoted as O_{i,j}; the start time, processing time, and completion time of job i on machine j are denoted as S_{i,j}, p_{i,j}, and C_{i,j}, respectively. After the completion of operation O_{i,j}, job i will be transported by a robot from machine j to machine j+1, kicking off the operation O_{i,j+1}. The objectives of the job-shop scheduling include makespan, tardiness, and energy consumption of the job-shop. The production process must satisfy the following hypotheses:
(a) All machines in the job-shop can start working at zero hour;
(b) Unless the release time of a job is otherwise specified, the processing of each job can start at zero hour;
(c) Every job has a fixed operation sequence;
(d) Each machine can only process one job at a time, and each job can only be processed by one machine at a time;
(e) No preemption is allowed, and no machine fails during job processing;
(f) There are enough transport robots;
(g) Each robot can only transport one type of material at a time, and does not fail throughout the process;
(h) There is a buffer zone between stations for temporary storage of raw materials and finished products;
(i) Each robot can stop at any station to provide transport service, and the loading and unloading times are so small as to be negligible;
(j) The physical distance between stations is known in advance.
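To make this notation concrete, the problem data can be held in a small data structure. The sketch below is our own, not the paper's implementation: the `Operation` class and variable names are assumptions, and it encodes the 3-job, 3-machine example given later in Table 1, assuming the operation-sequence row of that table gives each machine's position in a job's route.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operation:
    job: int      # job index i
    stage: int    # position in the job's fixed operation sequence, hypothesis (c)
    machine: int  # machine index j
    p_time: int   # processing time p_{i,j}

# Machine order and matching processing times per job, read from Table 1:
# e.g., J1 visits M3 -> M2 -> M1 with processing times 2, 4, 3.
routes = {1: [3, 2, 1], 2: [2, 1, 3], 3: [1, 3, 2]}
times = {1: [2, 4, 3], 2: [2, 1, 1], 3: [1, 1, 2]}

ops = [Operation(i, s, routes[i][s], times[i][s])
       for i in routes for s in range(len(routes[i]))]
```

With n=3 jobs and m=3 machines, this yields the expected n×m=9 operations.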
Figure 1 (a) describes the simple JSP without considering transport robots, and Figure 1 (b) presents the Gantt chart of centralized scheduling for machines and robots. In Figure 1 (a), operation 1 of J_{1} is processed on M_{2}, and operation 2 of that job is processed on M_{1}; there is obviously no time interval between the two operations. Figure 1 (b) expresses the sequence of jobs, as well as the transport time $T_{i}^{k, k+1}$ of each job between every two stations. For example, $T_{1}^{12}$ means the transport time of job 1 from station 1 to station 2; $T_{1}^{23}$ means the transport time of job 1 from station 2 to station 3. Although the operation sequence is the same, J_{1} needs to be transported by a robot between M_{2} and M_{1}, which respectively process operations 1 and 2 of that job. This leads to a transport time between the two operations, which depends on the physical distance between the stations and the efficiency of the robot. In total, five transport tasks $T_{1}^{12}, T_{1}^{23}, T_{2}^{12}, T_{3}^{12},$ and $T_{3}^{23}$ need to be completed. The transport time of each task hinges on the processing time of each operation, the processing capacity of each machine, the physical distance between stations, and the operating speed of the robot.
In addition, the number of robots can be optimized based on the urgency of the processing task, and the cost and energy consumption of robots. Each robot is either in the load state or the no-load state. In the load state, the robot carries materials from the current station to the next station; in the no-load state, the robot moves from the current station to the next station, or waits at the current station, without carrying any materials (Figure 1).
Figure 1. The Gantt charts of simple scheduling (a) and centralized scheduling (b)
Under the joint constraint of machines and robots, the transport time of jobs is uncertain, but there must exist an optimal transport time. Therefore, the scheduling of our problem aims to select the machines and robots that befit the jobs, such that job processing can start as early as possible and meet the requirement on processing time; meanwhile, the no-load state of robots needs to be minimized to reduce the number of robots in operation. Further, the processing sequence of jobs and the transport paths of robots should be rationalized to satisfy the job transport demand of the job-shop at the lowest cost.
3.1 Modeling of makespan
For the JSP considering job transport time, the first objective is to minimize the maximum makespan of the jobs, the second is to minimize the tardiness of the jobs, and the third is to minimize the energy consumption of the job-shop. The scheduling plan outputted by the algorithm must satisfy the relevant constraints. The first objective can be modeled as:
$C_{m a x}=\min \left(\max C_{i, j}\right), 1 \leq i \leq n, 1 \leq j \leq m$ (1)
s.t. $C_{i, j}=S_{i, j}+p_{i, j}$ (2)
$C_{i, j} \geq 0$ (3)
$S_{i, j} \geq \max \left(C_{i, g}+t_{g, j}\right)$ (4)
$C_{i, j}+W \cdot\left(1-\alpha_{i g j}\right) \geq C_{i, g}+t_{g, j}+p_{i, j}, 1 \leq i \leq n, 1 \leq j, g \leq m$ (5)
$C_{i^{\prime}, j}+W \cdot\left(1-\beta_{i i^{\prime} j}\right) \geq C_{i, j}+p_{i^{\prime}, j}, 1 \leq i, i^{\prime} \leq n, 1 \leq j \leq m$ (6)
$\alpha_{i g j}=\left\{\begin{array}{ll}1, & \text {if job } i \text { is processed on machine } g \text { before machine } j \\ 0, & \text {otherwise}\end{array}\right.$ (7)
$\beta_{i i^{\prime} j}=\left\{\begin{array}{ll}1, & \text {if job } i \text { is processed on machine } j \text { before job } i^{\prime} \\ 0, & \text {otherwise}\end{array}\right.$ (8)
Formula (1) is the first optimization objective, i.e., the minimization of the maximum makespan of the production task; Formulas (2) and (3) are the completion time of job i and its constraint, respectively; Formula (4) is the start time for the processing of job i, where t_{g,j} is the transport time of the job from machine g to machine j; Formula (5) is the technological constraint, i.e., the operations of the same job must be processed in the prescribed sequence; Formula (6) is the machine constraint, i.e., each machine can only process one job at a time, where W is a sufficiently large positive number; Formulas (7) and (8) are indicator variables.
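As a worked illustration of constraints (2)-(6), a chromosome can be evaluated by list scheduling: each operation starts as soon as both its machine and its job (after the transport of Eq. (4)) are free. The function below is our own sketch under those constraints, not the paper's implementation; all names are assumptions.

```python
def evaluate_makespan(seq, routes, p, t):
    """Compute C_max for an operation-sequence chromosome.

    seq    : list of job ids; the k-th occurrence of job i means operation k of job i
    routes : routes[i] = machine order of job i
    p      : p[i][k]   = processing time of job i's k-th operation
    t      : t[(g, j)] = transport time from machine g to machine j (0 if absent)
    """
    machine_free = {}   # earliest free time per machine, Eq. (6)
    job_ready = {}      # earliest start time of each job's next operation, Eq. (4)
    next_op = {}
    for i in seq:
        k = next_op.get(i, 0)
        m = routes[i][k]
        start = max(job_ready.get(i, 0), machine_free.get(m, 0))
        finish = start + p[i][k]        # Eq. (2): C = S + p
        machine_free[m] = finish
        if k + 1 < len(routes[i]):      # job becomes ready on the next machine
            job_ready[i] = finish + t.get((m, routes[i][k + 1]), 0)
        next_op[i] = k + 1
    return max(machine_free.values())
```

For a single job visiting M1 then M2 with times 3 and 2 and a transport time of 1, the makespan is 3 + 1 + 2 = 6.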
3.2 Modeling of tardiness
Based on the model of the first objective, the tardiness of the production task can be modelled as:
$D_{\text {total }}=\sum_{i=1}^{n} \max \left(0, C_{i}-C_{i}^{\prime}\right)$ (9)
where, C_{i} and $C_{i}^{\prime}$ are the completion time and delivery date of job i, respectively.
3.3 Modeling of energy consumption
The energy consumption of machines was divided into a load-independent component E_{1} and a load-dependent component E_{2}. Let $P_{j}^{\text {constant}}(t)$ be the load-independent comprehensive power of machines, e.g., the energy consumed by the activation of relevant parts during the startup and shutdown of machines. Then, the load-independent component E_{1} can be expressed as:
$E_{1}=\sum_{j=1}^{m} \int_{0}^{t_{\text {start}}} C^{\text {constant}} \cdot P_{j}^{\text {constant}}(t) d t$ (10)
where, t_{start} is the startup time of machines; C^{constant} is a constant of energy consumption.
The load-dependent component E_{2} of machines mainly manifests as the resistance to elastic deformation, the resistance to plastic deformation, and the friction against the tool surface of the material during the cutting process. In actual applications, the resultant cutting force is generally divided into three mutually perpendicular components: cutting force, feed force, and thrust force. Let $P_{i j}^{c u t}$ be the comprehensive power of machines in the cutting process. Then, the load-dependent component E_{2} can be expressed as:
$E_{2}=\sum_{j=1}^{m} \sum_{i=1}^{n} C^{c u t} \cdot P_{i j}^{c u t} \cdot p_{i, j}$ (11)
where, C^{cut} is a constant of energy consumption in cutting; p_{i,j} is the processing time of job i on machine j.
When a machine waits for the next job, an idling energy consumption E_{3} will occur. Let $P_{j}^{i d l e}$ be the power of idling energy consumption. Then, the idling energy consumption E_{3} can be expressed as:
$E_{3}=\sum_{j=1}^{m} C^{i d l e} \cdot t_{j}^{i d l e} \cdot P_{j}^{i d l e}$ (12)
where, C^{idle} is a constant of energy consumption in idling; $t_{j}^{\text {idle}}$ is the idling time of machine j.
The transport energy consumption E_{4} of a robot refers to the energy consumed by the robot to transport a job between machines. Let $P_{j j^{\prime}}^{\text {transport}}$ be the power of transport energy consumption. Then, the transport energy consumption E_{4} can be expressed as:
$E_{4}=C^{t r a n s p o r t} \cdot P_{j j^{\prime}}^{t r a n s p o r t} \cdot \sum t_{j j^{\prime}}, j, j^{\prime}=1,2, \cdots, m$ (13)
where, C^{transport} is a constant of energy consumption in transport; $t_{j j^{\prime}}$ is the transport time of the robot from machine j to machine j’.
Through the above analysis, the optimization objective of the total energy consumption of the job-shop can be established as:
$E=\min \left\{E_{1}+E_{2}+E_{3}+E_{4}\right\}$ (14)
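Eq. (14) sums the four components of Eqs. (10)-(13). A minimal sketch, assuming the per-operation powers and times are already known from the schedule; the function name and argument layout are ours, not the paper's.

```python
def total_energy(cut_p, cut_t, idle_p, idle_t, const_e, transport_p, transport_t,
                 c_cut=1.0, c_idle=1.0, c_transport=1.0):
    """Sum the four energy components of Eqs. (10)-(13)."""
    e1 = const_e                                              # E1: load-independent part, Eq. (10), pre-integrated
    e2 = c_cut * sum(p * t for p, t in zip(cut_p, cut_t))     # E2: cutting, Eq. (11)
    e3 = c_idle * sum(p * t for p, t in zip(idle_p, idle_t))  # E3: idling, Eq. (12)
    e4 = c_transport * transport_p * sum(transport_t)         # E4: transport, Eq. (13)
    return e1 + e2 + e3 + e4
```

With one cutting operation (power 2, time 3), one idle interval (power 1, time 4), a fixed E1 of 5, and transport power 3 over time 2, the total is 5 + 6 + 4 + 6 = 21.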
The three aspects of multi-objective optimization were combined into one objective function f={f_{1}, f_{2}, f_{3}} that contains three sub-objective functions. This function can be solved by handling each sub-objective function in turn:
The first objective: minimizing the maximum makespan
$f_{1}=\min \left(C_{\max }\right)$ (15)
The second objective: minimizing the tardiness
$f_{2}=\min \left(D_{\text {total}}\right)$ (16)
The third objective: minimizing the energy consumption
$f_{3}=\min (E)$ (17)
The improved fast elitist non-dominated sorting genetic algorithm II (INSGA-II) was employed to solve the multi-objective JSP.
4.1 Coding and decoding
By operation sequence coding, all the operations in the production task were coded with natural numbers starting from 1. For each job, the serial number of each operation and that of the corresponding machine were included in the code. Take a JSP with 3 machines and 3 jobs as an example to explain the coding method. Table 1 presents the operations in the problem and their processing sequence.
Table 1. The information of the processing task
J      Processing time (M_{1}, M_{2}, M_{3})  Operation sequence (M_{1}, M_{2}, M_{3})  Serial number of operations (M_{1}, M_{2}, M_{3})
J_{1}  3, 4, 2                                3, 2, 1                                   (3), (2), (1)
J_{2}  1, 2, 1                                2, 1, 3                                   (5), (4), (6)
J_{3}  1, 2, 1                                1, 3, 2                                   (7), (9), (8)
Chromosome 1 is randomly generated based on the serial number of the operations. For job J_{1}, the chromosome sorts the three operations as operation 2 → operation 1 → operation 3; for job J_{2}, as operation 1 → operation 3 → operation 2; for job J_{3}, as operation 1 → operation 3 → operation 2. The operations of each job are then locally sorted into their technological order to obtain the legal chromosome 2, as shown in Figure 2.
Figure 2. The local sorting of chromosomes
According to the start time and completion time of each operation, the value of the objective function was solved. The decoding process can be specified as follows:
Step 1. Identify the operation at the first gene of the chromosome, find the corresponding job in the operation list, process the job on the corresponding machine, and record the start time and completion time of the operation.
Step 2. Identify the operation at the next gene, find the corresponding job in the operation list, arrange a suitable machine to process the job according to the operation sequence of the job and the idle state of machines, and record the start time and completion time of the operation.
During the machine arrangement, scan all the idle periods of the corresponding machine, and judge if:
(a) The length of idle period [Idle_{start}, Idle_{end}] is greater than the processing time of the operation p_{i,j};
(b) The start time point of the idle period Idle_{start} is earlier than the completion time point of the job's previous operation.
If both conditions hold, insert operation O_{i,j} into idle period [Idle_{start}, Idle_{end}]; otherwise, insert the operation behind the current operation on the machine.
Step 3. Repeat Step 2 until the operations at all genes of the chromosome have been inserted.
Step 4. Calculate the makespan C_{max} of the scheduling plan.
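Conditions (a) and (b) of Step 2 amount to the standard gap-insertion test of active decoding. The helper below is a sketch under our own naming, and it merges the two conditions into one feasibility check per idle period.

```python
def place_operation(idle_periods, busy_until, prev_finish, p_time):
    """Pick a start time for one operation on its machine (Step 2).

    idle_periods : sorted [(start, end)] gaps already on the machine
    busy_until   : completion time of the machine's current last operation
    prev_finish  : completion time of the job's previous operation
    p_time       : processing time p_{i,j}
    """
    for idle_start, idle_end in idle_periods:
        # the operation can start once both the gap opens and the job is ready
        start = max(idle_start, prev_finish)
        # conditions (a) and (b): the remaining gap must fit p_{i,j}
        if idle_end - start >= p_time:
            return start
    # otherwise, append behind the machine's current last operation
    return max(busy_until, prev_finish)
```

For example, a gap [2, 10] fits an operation of length 4 whose job is ready at time 3, so it starts at 3; a gap [2, 5] does not, so the operation is appended at the end.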
4.2 Selection, crossover, and mutation
Our coding method has the natural advantage of retaining excellent chromosome fragments, so excellent genes can be passed down to offspring chromosomes. The crossover operator can then be executed to realize global search. However, the offspring chromosomes are not always feasible scheduling plans, and adjacent genes on the chromosomes cannot always fully reflect the operation features on each machine.
(1) Selection
Individuals are chosen by tournament selection. Individuals with low non-dominated ranks are selected first, while the elite individuals are retained; if two individuals are of the same rank, the one with the larger crowding distance is selected. In the parent population, two individuals are randomly selected, and the one with the higher fitness is retained for the next generation.
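This selection rule can be sketched as a binary tournament. The names are ours; `rank` and `crowd` are assumed to come from the usual NSGA-II fast non-dominated sort and crowding-distance routines, which are not shown here.

```python
import random

def binary_tournament(population, rank, crowd):
    """Pick one parent: the lower non-dominated rank wins; ties are broken
    by the larger crowding distance (rank/crowd are dicts keyed by index)."""
    a, b = random.sample(range(len(population)), 2)
    if rank[a] != rank[b]:
        winner = a if rank[a] < rank[b] else b
    else:
        winner = a if crowd[a] >= crowd[b] else b
    return population[winner]
```

With two individuals of ranks 0 and 1, the rank-0 individual always wins regardless of crowding distance.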
(2) Crossover
The precedence-preserving order-based crossover (POX) operator was selected for crossover, which can effectively inherit the excellent genes of the parent chromosomes, and achieve better scheduling results under the same conditions. For an m$\times$n production task, the parent chromosomes of operation codes are denoted as P_{1} and P_{2}, and the offspring chromosomes obtained through POX are denoted as C_{1} and C_{2}. Then, the POX procedure can be summarized as follows (Figure 3):
Step 1. Randomly divide the set of jobs {1, 2, …, n} into two nonempty subsets S_{1} and S_{2}.
Step 2. Copy the operations of P_{1} in fragment S_{1} to C_{1}, and those of P_{2} in fragment S_{1} to C_{2}, while keeping the position of each operation unchanged.
Step 3. Copy the operations of P_{2} in fragment S_{2} to C_{1}, and those of P_{1} in fragment S_{2} to C_{2}, while keeping the position of each operation unchanged.
Step 4. Obtain the offspring chromosomes through crossover.
Figure 3. The POX crossover
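Steps 1-4 can be sketched as follows, assuming (as is standard for POX) that S_{1} and S_{2} partition the set of jobs and a chromosome lists job ids, one gene per operation; the function names are our own.

```python
def pox(p1, p2, s1):
    """Precedence-preserving order-based crossover.

    p1, p2 : parent chromosomes (lists of job ids, repeated once per operation)
    s1     : set of jobs whose genes keep their positions (fragment S1)
    """
    def child(keep_parent, fill_parent):
        # Step 2: genes of S1 jobs keep their positions from keep_parent
        c = [g if g in s1 else None for g in keep_parent]
        # Step 3: remaining positions are filled in fill_parent's order
        fill = iter(g for g in fill_parent if g not in s1)
        return [g if g is not None else next(fill) for g in c]
    return child(p1, p2), child(p2, p1)   # Step 4: two offspring
```

Both offspring keep the same multiset of genes as the parents, so every child decodes to a feasible operation count per job.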
(3) Mutation
To diversify the population, mutation is implemented by locally disturbing the chromosomes. The common mutation operators include insertion, inversion, and exchange. Different mutation probabilities bring varied benefits. If the mutation probability is low, the excellent gene fragments in chromosomes can be preserved well, but the optimal solution is very likely to be missed due to the weak search ability; if the probability is high, the algorithm will degenerate into random search, which affects the stability and convergence. This paper implements mutation with a strategy based on neighborhood search, and improves offspring chromosomes through local search. As shown in Figure 4, the mutation process can be detailed as follows:
Figure 4. The mutation operation
Step 1. Set the loop variable x=0, and the maximum number of mutations X.
Step 2. Randomly generate the judgment factor μ, and judge the relationship between μ and the mutation probability P_{m}. If μ≤P_{m}, go to the next step; otherwise, terminate the mutation process.
Step 3. Select σ operations from a chromosome, and generate all the neighborhoods of these operations.
Step 4. Calculate the fitness values of all neighborhoods, and identify the one with the highest fitness.
Step 5. Increase x by 1.
Step 6. Terminate the mutation process if x>X.
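Steps 1-6 above can be sketched as a bounded swap-neighborhood search. This is our own minimal version: a single scalar `fitness` stands in for the full multi-objective evaluation, and a lower value is taken as better.

```python
import random

def neighborhood_mutation(chrom, fitness, p_m, sigma=2, max_moves=5):
    """Mutation via local neighborhood search: with probability p_m per
    round, generate sigma swap neighbours and keep the best one found,
    for at most max_moves rounds (Steps 1-6)."""
    best = chrom[:]
    for _ in range(max_moves):            # Steps 1, 5, 6: bounded loop
        if random.random() > p_m:         # Step 2: judgment factor mu
            break
        neighbours = []
        for _ in range(sigma):            # Step 3: sigma swap neighbours
            a, b = random.sample(range(len(best)), 2)
            n = best[:]
            n[a], n[b] = n[b], n[a]
            neighbours.append(n)
        cand = min(neighbours, key=fitness)   # Step 4: best neighbour
        if fitness(cand) < fitness(best):
            best = cand
    return best
```

The swap move preserves the gene multiset, so the mutated chromosome still encodes the same set of operations.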
(4) Improved N5 neighborhood search strategy
Based on the classic N5 neighborhood search strategy [21], the N5 neighborhood structure was improved and coupled with the non-dominated relationship into a new local search strategy:
Step 1. Calculate the total number N_{p} of individuals in the population, and initialize the individual index i as 1.
Step 2. Move the N5 neighborhood of individual i to generate the corresponding neighborhood individual C_{i}, and perform decoding on the new individual.
Step 3. Calculate the objective values of the scheduling plans of P_{i} and C_{i}; if the objective values of C_{i} are better than those of P_{i}, then C_{i} dominates P_{i}, and P_{i} is replaced with C_{i}; otherwise, P_{i} is preserved.
Step 4. Increase i by 1; if i≤N_{p}, return to Step 2; otherwise, terminate the iterative process.
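Steps 1-4 reduce to one dominance test per individual. The sketch below uses our own names and leaves the N5 neighborhood move and the objective evaluation as caller-supplied functions.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all <=, at least one <)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def local_search(population, neighbour, objectives):
    """Replace each individual P_i by its N5-style neighbour C_i whenever
    C_i dominates P_i (Steps 1-4); otherwise P_i is preserved."""
    out = []
    for p in population:
        c = neighbour(p)          # Step 2: generate and decode the neighbour
        out.append(c if dominates(objectives(c), objectives(p)) else p)
    return out
```

With scalar individuals, a neighbour move of `x - 1`, and the identity objective, every individual is improved by one.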
Despite considering transport energy consumption, the scheduling plan obtained by INSGA-II only takes into account the theoretical transport energy consumption of jobs in the solving process. This section further optimizes the completion of the transport tasks in the scheduling plan.
5.1 Position scan
During the position scan, the demand points were grouped before path selection. Each demand point was expressed as polar coordinates. Taking a random demand point as the starting point, the zoning was performed in the clockwise or counterclockwise direction under the constraint of robot capacity. Then, the exchange method was implemented to sort the demand points, thereby minimizing the number and energy consumption of robots.
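This position scan is essentially the classic sweep method. A minimal sketch follows, with our own names, and with the exchange-based intra-zone sorting omitted.

```python
import math

def sweep_zones(points, depot, capacity):
    """Group demand points into robot zones by the sweep (polar-angle) method:
    sort points by angle around the depot, then cut a new zone whenever the
    running demand would exceed the robot capacity.

    points : list of ((x, y), demand) pairs
    depot  : (x, y) of the starting station
    """
    def angle(p):
        return math.atan2(p[1] - depot[1], p[0] - depot[0]) % (2 * math.pi)
    zones, current, load = [], [], 0
    for pt, demand in sorted(points, key=lambda pd: angle(pd[0])):
        if load + demand > capacity:   # capacity constraint: start a new zone
            zones.append(current)
            current, load = [], 0
        current.append(pt)
        load += demand
    if current:
        zones.append(current)
    return zones
```

Four unit-demand points around the depot with a robot capacity of 2 are split into two zones of two points each, so two robots suffice.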
5.2 Path selection
The ant colony algorithm (ACA) was adopted for path selection. During the selection, each ant selects the next node at a certain probability. That is, the transition node with the highest probability is chosen by the ant:
$p_{i j}^{k}=\left\{\begin{array}{ll}\frac{\tau_{i j}^{\alpha}(t)\, \eta_{i j}^{\beta}(t)}{\sum_{s \in \text {allow}_{k}} \tau_{i s}^{\alpha}(t)\, \eta_{i s}^{\beta}(t)}, & j \in \text {allow}_{k} \\ 0, & j \notin \text {allow}_{k}\end{array}\right.$ (18)
where, $p_{i j}^{k}$ is the transition probability of ant k from node i to node j; α is the pheromone importance factor (the greater the α value, the more likely the ant is to select the already chosen paths, and the less likely it is to randomly explore other paths; the smaller the α value, the smaller the search scope, and the algorithm is prone to falling into a local optimum); β is the expected heuristic factor (the greater the β value, the more likely the ant is to select the locally optimal path, the faster the convergence, and the greater the proneness to falling into a local optimum); $\tau_{i j}(t)$ is the pheromone on the edge from node i to node j; $\eta_{i j}(t)$ is the heuristic factor from node i to node j.
To prevent ants from repeatedly selecting nodes that have already been visited, the visited nodes were recorded in a tabu list. After time t, all ants completed a traversal search; then, the path length covered by each ant was calculated, and the minimum was saved. To balance convergence speed with solution quality, a threshold θ_{0}$\in$(0,1) was introduced into the node selection strategy. Before an ant makes a selection, a random number $\theta^{\prime}$ is generated. If $\theta^{\prime} \leq \theta_{0}$, the ant chooses the node with the largest $\tau_{i j}^{\alpha}(t) \eta_{i j}^{\beta}(t)$; otherwise, the ant chooses a node according to the transition probability.
The pheromone can be updated by:
$\left\{\begin{array}{l}\tau_{i j}(t+1)=(1-\rho)\, \tau_{i j}(t)+\Delta \tau_{i j} \\ \Delta \tau_{i j}=Q^{\prime} / \text {fitness}(i)\end{array}\right.$ (19)
where, ρ is the pheromone volatilization factor, and 1−ρ is the residual pheromone factor (if the ρ value is small, there will be too many residual pheromones on the selected paths; in this case, the illegal paths will be searched continuously, which impedes algorithm convergence; if the ρ value is large, the invalid paths can be excluded from the search, but the legal paths are not necessarily covered in the search space, which will affect the quality of the optimal solution); $Q^{\prime}$ is a constant. After all ants completed a traversal search, the tabu list was emptied, and all ants returned to the initial node to perform the next round of search.
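Eqs. (18) and (19), together with the threshold rule, can be sketched as follows. This is our own minimal version: `tau` and `eta` are nested dicts of pheromone and heuristic values, and the roulette branch plays the role of "choosing another node" in the exploration case.

```python
import random

def choose_next(i, allowed, tau, eta, alpha=1.0, beta=5.0, theta0=0.9):
    """Node selection with threshold theta0: exploit the best
    tau^alpha * eta^beta node with probability theta0, otherwise sample
    from the normalized transition probabilities of Eq. (18)."""
    score = {j: tau[i][j] ** alpha * eta[i][j] ** beta for j in allowed}
    if random.random() <= theta0:
        return max(score, key=score.get)          # exploitation branch
    total = sum(score.values())
    r, acc = random.random() * total, 0.0
    for j, s in score.items():                    # roulette exploration
        acc += s
        if acc >= r:
            return j
    return j

def update_pheromone(tau, best_tour, best_len, rho=0.1, q=1.0):
    """Pheromone update of Eq. (19): evaporate everywhere, then deposit
    Q'/fitness along the best tour found in this traversal."""
    for i in tau:
        for j in tau[i]:
            tau[i][j] *= (1.0 - rho)              # (1 - rho) residual factor
    for i, j in zip(best_tour, best_tour[1:]):
        tau[i][j] += q / best_len                 # delta tau = Q'/fitness
```

With theta0 = 1.0 the rule is purely greedy, which makes the behavior deterministic and easy to check.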
5.3 Obstacle avoidance
The path optimization and dynamic obstacle avoidance were realized through reinforcement learning (RL), which autonomously interacts with the environment, and obtains the mapping from states to actions by maximizing the reward function. The Q-learning algorithm is a model-free temporal-difference algorithm for estimating the Q value. The core idea is to update the Q value incrementally based on the received reward, iteratively selecting the maximum action value of the next state:
$Q_{k+1}\left(s_{t}, a_{t}\right) \leftarrow Q_{k}\left(s_{t}, a_{t}\right)+\alpha\left(r_{t}+\gamma \max _{a} Q_{k}\left(s_{t+1}, a\right)-Q_{k}\left(s_{t}, a_{t}\right)\right)$ (20)
As the state transfers from s_{t} to s_{t+1}, the corresponding reward value r_{t} is received. The algorithm selects the maximum action value through iterations, and eventually converges to the optimal strategy. Taking the opposite of the path length between nodes as the reward, the path selection by the greedy strategy can be expressed as:
$a_{k}=\arg \max _{a \in A\left(s_{k}\right)}\left\{r_{k+1}^{a}+V\left(S_{k+1}^{a}\right)\right\}$ (21)
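One tabular Q-learning step of Eq. (20) can be sketched as follows, with our own names; `Q` is a dict keyed by state-action pairs, and missing entries default to zero.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step, Eq. (20):
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
```

With an empty table, a reward of -1 (the negated path length between two nodes) updates Q(s, a) to alpha * r = -0.1, matching the learning rate of 0.1 used in the experiments.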
6.1 Experimental setup
Table 2. The main parameters of the INSGAII
Name      Meaning                      Value
N_{popu}  Initial population           300
N_{pare}  Size of Pareto solution set  40
N_{iter}  Number of iterations         400
p_{sele}  Selection probability        0.02
p_{cros}  Crossover probability        0.90
p_{muta}  Mutation probability         0.10
The main parameters of the INSGA-II are presented in Table 2. The main cost parameters of the transport task are given in Table 3. In addition, the other parameters were configured as follows: pheromone importance factor α=1; constant Q′=1; importance of the heuristic function β=5; pheromone volatilization factor ρ=0.1; crossover probability=0.85; generation gap=0.8; mutation probability=0.1; learning rate=0.1; discount factor=0.9.
Table 3. The main cost parameters of the transport task
Type                       Name                                   Value  Weight
Basic cost                 Basic cost of robot assignment         300    0.3
Transport cost             Unit distance cost of robot            5      0.5
Time window penalty costs  Penalty coefficient for early arrival  0.002  0.2
                           Penalty coefficient for late arrival   0.04
                           Operating speed of robot               30
                           Maximum capacity of robot              4
6.2 Comparative analysis
(a) Multiobjective scheduling experiments
Tables 4, 5, and 6 record the processing sequence, processing time, and energy consumption power of each job on each machine, respectively.
First, the INSGA-II and NSGA-II were separately adopted to find a reasonable scheduling plan for an 8×8 JSP, with the aim to minimize the maximum makespan and energy consumption. Each algorithm was run 40 times independently, and the optimal results of the two algorithms were compared. As shown in Table 7, most results of NSGA-II were dominated by those of INSGA-II, and the results of INSGA-II had a higher non-dominance level. Thus, INSGA-II clearly outperformed NSGA-II.
Table 4. The processing sequence

Job    M_{1}  M_{2}  M_{3}  M_{4}  M_{5}  M_{6}  M_{7}  M_{8}
J_{1}  2      5      4      8      3      7      6      1
J_{2}  8      5      7      2      1      4      6      3
J_{3}  7      3      5      8      2      4      1      6
J_{4}  4      8      5      3      2      1      7      6
J_{5}  2      4      7      6      1      5      8      3
J_{6}  2      4      3      5      1      8      7      6
J_{7}  1      5      4      3      8      2      6      7
J_{8}  6      5      7      2      4      3      1      8
Table 5. The processing time

Job    M_{1}  M_{2}  M_{3}  M_{4}  M_{5}  M_{6}  M_{7}  M_{8}  Release time  Delivery date
J_{1}  17     18     11     14     18     18     10     15     0             150
J_{2}  11     8      21     18     12     8      19     25     0             120
J_{3}  17     23     21     9      18     20     9      9      30            190
J_{4}  13     20     11     19     10     23     15     19     0             120
J_{5}  15     18     11     8      12     10     13     15     0             180
J_{6}  17     8      25     18     10     8      9      7      0             180
J_{7}  7      18     9      10     14     18     12     13     0             -
J_{8}  11     8      25     18     12     9      22     13     10            190
Table 6. The power of energy consumption

Energy consumption            M_{1}  M_{2}  M_{3}  M_{4}  M_{5}  M_{6}  M_{7}  M_{8}
Cutting energy consumption    22     15     9      10     6      7.5    8      12
Idling energy consumption     3.2    2.3    1.8    2.1    1.9    4.4    3.4    4.0
Other energy consumption      1.2    1.3    0.9    0.7    1.2    1.4    1.6    1.4
Transport energy consumption  3      3      3      3      3      3      3      3
Table 7. Some solutions of the two algorithms
INSGA-II: Makespan (C_{max}), Energy consumption (E), Tardiness (D_{total})  |  NSGA-II: Makespan (C_{max}), Energy consumption (E), Tardiness (D_{total})
159, 405, 84      |  163, 409, 89
160, 400, 86      |  164, 405, 91
161, 394.9, 87    |  166, 404, 92
162, 390, 87.5    |  166, 403, 93
163, 385, 88      |  167, 400, 93
164, 383, 88.5    |  168, 399, 94
165, 380, 89      |  168, 397, 94
166, 372, 90      |  169, 390, 95
167, 370, 90      |  170, 388, 96
168, 365, 91      |  171, 386, 97
Figure 5. The distribution of Pareto solutions
The distributions of the Pareto solutions of the two algorithms are compared in Figure 5, where the three dimensions stand for energy consumption, makespan, and tardiness, respectively. The Pareto solutions of INSGA-II lay closer to the ideal point, dominating nearly all of those of NSGA-II.
Suppose the theoretical optimal solution is the point (155, 75, 320). The spatial distance between the solutions of each algorithm and this theoretical value can be solved by:
$\text {Distance}_{\text {Pareto}}=\frac{1}{N} \sum_{i=1}^{N} \sqrt{\left(C_{\max }^{i}-155\right)^{2}+\left(D_{\text {total}}^{i}-75\right)^{2}+\left(E^{i}-320\right)^{2}}$ (22)
For comparability, the three terms under the radical in (22) were normalized by:
$x^{\prime}=\frac{x-x_{\min }}{x_{\max }-x_{\min }}$ (23)
It was calculated that the Distance_{Pareto} of INSGA-II stood at 3.333, and that of NSGA-II at 3.357. Obviously, the result of INSGA-II is closer to the theoretical optimal solution.
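Eqs. (22) and (23) combine into the following sketch (names are our own): each objective is min-max normalized over the ideal point and the solution set, and the Euclidean distances to the ideal point are then averaged.

```python
def mean_distance_to_ideal(solutions, ideal):
    """Mean normalized Euclidean distance of a Pareto set to the ideal
    point, Eqs. (22)-(23); lower is better."""
    dims = list(zip(*([ideal] + list(solutions))))
    lo = [min(d) for d in dims]           # x_min per objective, Eq. (23)
    hi = [max(d) for d in dims]           # x_max per objective, Eq. (23)
    def norm(v):
        return [(x - l) / (h - l) if h > l else 0.0
                for x, l, h in zip(v, lo, hi)]
    ni = norm(ideal)
    total = 0.0
    for s in solutions:
        ns = norm(s)
        total += sum((a - b) ** 2 for a, b in zip(ns, ni)) ** 0.5
    return total / len(solutions)
```

A single solution one normalized unit away from the ideal in each of the three objectives gives a distance of sqrt(3).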
Next, the mean maximum makespan and mean number of iterations of each algorithm in 40 independent runs were calculated to compare their convergence speeds. As shown in Figure 6, INSGA-II converged in an average of 200 iterations, while NSGA-II converged in an average of 216 iterations. Thus, INSGA-II converges faster than NSGA-II.
Figure 6. The comparison of convergence curves
Through the experiments on the small-scale 8×8 problem, the proposed INSGA-II outperforms NSGA-II in solution quality and convergence speed. To further validate its performance, our algorithm was compared with NSGA-II and SPEA2 (Strength Pareto Evolutionary Algorithm 2) [22] on medium- and large-scale problems. Four numbers of jobs were selected: 20, 50, 80, and 100; each number of jobs was combined with 4, 6, 8, and 10 machines, respectively, giving 16 problems. Each algorithm was run independently 40 times on each problem. The processing time of each job was generated from the uniform distribution U(30, 100). The total processing time p of the jobs can be expressed as:
$p=\sum_{i=1}^{n} p_{i}$ (24)
The delivery date of jobs can be calculated by:
$D_{i}=p \cdot(1-\xi)+(p / 1.2+p \cdot(1-\xi)) /(n-1)$ (25)
where, ξ takes values from 0.2 to 1.0 with a step size of 0.2.
Table 8 compares the generational distance (GD) of the three algorithms on the 16 medium- and large-scale problems. It can be seen that INSGA-II outperformed the other algorithms on most examples: SPEA2 achieved the best results on 1 medium-scale example; NSGA-II achieved the best results on 4 examples; INSGA-II achieved the best results on 11 examples. This further verifies the excellent convergence ability of INSGA-II. Moreover, the best results on all four large-scale examples belonged to INSGA-II, evidence of the superiority of our algorithm in solving large-scale JSPs.
Table 8. The comparison of GD indices
Problem   SPEA2 (Mean, Std. dev.)  |  NSGA-II (Mean, Std. dev.)  |  INSGA-II (Mean, Std. dev.)
20×4      3.96E-03, 7.16E-03       |  8.06E-03, 7.47E-03         |  5.48E-03, 3.75E-03
20×6      9.33E-03, 1.39E-03       |  3.67E-03, 8.10E-03         |  1.20E-03, 2.03E-03
20×8      8.77E-03, 3.22E-03       |  2.31E-03, 8.12E-03         |  1.05E-02, 1.84E-03
20×10     2.23E-03, 8.36E-03       |  3.49E-03, 3.09E-03         |  1.17E-03, 2.52E-03
50×4      5.61E-02, 1.07E-02       |  3.59E-02, 3.67E-02         |  2.62E-02, 1.30E-02
50×6      6.70E-03, 9.59E-03       |  2.47E-03, 2.66E-03         |  9.10E-03, 3.91E-03
50×8      6.71E-03, 1.69E-03       |  9.47E-03, 7.70E-03         |  4.09E-03, 4.26E-03
50×10     1.92E-03, 7.89E-03       |  1.36E-03, 9.45E-03         |  4.74E-03, 6.92E-03
80×4      5.82E-03, 1.12E-03       |  1.26E-03, 9.69E-03         |  4.30E-03, 6.93E-03
80×6      5.23E-03, 2.23E-03       |  8.39E-03, 8.29E-03         |  2.06E-03, 1.28E-03
80×8      3.83E-02, 8.14E-02       |  3.66E-02, 2.57E-02         |  1.06E-02, 4.26E-02
80×10     9.60E-03, 7.59E-03       |  3.03E-03, 1.12E-03         |  3.43E-03, 4.08E-03
100×4     7.49E-02, 2.81E-02       |  4.77E-02, 1.90E-02         |  3.42E-02, 1.08E-02
100×6     1.33E-03, 4.34E-03       |  1.77E-03, 3.72E-03         |  1.01E-03, 5.56E-03
100×8     9.40E-03, 7.36E-03       |  6.42E-03, 6.39E-03         |  4.24E-03, 9.36E-03
100×10    3.22E-03, 2.62E-03       |  1.82E-03, 4.98E-03         |  1.04E-02, 3.35E-02
Accuracy  1/16                     |  4/16                       |  11/16
Table 9. The comparison of IGD indices
Problem   SPEA2 (Mean, Std. dev.)  |  NSGA-II (Mean, Std. dev.)  |  INSGA-II (Mean, Std. dev.)
20×4      4.05E-03, 3.75E-03       |  3.91E-03, 9.77E-03         |  1.06E-04, 7.43E-04
20×6      4.32E-03, 5.37E-03       |  7.42E-03, 6.69E-03         |  1.27E-03, 1.61E-03
20×8      8.46E-03, 3.19E-03       |  5.73E-03, 3.20E-03         |  3.43E-04, 7.68E-04
20×10     8.34E-03, 4.23E-03       |  1.08E-03, 2.39E-03         |  8.83E-04, 7.77E-04
50×4      1.08E-02, 1.06E-02       |  4.04E-03, 2.82E-03         |  1.10E-04, 2.23E-04
50×6      8.37E-03, 3.82E-03       |  8.80E-03, 2.23E-03         |  2.16E-03, 4.75E-03
50×8      3.82E-03, 6.67E-03       |  8.76E-03, 7.07E-03         |  8.26E-04, 8.26E-04
50×10     1.01E-02, 4.20E-03       |  8.33E-03, 1.28E-03         |  6.81E-04, 9.38E-04
80×4      4.08E-03, 5.51E-03       |  8.72E-03, 7.66E-03         |  6.04E-04, 4.07E-04
80×6      2.83E-03, 8.64E-03       |  3.14E-03, 1.21E-03         |  9.93E-04, 9.36E-04
80×8      1.01E-03, 8.69E-03       |  6.79E-03, 5.01E-03         |  9.32E-04, 1.01E-04
80×10     8.65E-03, 6.96E-03       |  9.55E-03, 8.46E-03         |  7.46E-03, 9.12E-03
100×4     3.55E-03, 4.05E-03       |  1.05E-03, 6.07E-03         |  7.31E-03, 3.11E-03
100×6     6.31E-03, 1.02E-02       |  4.90E-03, 8.03E-03         |  6.09E-04, 1.03E-04
100×8     2.17E-03, 3.29E-03       |  7.91E-03, 2.49E-03         |  9.69E-04, 4.97E-04
100×10    8.31E-03, 6.75E-03       |  4.54E-03, 8.56E-03         |  2.35E-04, 1.10E-04
Accuracy  0/15                     |  1/16                       |  15/16
Table 10. The comparison of spread indices
| Problem | SPEA2 mean | SPEA2 std. dev. | NSGA-II mean | NSGA-II std. dev. | INSGA-II mean | INSGA-II std. dev. |
|---------|------------|-----------------|--------------|-------------------|---------------|---------------------|
| 20×4 | 3.14E-01 | 2.11E-01 | 1.36E-01 | 6.47E-01 | 8.41E-01 | 5.29E-01 |
| 20×6 | 6.67E-01 | 5.66E-01 | 7.54E-01 | 1.84E-01 | 2.16E-01 | 4.43E-01 |
| 20×8 | 8.13E-01 | 9.89E-01 | 1.05E-02 | 5.14E-01 | 4.36E-01 | 8.40E-01 |
| 20×10 | 8.48E+00 | 6.01E+00 | 8.43E+00 | 6.47E+00 | 3.74E+00 | 1.02E+00 |
| 50×4 | 9.73E-02 | 9.58E-02 | 1.83E-02 | 1.01E-02 | 5.68E-02 | 2.18E-02 |
| 50×6 | 1.02E-02 | 3.76E-02 | 9.63E-02 | 2.29E-02 | 2.89E-02 | 6.58E-02 |
| 50×8 | 1.02E-02 | 5.13E-02 | 1.01E-02 | 6.63E-02 | 9.20E-02 | 7.31E-02 |
| 50×10 | 4.10E-02 | 4.46E-02 | 4.40E-02 | 8.12E-02 | 1.95E-02 | 4.12E-02 |
| 80×4 | 7.31E-02 | 4.42E-02 | 9.25E-02 | 4.31E-02 | 3.29E-02 | 1.54E-02 |
| 80×6 | 4.47E-01 | 3.42E-01 | 1.01E-02 | 2.95E-01 | 1.08E-02 | 2.67E-01 |
| 80×8 | 5.09E-01 | 9.49E-01 | 3.30E-01 | 9.42E-01 | 1.01E-01 | 3.91E-01 |
| 80×10 | 1.66E-02 | 6.26E-02 | 1.08E-02 | 2.40E-02 | 1.05E-02 | 8.26E-02 |
| 100×4 | 8.65E-02 | 6.47E-02 | 7.87E-02 | 4.36E-02 | 7.50E-02 | 5.57E-02 |
| 100×6 | 8.99E-02 | 7.16E-02 | 9.80E-02 | 1.07E-02 | 5.19E-02 | 1.02E-02 |
| 100×8 | 9.99E+00 | 1.14E+00 | 4.85E+00 | 1.15E+00 | 3.09E+00 | 3.62E+00 |
| 100×10 | 3.60E-02 | 5.62E-02 | 8.50E-02 | 7.77E-02 | 1.67E-02 | 1.01E-02 |
| Accuracy | 1/16 | | 5/16 | | 10/16 | |
The inverted generational distance (IGD) of the three algorithms on the 16 medium- and large-scale problems is compared in Table 9. It can be seen that INSGA-II achieved the best results on 15 examples, and NSGA-II on 1 example. Therefore, INSGA-II performs much better than the other algorithms in terms of IGD.
Table 10 compares the spread of the three algorithms on the 16 medium- and large-scale problems. It can be seen that INSGA-II achieved the best results on 10 examples, NSGA-II on 5 examples, and SPEA2 on 1 example. This means the solution distribution of INSGA-II is better than that of the other algorithms.
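The paper does not spell out its spread formula; a minimal sketch, assuming Deb's spread (Δ) indicator for a bi-objective front sorted along the first objective, is:

```python
import math

def spread(front, extremes):
    """Deb's spread (Delta) for a bi-objective front: measures how evenly
    solutions are distributed along the front. Lower is better; a
    perfectly uniform front touching both extremes scores 0."""
    front = sorted(front)  # order along the first objective
    d = [math.dist(front[i], front[i + 1]) for i in range(len(front) - 1)]
    d_bar = sum(d) / len(d)                     # mean consecutive gap
    d_f = math.dist(extremes[0], front[0])      # gap to the first extreme
    d_l = math.dist(extremes[1], front[-1])     # gap to the last extreme
    num = d_f + d_l + sum(abs(di - d_bar) for di in d)
    den = d_f + d_l + (len(front) - 1) * d_bar
    return num / den
```

For a perfectly even front such as (0, 2), (1, 1), (2, 0) with those endpoints as extremes, the indicator evaluates to 0.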
(b) Further optimization of the job transport task
A total of 40 examples were selected: five each of the 10×4, 10×6, 10×8, 10×10, 20×4, 20×6, 20×8, and 20×10 sizes. The proposed algorithm was run 30 times on each example, and the number of assigned robots, the transport energy consumption, and the robot utilization rate were recorded. Among them, the robot utilization rate refers to the ratio of the load state time t_{w} of a robot to the total time t_{v} from entering the transport task to the completion of the transport of the last job:
Utilization ratio $=\frac{t_{w}}{t_{v}} \times 100 \%$ (26)
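Eq. (26) can be computed directly (function name illustrative):

```python
def utilization_ratio(t_w, t_v):
    """Eq. (26): share of a robot's loaded (load-state) time t_w within the
    total time t_v from entering the transport task to the last delivery,
    expressed as a percentage."""
    return t_w / t_v * 100.0
```

For instance, a robot loaded for 30 time units out of 60 in service has a utilization ratio of 50%.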
The relevant results are recorded in Table 11. It can be seen that the further optimization of the job transport task effectively reduced both the number of assigned robots and the transport energy consumption, and greatly enhanced the robot utilization rate. This fully demonstrates the effectiveness of our algorithm.
Table 11. The comparison of transport results
| Problem | Robots (first stage) | Transport energy (first stage) | Utilization (first stage) | Robots (post-opt.) | Transport energy (post-opt.) | Utilization (post-opt.) |
|---------|----------------------|--------------------------------|---------------------------|--------------------|------------------------------|--------------------------|
| 10×4 | 10 | 50.4 | 36.5% | 6.2 | 30.5 | 90.0% |
| 10×6 | 10 | 80.7 | 30.7% | 6.1 | 48.8 | 88.3% |
| 10×8 | 10 | 92.6 | 27.4% | 7.3 | 65.3 | 80.9% |
| 10×10 | 10 | 107.5 | 22.9% | 8 | 80.2 | 78.2% |
| 20×4 | 20 | 78.9 | 33.4% | 12 | 50.7 | 87.3% |
| 20×6 | 20 | 90.3 | 28.3% | 11 | 60.3 | 83.2% |
| 20×8 | 20 | 123.5 | 22.3% | 11 | 71.2 | 78.1% |
| 20×10 | 20 | 189.3 | 15.7% | 9 | 110.6 | 74.5% |
This paper designs a two-stage optimization algorithm for the multi-objective JSP with job transport, covering the makespan, tardiness, and energy consumption of the production task, and proves the feasibility and effectiveness of the algorithm through experiments. The research also shows that combining RL with GA can greatly improve optimization performance. Future research will further probe the synergy between the two techniques.
The study was supported by the Key Soft Science Project of the "Science and Technology Innovation Action Plan" of the Shanghai Science and Technology Commission, China (Grant No. 20692104300); the National Natural Science Foundation of China (Grant No. 71840003); and the Technology Development Project of the University of Shanghai for Science and Technology, China (Grant No. 2018KJFZ043).
[1] Zhang, R., Chiong, R. (2016). Solving the energy-efficient job shop scheduling problem: A multi-objective genetic algorithm with enhanced local search for minimizing the total weighted tardiness and total energy consumption. Journal of Cleaner Production, 112: 3361-3375. https://doi.org/10.1016/j.jclepro.2015.09.097
[2] Ding, J.Y., Song, S., Wu, C. (2016). Carbon-efficient scheduling of flow shops by multi-objective optimization. European Journal of Operational Research, 248(3): 758-771. https://doi.org/10.1016/j.ejor.2015.05.019
[3] Mansouri, S.A., Aktas, E., Besikci, U. (2016). Green scheduling of a two-machine flowshop: Trade-off between makespan and energy consumption. European Journal of Operational Research, 248(3): 772-788. https://doi.org/10.1016/j.ejor.2015.08.064
[4] Dai, M., Tang, D., Giret, A., Salido, M.A. (2019). Multi-objective optimization for energy-efficient flexible job shop scheduling problem with transportation constraints. Robotics and Computer-Integrated Manufacturing, 59: 143-157. https://doi.org/10.1016/j.rcim.2019.04.006
[5] Türkyılmaz, A., Şenvar, Ö., Ünal, İ., Bulkan, S. (2020). A research survey: Heuristic approaches for solving multi-objective flexible job shop problems. Journal of Intelligent Manufacturing, 31: 1949-1983. https://doi.org/10.1007/s10845020015474
[6] Nouri, H.E., Driss, O.B., Ghédira, K. (2016). Simultaneous scheduling of machines and transport robots in flexible job shop environment using hybrid metaheuristics based on clustered holonic multiagent model. Computers & Industrial Engineering, 102: 488-501. https://doi.org/10.1016/j.cie.2016.02.024
[7] May, G., Stahl, B., Taisch, M., Prabhu, V. (2015). Multi-objective genetic algorithm for energy-efficient job shop scheduling. International Journal of Production Research, 53(23): 7071-7089. https://doi.org/10.1080/00207543.2015.1005248
[8] Huang, X., Yang, L. (2019). A hybrid genetic algorithm for multi-objective flexible job shop scheduling problem considering transportation time. International Journal of Intelligent Computing and Cybernetics, 12(2): 154-174. https://doi.org/10.1108/IJICC1020180136
[9] Fu, H.C., Liu, P. (2019). A multi-objective optimization model based on non-dominated sorting genetic algorithm. International Journal of Simulation Modelling, 18(3): 510-520. https://doi.org/10.2507/IJSIMM18(3)CO12
[10] González-Rodríguez, I., Puente, J., Palacios, J.J., Vela, C.R. (2020). Multi-objective evolutionary algorithm for solving energy-aware fuzzy job shop problems. Soft Computing, 24: 16291-16302. https://doi.org/10.1007/s00500020049406
[11] Wu, X., Che, A. (2020). Energy-efficient no-wait permutation flow shop scheduling by adaptive multi-objective variable neighborhood search. Omega, 94: 102117. https://doi.org/10.1016/j.omega.2019.102117
[12] Wen, X., Wang, K., Li, H., Sun, H., Wang, H., Jin, L. (2020). A two-stage solution method based on NSGA-II for green multi-objective integrated process planning and scheduling in a battery packaging machinery workshop. Swarm and Evolutionary Computation, 61: 100820. https://doi.org/10.1016/j.swevo.2020.100820
[13] Yazdani, M., Zandieh, M., Tavakkoli-Moghaddam, R. (2019). Evolutionary algorithms for multi-objective dual-resource constrained flexible job-shop scheduling problem. OPSEARCH, 56(3): 983-1006. https://doi.org/10.1007/s1259701900395y
[14] Lan, M., Xu, T.R., Peng, L. (2010). Solving flexible multi-objective JSP problem using an improved genetic algorithm. JSW, 5(10): 1107-1113. https://doi.org/10.4304/jsw.5.10.11071113
[15] Baizid, K., Chellali, R., Yousnadj, A., Meddahi, A., Bentaleb, T. (2010). Genetic algorithms based method for time optimization in robotized site. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1359-1364. https://doi.org/10.1109/IROS.2010.5651948
[16] Baizid, K., Yousnadj, A., Meddahi, A., Chellali, R., Iqbal, J. (2015). Time scheduling and optimization of industrial robotized tasks based on genetic algorithms. Robotics and Computer-Integrated Manufacturing, 34: 140-150. https://doi.org/10.1016/j.rcim.2014.12.003
[17] Komaki, G.M., Kayvanfar, V. (2015). Grey wolf optimizer algorithm for the two-stage assembly flow shop scheduling problem with release time. Journal of Computational Science, 8: 109-120. https://doi.org/10.1016/j.jocs.2015.03.011
[18] Mirjalili, S. (2016). Dragonfly algorithm: A new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Computing and Applications, 27(4): 1053-1073. https://doi.org/10.1007/s0052101519201
[19] Gao, L., Li, X., Wen, X., Lu, C., Wen, F. (2015). A hybrid algorithm based on a new neighborhood structure evaluation method for job shop scheduling problem. Computers & Industrial Engineering, 88: 417-429. https://doi.org/10.1016/j.cie.2015.08.002
[20] Zhu, Z., Zhou, X. (2020). An efficient evolutionary grey wolf optimizer for multi-objective flexible job shop scheduling problem with hierarchical job precedence constraints. Computers & Industrial Engineering, 140: 106280. https://doi.org/10.1016/j.cie.2020.106280
[21] Nowicki, E., Smutnicki, C. (1996). A fast taboo search algorithm for the job shop problem. Management Science, 42(6): 797-813. https://doi.org/10.1287/mnsc.42.6.797
[22] Zitzler, E., Laumanns, M., Thiele, L. (2001). SPEA2: Improving the strength Pareto evolutionary algorithm, TIK Report 103. Computer Engineering and Networks Laboratory (TIK), ETH Zurich, Zurich, Switzerland.