Joint Scheduling of Charging and Service Operation of Electric Taxi Based on Reinforcement Learning

Chen Zhu, Fuchun Jiang, Yuliang Tang

School of Opto-Electronic and Communication Engineering, Xiamen University of Technology, Xiamen 361024, China

School of Informatics, Xiamen University, Xiamen 361005, China

Corresponding Author Email: zhuchen@xmut.edu.cn

Pages: 267-272 | DOI: https://doi.org/10.18280/jesa.550215

Received: 9 January 2022 | Revised: 15 February 2022 | Accepted: 26 February 2022 | Available online: 30 April 2022

© 2022 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

Abstract: 

Due to their high daily power consumption, electric taxis (e-taxis) require frequent recharging. Under the influence of the time-of-use tariff and shift changes, congestion often occurs at charging stations during peak hours, which seriously disturbs the normal operation of both the traffic network and the power grid. This paper proposes a joint management architecture that integrates the service operation of e-taxis with the charging network. Aiming to minimize drivers' charging overhead, a scheduling scheme that combines taxi service operation scheduling with charging planning is designed based on reinforcement learning. A low-battery e-taxi is arranged to pick up a passenger whose destination is close to an appropriate charging station. Simulation results show that the proposed scheme effectively reduces drivers' charging overhead by shortening deadhead kilometers and waiting time at charging stations.

Keywords: 

charging scheduling, electric taxi, reinforcement learning, service operation scheduling

1. Introduction

To cope with environmental pollution and the energy crisis, electric vehicles (EVs), regarded as a paradigm of green transportation, have been deployed on a large scale with local governments' support and incentives. The ever-growing adoption of EVs calls for appropriate management of their recharging. Massive disorderly recharging not only leads to congestion at charging stations but also stresses the power grid and transportation networks. Furthermore, drivers often waste considerable time at charging stations, and this poor experience may hinder the popularization of EVs.

To improve drivers' recharging experience, charging scheduling has become a hot spot in current research. The mainstream research idea focuses on static scheduling strategies based on prior knowledge of drivers' travel plans [1, 2], which suit private EVs because of their regular routines. Dynamic scheduling strategies, on the other hand, have a wider application range, and AI-based algorithms and game theory are considered powerful approaches to such complicated decision-making problems [3-6]. Previous works designed real-time EV charging scheduling strategies through dynamic tariffs and queuing control, but generally did not consider the differing travel characteristics of EVs [7-9]. In recent years, numerous papers have been dedicated to the scheduling of EV charging, while few works specifically target e-taxis. However, e-taxis require more frequent recharging than other types of EVs because of their longer daily travel distances and higher energy consumption. Moreover, each charge takes tens of minutes to hours, including both waiting and recharging time, which seriously affects the driver's income. Exploiting the high mobility of e-taxis, some papers proposed partial charging strategies that allow e-taxis to receive frequent partial charges during the idle time of different charging stations [10, 11]. This kind of approach prevents e-taxis' batteries from running too low, and each charge takes only a short time, so the driver need not worry about missing the peak hours for passenger transport. In reference [12], a model is developed to estimate the optimal distribution and siting of charging stations, which can effectively reduce e-taxis' service cost.

Different from private EVs, the travel paths of e-taxis depend on passengers' demands. A passenger's travel route largely determines whether the e-taxi can conveniently reach a charging station. Exploiting this travel characteristic, we propose a joint management architecture that integrates e-taxis' service operation and charging networks, and design a novel joint scheduling strategy based on reinforcement learning. Through reasonable service operation scheduling, a low-battery e-taxi is arranged to pick up a passenger whose destination is close to an appropriate charging station, so that both the deadhead kilometers and the driver's waiting time at the charging station are reduced. Simulation results show the proposed solution is effective in minimizing the total cost of an e-taxi's recharging.

2. Network Architecture

This paper proposes a joint management architecture that integrates e-taxis' service operation and charging scheduling, as shown in Figure 1. The architecture comprises the charging management platform and the service operation management platform. The joint scheduling center belongs to both platforms. It is in charge of integrating information from the two platforms, generating the charging demands of e-taxis at the beginning of each scheduling cycle (e.g., every 15 minutes), running the charging scheduling algorithm, and making charging plans.

Figure 1. An overview of the joint management architecture

The charging management platform includes the charging station management center and the joint scheduling center, and is responsible for the daily operation management of all charging stations, information exchange, and the making of charging plans.

The service operation management platform consists of the service management center, the vehicle information management center, and the joint scheduling center. The vehicle information management center exchanges information with e-taxis through roadside units (RSUs). Each e-taxi regularly reports its travel information, e.g., state of charge (SOC), location, and speed, to the nearest RSU. The RSUs collect the reported information and forward it to the vehicle information management center. At the end of each scheduling cycle, the vehicle information management center aggregates the SOC, location, and speed of all e-taxis and sends them to the joint scheduling center.

The service management center receives passengers' orders online through taxi-hailing apps, and the orders arriving during the scheduling cycle are pushed to nearby e-taxis. Shortly (e.g., several minutes) before the end of each scheduling cycle, the service management center stops dispatching orders and reports all the order information of the cycle to the joint scheduling center; the orders that have not yet been assigned are fed into the charging scheduling algorithm for joint scheduling. The input of the charging scheduling algorithm also includes charging station operation information and e-taxi travel information. The output of the algorithm is a charging plan and an order plan. The charging plan is sent to every charging station through the charging management platform; according to the plan, charging stations reserve plug-in chargers and time slots for the corresponding e-taxis. The service management center dispatches orders to e-taxis according to the order plan, and any orders that remain unassigned are pushed to available nearby e-taxis. The vehicle information management center sends each e-taxi's charging plan to the RSU it last passed and to the RSUs it will visit along its order route. When the e-taxi accesses such an RSU, it obtains the charging plan and the navigation.

The joint management architecture is the basis of the charging scheduling algorithm, and it ensures the orderly charging of e-taxis through reasonable charging scheduling and service arrangement. Compared with a charging-scheduling-only structure, the joint architecture allows an e-taxi to pick up a passenger and head to an appropriate charging station close to the passenger's destination, instead of waiting in vain during the recharging peak. Based on this structure, the total cost of recharging is reduced by shortening both the idle traveling time and the waiting time.

3. System Model

The time horizon is discretized into time slots indexed by i. Let N={1,2,…,n} be the set of e-taxis that require charging at time slot i, K={1,2,…,k} be the set of passengers waiting for scheduling, and M={1,2,…,m} denote the set of charging stations.

The cost of charging can be separated into the following parts:

(1) The Cost of Payment $P_{n}$

Let $T_{mn}^{\text{charging}}$ (hour) denote the charging time, which can be expressed as

$T_{m n}^{\text {charging }}=\frac{\left(E_{n}-E_{i^{\prime} n}\right) \cdot B_{n}}{\rho}$    (1)

where i' is the moment at which e-taxi n arrives at the target charging station, $E_{i'n}$ denotes the battery SOC of e-taxi n at moment i', $E_{n}$ is the expected battery SOC of e-taxi n after charging is complete, and $B_{n}$ (kWh) denotes the battery capacity of e-taxi n. In addition, we assume the output power of the plug-in chargers is fixed and denoted by ρ (kW).

According to the TOU power price, $p_{p}$, $p_{f}$ and $p_{v}$ denote the electricity prices in the peak, flat and valley periods respectively, in ¥/kWh. The charging service fee is denoted by $p_{s}$ (¥/kWh). Therefore, the payment cost can be expressed as

$P_{n}=\left\{\begin{array}{l}\left(p_{p}+p_{s}\right) \cdot\left(E_{n}-E_{i^{\prime} n}\right) \cdot B_{n}, i^{\prime} \in \text { peak } \\ \left(p_{f}+p_{s}\right) \cdot\left(E_{n}-E_{i^{\prime} n}\right) \cdot B_{n}, i^{\prime} \in \text { flat } \\ \left(p_{v}+p_{s}\right) \cdot\left(E_{n}-E_{i^{\prime} n}\right) \cdot B_{n}, i^{\prime} \in \text { valley }\end{array}\right.$    (2)
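For concreteness, the following minimal Python sketch evaluates Eqs. (1) and (2) with the Table 1 parameters (ρ = 60 kW, $B_{n}$ = 82 kWh, TOU tariffs plus service fee); the arrival SOC and tariff period are assumed values for illustration.

RHO = 60.0   # charger output power (kW), Table 1
B_N = 82.0   # battery capacity (kWh), Table 1
P_S = 0.8    # charging service fee (yuan/kWh), Table 1
TARIFF = {"peak": 0.9799, "flat": 0.6274, "valley": 0.2034}  # yuan/kWh

def charging_time(e_target, e_arrival):
    """Eq. (1): hours needed to raise the SOC from e_arrival to e_target."""
    return (e_target - e_arrival) * B_N / RHO

def payment_cost(e_target, e_arrival, period):
    """Eq. (2): energy payment plus service fee for the arrival period."""
    return (TARIFF[period] + P_S) * (e_target - e_arrival) * B_N

# Assumed scenario: arrive at 30% SOC in the peak period, charge to 90%.
print(round(charging_time(0.9, 0.3), 2))          # 0.82 h
print(round(payment_cost(0.9, 0.3, "peak"), 2))   # 87.57 yuan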

(2) The Cost of Charging Time $L_{n}$

Assume the expected income per hour of an e-taxi is $\tilde{G}$ (¥/h); then the revenue loss from charging time can be expressed as

$L_{n}=T_{m n}^{\text {charging }} \cdot \tilde{G}$    (3)

(3) The Cost of Waiting Time $W_{n}$

$T_{mn}^{\text{wait}}$ (hour) denotes the waiting time after e-taxi n arrives at charging station m. The occupancy of the plug-in chargers and the expected earliest available charging time can be estimated and provided by the charging station management center. The revenue loss from waiting time can be expressed as

$W_{n}=T_{mn}^{\text{wait}} \cdot \tilde{G}$    (4)

(4) The Cost of Idle Traveling Time $V_{n}$

Assume e-taxi n takes $T_{mn}^{\text{travel}}$ hours to travel to the target charging station m, and that the speed of the e-taxi is fixed and denoted by v (km/h). As shown in Figure 2, $D_{k, k^{\prime}}$ (km) denotes the minimum traveling distance of passenger k. $D_{0}$ (km) denotes the deadhead distance of e-taxi n, which consists of two parts: the distance between the location of e-taxi n and the starting point of passenger k, and the distance between the destination of passenger k and the target charging station m; this yields $T_{mn}^{\text{travel}}=\left(D_{k, k^{\prime}}+D_{0}\right) / v$. The minimum distance between the location of e-taxi n and the target charging station m is denoted by $D_{m, n}$ (km). If the e-taxi does not pick up any passenger on the way to the target charging station, then $T_{mn}^{\text{travel}}=D_{0} / v=D_{m, n} / v$.

According to the order plan made by the joint scheduling center, e-taxis benefit from transporting passengers. The gain of service scheduling can be expressed as

$G_{n}=\frac{D_{m, n}-D_{0}}{v} \cdot \tilde{G}$    (5)

$G_{n}$ is the expected revenue from the shortened idle traveling distance, and it is unrelated to the service revenue itself; it can be positive, negative or zero. For example, if an e-taxi is 10 km from the charging station and a passenger's destination is 10 km in the opposite direction, the e-taxi must travel 20 km to the target charging station after dropping off the passenger, so $G_{n}$ is negative. Consequently, the e-taxi also has the option of going straight to the target charging station instead, in which case $G_{n}=0$.

The cost of idle traveling time can be expressed as

$V_{n}=T_{m n}^{\text {travel }} \cdot \tilde{G}-G_{n}=\tilde{G} \cdot \frac{D_{k, k^{\prime}}+2 D_{0}-D_{m, n}}{v}$    (6)

Figure 2. The illustration of $D_{k, k^{\prime}}$, $D_{0}$ and $D_{m, n}$
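To make Eqs. (5) and (6) concrete, the short sketch below re-runs the numerical example from the text (station 10 km away, passenger heading 10 km in the opposite direction, so the post-dropoff deadhead is 20 km); speed and hourly income follow Table 1, and the distances are illustrative assumptions.

V = 40.0       # e-taxi speed (km/h), Table 1
G_HOUR = 50.0  # expected income per hour (yuan/h), Table 1

def service_gain(d_mn, d_0):
    """Eq. (5): revenue recovered by shortening the deadhead distance."""
    return (d_mn - d_0) / V * G_HOUR

def idle_cost(d_kk, d_0, d_mn):
    """Eq. (6): revenue lost to idle traveling, net of the gain G_n."""
    return G_HOUR * (d_kk + 2.0 * d_0 - d_mn) / V

# Passenger's destination lies 10 km the opposite way from the station,
# so the deadhead after dropoff is 20 km and the gain is negative:
print(service_gain(d_mn=10.0, d_0=20.0))          # -12.5 yuan
# Driving straight to the station instead (D_0 = D_mn, no passenger):
print(idle_cost(d_kk=0.0, d_0=10.0, d_mn=10.0))   # 12.5 yuan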

(5) Objective Function

Our objective is to find an optimal charging scheduling policy $\pi^{*}$, which assigns e-taxi n a target charging station m and arranges for it to transport passenger k on the way, with the minimum total overhead; that is,

$\pi^{*}(n, m, k)=\arg \min \left[P_{n}+L_{n}+V_{n}+W_{n}\right]$    (7)

Furthermore, to prevent an e-taxi that should be charged from continuously transporting passengers, which could lead to a breakdown or a very low $E_{i^{\prime} n}$, the following constraint is added: each e-taxi is allowed to pick up only one passenger during a charging event. Under this constraint, the optimization reduces to a finite search over station-passenger pairs, as sketched below.
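The reinforcement learning scheme of Section 4 replaces this exhaustive search at scale, but a brute-force sketch of Eq. (7), with a hypothetical total_cost callable returning $P_{n}+L_{n}+V_{n}+W_{n}$, clarifies the decision variables:

def best_plan(stations, passengers, total_cost):
    """Eq. (7): exhaustive argmin over (station, passenger) pairs.

    k = 0 stands for driving to the station without picking anyone up;
    total_cost(m, k) is assumed to return P_n + L_n + V_n + W_n.
    """
    candidates = [(m, k) for m in stations for k in list(passengers) + [0]]
    return min(candidates, key=lambda mk: total_cost(mk[0], mk[1]))

# Toy cost table: station 1 combined with passenger 7 is cheapest.
costs = {(1, 0): 50.0, (1, 7): 42.5, (2, 0): 60.0, (2, 7): 55.0}
print(best_plan([1, 2], [7], lambda m, k: costs[(m, k)]))  # (1, 7)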

4. Charging Scheduling Scheme Based on Reinforcement Learning

4.1 Preparation

The joint scheduling center regularly receives information from the vehicle information management center, including the current location of each e-taxi, its battery SOC, traffic information, service information, its destination, and the estimated time of arrival. When an e-taxi's battery SOC falls to 40% or below, the joint scheduling center flags it as needing recharging, as in the sketch below.
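A minimal sketch of this flagging rule; the 40% threshold follows the text, while the record field names are illustrative assumptions.

SOC_THRESHOLD = 0.40  # flagging threshold from the text

def taxis_needing_charge(fleet_status):
    """Return the IDs of e-taxis whose reported SOC is at or below 40%."""
    return [taxi["id"] for taxi in fleet_status
            if taxi["soc"] <= SOC_THRESHOLD]

# Hypothetical cycle report: two of the three e-taxis get flagged.
fleet = [{"id": "ET-1", "soc": 0.35},
         {"id": "ET-2", "soc": 0.72},
         {"id": "ET-3", "soc": 0.40}]
print(taxis_needing_charge(fleet))  # ['ET-1', 'ET-3']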

After generating the set N of e-taxis that need recharging, the joint scheduling center forms the charging scheduling scheme interactively, based on reinforcement learning. As shown in Figure 1, the input parameters related to the charging system are provided by the charging station management center, such as the locations of charging stations, their occupancy, and the expected earliest available time (used to estimate $T_{mn}^{\text{wait}}$). The travel information (each e-taxi's SOC, location, and speed) and the service information (each passenger's location and destination) are provided by the vehicle information management center and the service management center, respectively. Finally, the output scheme is delivered to the related charging stations and e-taxis via the respective management centers.

4.2 The reinforcement learning based method

The state, action and reward of the reinforcement learning formulation are described as follows.

1) State: Let S denote the system state space, which consists of the vectors $s_{t}$ over time slots t. The state concerns vehicles, charging stations and passengers, and is denoted as $s_{t}=\left(E_{t n}, C V_{t n}, C V_{m}, C V_{k}, C V_{k^{\prime}}\right)$, where $E_{t n}$ denotes the SOC of e-taxi n at time slot t, $C V_{t n}$ is the coordinates of e-taxi n at time slot t, $C V_{m}$ is the coordinates of the target charging station m, and $C V_{k}$ and $C V_{k^{\prime}}$ represent the starting and ending coordinates of passenger k. In addition, $s_{0}$ represents the initial state, and the e-taxi reaches its terminal state when it arrives at the target station. If the e-taxi heads for the target charging station directly without carrying any passenger, $C V_{k}$ and $C V_{k^{\prime}}$ are both set to (-1,-1).

2) Action: A denotes the action space. For each state $s_{t} \in \boldsymbol{S}$, the action of e-taxi n is represented as $a_{t}^{n}=(m, k)$, which means e-taxi n will pick up passenger k on the way to its target charging station m. If a passenger's route fits several e-taxis, the passenger is allocated to the nearest one. If the e-taxi does not pick up a passenger, k=0.

3) Reward: $R_{t}$ is the immediate reward when the state transits from $s_{t}$ to $s_{t+1}$ under action $a_{t}$. The reward function can be expressed as follows:

$R_{t}\left(s_{t}, a_{t}^{n}, s_{t+1}\right)=\left\{\begin{array}{lr}\frac{1}{60 \cdot\left(T_{m n}^{\text {wait }}+D_{m, n} / v\right)}, & \text { if } n \text { selects charging station } m \\ \frac{D_{m, n}-D_{0}}{D_{m, n}}, & \text { if } n \text { selects passenger } k \\ 0, & \text { if } n \text { does not pick up any passenger } \\ 10, & \text { if } n \text { arrives at the target charging station }\end{array}\right.$    (8)

The e-taxi first chooses a charging station as its target and then chooses a passenger. The immediate reward depends on the waiting time, the distances, and the passenger's starting and ending coordinates. If picking up the passenger would increase the idle traveling time, the immediate reward is negative. A minimal sketch of this reward function follows.
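A Python sketch of Eq. (8), with the geometry passed in explicitly; the event labels are illustrative names, not the paper's notation.

def reward(event, t_wait=0.0, d_mn=0.0, d_0=0.0, v=40.0):
    """Immediate reward for one transition, following Eq. (8)."""
    if event == "select_station":
        # Shorter wait plus travel time (converted to minutes) pays more.
        return 1.0 / (60.0 * (t_wait + d_mn / v))
    if event == "select_passenger":
        # Positive only if the passenger shortens the deadhead distance.
        return (d_mn - d_0) / d_mn
    if event == "no_passenger":
        return 0.0
    if event == "arrive":
        return 10.0
    raise ValueError("unknown event: " + event)

# Example: a dropoff leaving a 20 km deadhead against a 10 km direct
# trip yields a negative reward, as noted above.
print(reward("select_passenger", d_mn=10.0, d_0=20.0))  # -1.0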

4) The action-value function: the cumulative discounted reward can be expressed as $G_{t}=\sum_{\tau=1}^{T} \gamma^{\tau-1} R_{t+\tau}$, where $\gamma \in(0,1]$ is the discount rate, which makes the reward decay over time, and T is the terminal moment of one episode.

The action-value function gives the expectation of the cumulative reward in state $s_{t}$ under action $a_{t}$ following policy π, which yields

$Q_{\pi}\left(s_{t}, a_{t}\right)=\mathbb{E}_{\pi}\left[G_{t} \mid S_{t}=s_{t}, A_{t}=a_{t}\right]$    (9)

This paper uses Q-learning to find an optimal policy that maximizes the action-value function,

$Q^{*}(s, a)=\max _{\pi} Q_{\pi}(s, a)$    (10)

The iterative process can be expressed as

$Q\left(S_{t}, A_{t}\right) \leftarrow Q\left(S_{t}, A_{t}\right)+\alpha\left[R_{t}+\gamma \max _{a} Q\left(S_{t+1}, a\right)-Q\left(S_{t}, A_{t}\right)\right]$    (11)

where $\alpha \in(0,1)$ denotes the learning rate. As $t \rightarrow \infty$, with a suitably decaying $\alpha \rightarrow 0$, the action-value function converges to the optimum $Q^{*}(s, a)$.

To prevent the algorithm from falling into a locally optimal solution, new actions are explored via the ε-greedy method [13]. The proposed method is shown in Algorithm 1.

Algorithm 1

Input: initial state space S, action space A, original state $s_{0}$, discount rate $\gamma$, exploration rate ε, learning rate $\alpha$

1: $Q(s, a)=0, \pi(s, a)=\frac{1}{|\boldsymbol{A}(s)|}$

2: $s=s_{0}$

3: For $t=1, \cdots, T$ do

4: if $\operatorname{rand}()<\mathcal{E}$ then

5: Randomly select e-taxi n's action $a_{t}$

6: else

7: $a_{t}=\arg \max _{a} Q\left(s_{t}, a\right)$

8: end if

9: calculate the immediate reward $R_{t}$ and the next state $s_{t+1}$

10: update the Q-value according to Eq. (11)

11: take the new state $s=s_{t+1}$

12: end For

Output: policy π
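A compact tabular realization of Algorithm 1 is sketched below. The environment object (reset/actions/step) is a hypothetical interface standing in for the joint scheduling simulator; the ε-greedy choice and the update follow Eq. (11), with the hyperparameters of Table 1.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.01, 0.9, 0.5  # Table 1 hyperparameters

def train(env, episodes=1000):
    """Tabular Q-learning following Algorithm 1 (line numbers in comments)."""
    q = defaultdict(float)                    # line 1: Q(s, a) = 0
    for _ in range(episodes):
        s = env.reset()                       # line 2: s = s_0
        done = False
        while not done:                       # line 3: for t = 1..T
            actions = env.actions(s)          # feasible (m, k) actions
            if random.random() < EPSILON:     # lines 4-5: explore
                a = random.choice(actions)
            else:                             # line 7: exploit
                a = max(actions, key=lambda x: q[(s, x)])
            s_next, r, done = env.step(a)     # line 9: reward, next state
            best_next = 0.0 if done else max(
                q[(s_next, x)] for x in env.actions(s_next))
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])  # line 10, Eq. (11)
            s = s_next                        # line 11
    # Output: the greedy policy induced by the learned Q-table.
    return lambda s: max(env.actions(s), key=lambda a: q[(s, a)])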

5. Simulation Results and Performance Analysis

The simulation setting follows ref. [7]: a grid map in which each edge represents 1 km. To illustrate the performance of the proposed strategy thoroughly, three map sizes are used: 5×5, 10×10, and 20×20 grids. Figure 3 provides a sample of a 10×10 grid model. Charging stations, e-taxis and passengers are generated randomly at the intersection points of the grid. The simulation parameters of the e-taxi refer to the BYD E6 operating in Shenzhen. The parameter settings are detailed in Table 1.

Figure 3. A sample in a 10×10 grid model

Table 1. Simulation parameters

Parameter | Value
number of charging stations / chargers per station | 1/3 (5×5), 4/2 (10×10), 4/4 (20×20)
number of e-taxis | 5 (5×5), 10 (10×10), 20 (20×20)
number of passengers | 4 (5×5), 9 (10×10), 18 (20×20)
electricity price in peak period $p_{p}$ (8:00-11:00, 18:00-23:00) | 0.9799 ¥/kWh
electricity price in flat period $p_{f}$ (0:00-7:00, 11:00-18:00) | 0.6274 ¥/kWh
electricity price in valley period $p_{v}$ (23:00-7:00) | 0.2034 ¥/kWh
charging service fee $p_{s}$ | 0.8 ¥/kWh
speed of e-taxi v | 40 km/h
expected income of e-taxis per hour $\tilde{G}$ | 50 ¥/h
expected battery SOC after charging $E_{n}$ | 90%
battery capacity $B_{n}$ | 82 kWh
output power of chargers ρ | 60 kW
learning rate α | 0.01
exploration rate ε | 0.5
discount rate γ | 0.9

The cost breakdown of the proposed strategy in a 10×10 grid model is shown in Figure 4, compared with the charging-scheduling-only strategy. The theoretical minimum cost is provided as a baseline; it corresponds to the case where an available charging station happens to be right next to the e-taxi when it needs charging, so the minimum cost consists of the payment and the cost of charging time only. The results show that the joint scheduling strategy decreases the costs of idle traveling time and waiting time by having the e-taxi transport a passenger on its way to the charging station. Meanwhile, the travel distance is extended, so the payment and the cost of charging time increase; the imported energy grows by an average of 2.11 kWh per e-taxi. On the whole, the overall cost of recharging is reduced by the proposed strategy.

The performance of the proposed strategy for different map sizes is shown in Figure 5. The comparison shows that the joint scheduling reduces the recharging cost most effectively in the 20×20 grid model, while its effect is not distinct in the 5×5 grid model. As the map size and the number of e-taxis increase, the costs of idle traveling time and waiting time rise faster and weigh more heavily on the total cost, whereas the growth of the payment and the cost of charging time is basically flat across map sizes.

Figure 4. The comparison of joint scheduling strategy with charging scheduling only strategy in a 10×10 grid model

Figure 5. The comparison of average cost of recharging in different map sizes

6. Conclusion

In this paper, we design a joint management architecture for recharging and taxi service operation. Based on this architecture, a joint scheduling strategy of e-taxi hailing and recharging is proposed to reduce the total cost of recharging: a low-battery e-taxi is arranged to pick up a passenger whose destination is close to an appropriate charging station. According to the simulation results, the strategy is effective in lowering the e-taxi's idle traveling time and waiting time at the charging station, and leads to a reduction in the total cost. Therefore, the proposed solution helps to eliminate the inconvenience of charging electric vehicles.

Acknowledgement

This work was supported by the Major Project of the Science and Technology Plan of Xiamen City (Grant No. 3502ZCQ20191002) and the High-level Talent Project of Xiamen University of Technology (Grant No. YKJ18006R).

References

[1] Hou, L., Wang, C., Yan, J. (2019). Bidding for preferred timing: an auction design for electric vehicle charging station scheduling. IEEE Transactions on Intelligent Transportation Systems, 21(8): 3332-3343. https://doi.org/10.1109/TITS.2019.2926336

[2] Hou, L., Yan, J., Wang, C. (2019). Accommodating more users in highway electric vehicle charging through coordinated booking: A market-based approach. In 2019 IEEE 23rd International Conference on Computer Supported Cooperative Work in Design (CSCWD). Porto, Portugal, pp. 476-481. http://dx.doi.org/10.1109/CSCWD.2019.8791862

[3] Tucker, N., Alizadeh, M. (2019). An online admission control mechanism for electric vehicles at public parking infrastructures. IEEE Transactions on Smart Grid, 11(1): 161-170. https://doi.org/10.1109/TSG.2019.2918524

[4] Zhang, Y., You, P., Cai, L. (2018). Optimal charging scheduling by pricing for EV charging station with dual charging modes. IEEE Transactions on Intelligent Transportation Systems, 20(9): 3386-3396. https://doi.org/10.1109/TITS.2018.2876287

[5] Vuelvas, J., Ruiz, F., Gruosso, G. (2019). Price based optimization for electrical vehicle charging scheduling. In 2019 IEEE Vehicle Power and Propulsion Conference (VPPC), Hanoi, Vietnam, pp. 1-5. http://dx.doi.org/10.1109/VPPC46532.2019.8952532

[6] Vidanalage, I., Venkatesh, B., Torquato, R., Freitas, W. (2017). Scheduling of electrical vehicle charging for a charging facility with single charger. In 2017 IEEE Electrical Power and Energy Conference (EPEC), Saskatoon, SK, Canada, pp. 1-6. https://doi.org/10.1109/EPEC.2017.8286240

[7] Liu, J., Guo, H., Xiong, J., et al. (2019). Smart and resilient EV charging in SDN-enhanced vehicular edge computing networks. IEEE Journal on Selected Areas in Communications, 38(1): 217-228. https://doi.org/10.1109/JSAC.2019.2951966

[8] Chen, L., Yu, T., Chen, Y., et al. (2020). Real-time optimal scheduling of large-scale electric vehicles: A dynamic non-cooperative game approach. IEEE Access, 8: 133633-133644. https://doi.org/10.1109/ACCESS.2020.3009039

[9] Li, H., Wan, Z., He, H. (2019). Constrained EV charging scheduling based on safe deep reinforcement learning. IEEE Transactions on Smart Grid, 11(3): 2427-2439. https://doi.org/10.1109/TSG.2019.2955437

[10] Yuan, Y., Zhang, D., Miao, F., et al. (2019). p^2 charging: Proactive partial charging for electric taxi systems. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), Dallas, TX, USA, pp. 688-699. https://doi.org/10.1109/ICDCS.2019.00074

[11] Wang, G., Zhang, Y., Fang, Z., et al. (2020). Fair charge: A data-driven fairness-aware charging recommendation system for large-scale electric taxi fleets. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 4(1): 1-25. http://dx.doi.org/10.1145/3381003

[12] Bauer, G., Greenblatt, J., Gerke, B. (2018). Cost, energy, and environmental impact of automated electric taxi fleets in Manhattan. Environmental Science & Technology, 52(8): 4920-4928. https://doi.org/10.1021/acs.est.7b04732

[13] Dann, C., Neumann, G., Peters, J. (2014). Policy evaluation with temporal differences: A survey and comparison. Journal of Machine Learning Research, 15: 809-883. https://dl.acm.org/doi/10.5555/2627435.2638563