Abstract
In this paper we present a novel approach to reinforcement learning for continuous state–action control problems. The approach combines least square policy iteration (LSPI) with a zero-order Takagi–Sugeno fuzzy system; we call it "fuzzy least square policy iteration" (FLSPI). FLSPI is a critic-only method that inherits the advantages of both LSPI and fuzzy systems. We define state–action basis functions based on a fuzzy system in such a way that the conditions of the LSPI theorem are satisfied. Our aim is to find the most suitable continuous action in every state using fuzzy rules. The method does not depend on a learning rate and is supported by a mathematical analysis that establishes an error bound for it. We show by simulation that it learns faster and produces better results than the two previous critic-only fuzzy reinforcement learning approaches, fuzzy Q-learning (FQL) and fuzzy SARSA learning (FSL). We test FLSPI on four well-known problems (the boat problem, a maze, the inverted pendulum and cart–pole balancing) and demonstrate, respectively, its higher performance, the behavior of its error bound, its convergence in a setting where FQL and FSL diverge, and its advantage over the latest proposed methods.
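The idea of fuzzy state–action basis functions inside an LSPI-style least-squares update can be illustrated with a minimal sketch. The abstract does not give the paper's exact basis-function definition, so everything below is an assumption made for illustration: a one-dimensional state, triangular membership functions, a small set of candidate actions per rule, a ridge term for numerical stability, and all function names (`memberships`, `features`, `greedy`, `continuous_action`, `lstdq`) are hypothetical. The least-squares step follows the standard LSTDQ form used in LSPI, not necessarily the authors' implementation.

```python
import numpy as np

def memberships(s, centers):
    """Triangular membership degrees of a scalar state s; they sum to 1."""
    return np.array([np.interp(s, centers, row) for row in np.eye(len(centers))])

def features(s, rule_actions, centers, n_actions):
    """phi(s, a): each rule's firing strength sits in the slot of its chosen candidate action."""
    phi = np.zeros((len(centers), n_actions))
    phi[np.arange(len(centers)), rule_actions] = memberships(s, centers)
    return phi.ravel()

def greedy(w, n_rules, n_actions):
    """Index of the highest-weighted candidate action for every rule."""
    return w.reshape(n_rules, n_actions).argmax(axis=1)

def continuous_action(s, w, centers, candidates):
    """Zero-order Takagi-Sugeno output: membership-weighted mix of per-rule greedy actions."""
    idx = greedy(w, len(centers), len(candidates))
    return float(memberships(s, centers) @ candidates[idx])

def lstdq(samples, w, centers, n_actions, gamma=0.95, ridge=1e-6):
    """One policy-evaluation step: solve A w = b for the policy that is greedy in w.

    samples: iterable of (s, rule_actions, r, s_next) transitions.
    ridge:   small diagonal term (an assumption) that keeps A invertible.
    """
    k = len(centers) * n_actions
    A = ridge * np.eye(k)
    b = np.zeros(k)
    next_choice = greedy(w, len(centers), n_actions)
    for s, rule_actions, r, s_next in samples:
        phi = features(s, rule_actions, centers, n_actions)
        phi_next = features(s_next, next_choice, centers, n_actions)
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
    return np.linalg.solve(A, b)

# Tiny illustrative run with made-up transitions (not data from the paper).
centers = np.array([0.0, 0.5, 1.0])
candidates = np.array([-1.0, 1.0])
samples = [(0.0, [0, 0, 0], 1.0, 0.0), (0.0, [1, 1, 1], 2.0, 0.0)]
w = lstdq(samples, np.zeros(6), centers, n_actions=2, gamma=0.0)
action = continuous_action(0.0, w, centers, candidates)
```

In a full policy-iteration loop, `lstdq` would be called repeatedly with the latest `w` until the weights stop changing. No step size appears anywhere in the update, which is consistent with the abstract's remark that the method is learning rate independent.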
| Original language | English |
|---|---|
| Pages (from-to) | 849–862 |
| Journal | International Journal of Fuzzy Systems |
| DOIs | |
| Publication status | Published - 2017 |
| Externally published | Yes |
Fingerprint
Dive into the research topics of 'Fuzzy least square policy iteration and its mathematical analysis'. Together they form a unique fingerprint.