Offline Model-Learning by Learning Legal Moves in Othello through Synthetic Negative Reinforcement

Authors

  • Johnny Liu, Thomas Jefferson High School for Science and Technology
  • Ganesh Mani, Carnegie Mellon University

DOI:

https://doi.org/10.47611/jsrhs.v12i4.5638

Keywords:

Offline Reinforcement Learning, Model-Based Reinforcement Learning, Negative Reinforcement, Artificial Intelligence, Othello

Abstract

A major limitation of reinforcement learning is poor sample efficiency: agents typically require an excessively large number of training episodes. These episodes can be expensive, especially when conducted online. Model-Based Reinforcement Learning (MBRL) and offline learning are two methods that mitigate this problem: MBRL has been shown to improve sample efficiency, while offline learning can reduce the number of online episodes needed by substituting less expensive offline ones. However, combining these two methods is more challenging, raising issues such as training on incomplete distributions. We explore these challenges by testing different combinations of offline and online model-learning on the task of learning the legal moves of the board game Othello. In the process, we encounter an additional challenge: offline episodes provide only positive reinforcement, since they contain information about legal moves alone. To address this problem, we propose a method named synthetic negative reinforcement, which uses pre-existing agent knowledge to compensate for the lack of information about illegal moves. Our results demonstrate the efficacy of offline learning using synthetic negative reinforcement on robust distributions of offline data, with agents achieving greater than 97% accuracy in predicting the legality of moves. We also demonstrate the clear obstacle that skewed distributions pose to offline model-learning.
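
The abstract describes the method only at a high level. As a minimal sketch of how positive-only offline data might be augmented with synthetic negatives, the Python below treats the move recorded in an offline game as the single positive example and samples unrecorded empty squares as candidate negatives, optionally filtered by a pre-existing legality model (the "agent knowledge" the abstract mentions). The function name synthesize_negatives, the prior_model callable, and the board encoding are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Assumed encoding: 0 = empty, 1 = current player, -1 = opponent.
BOARD_SIZE = 8  # standard 8x8 Othello board


def synthesize_negatives(board, played_move, prior_model=None,
                         n_negatives=4, rng=None):
    """Turn a positive-only offline sample into (move, label) pairs.

    `played_move` is the legal move recorded in the offline game and
    becomes the single positive example. Negatives are *synthetic*:
    squares the game record says nothing about. Some of them may in
    fact be legal, so if `prior_model` is supplied (a pre-existing
    estimate mapping (board, move) -> P(legal)), only candidates the
    agent already considers unlikely to be legal are kept, reducing
    mislabeled negatives.
    """
    rng = rng or np.random.default_rng()
    candidates = [(r, c)
                  for r in range(BOARD_SIZE)
                  for c in range(BOARD_SIZE)
                  if board[r, c] == 0 and (r, c) != played_move]
    if prior_model is not None:
        candidates = [m for m in candidates if prior_model(board, m) < 0.5]
    picks = rng.permutation(len(candidates))[:n_negatives]
    negatives = [(candidates[i], 0) for i in picks]
    return [(played_move, 1)] + negatives


if __name__ == "__main__":
    board = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=int)
    board[3, 3] = board[4, 4] = -1  # standard Othello opening position
    board[3, 4] = board[4, 3] = 1
    for move, label in synthesize_negatives(board, played_move=(2, 3)):
        print(move, "legal" if label else "synthetic illegal")
```

Under these assumptions, the resulting labeled pairs could train an ordinary binary classifier of move legality, which is one plausible route to the >97% legality-prediction accuracy the abstract reports.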

Published

11-30-2023

How to Cite

Liu, J., & Mani, G. (2023). Offline Model-Learning by Learning Legal Moves in Othello through Synthetic Negative Reinforcement. Journal of Student Research, 12(4). https://doi.org/10.47611/jsrhs.v12i4.5638

Issue

Vol. 12 No. 4 (2023)

Section

HS Review Articles