Reducing Time Steps Needed of Multi-Agent Reinforcement Methods in a Dynamic Navigation Environment

Authors

  • Ziduo Yi, Stephen F. Austin High School
  • Fateme Golivand, College-level mentor
  • Brian Wescott, College-level mentor

DOI:

https://doi.org/10.47611/jsrhs.v12i4.5683

Keywords:

AI

Abstract

In a simplified version of the delivery variant of the multi-agent pathfinding problem, we use a reinforcement learning approach, rather than common traditional algorithmic heuristics, to tackle this NP-hard problem. The paper proposes two approaches, one fully decentralized and one that pairs decentralized data with a centralized model, to reduce the number of training time steps required in multi-agent reinforcement learning, especially in an environment with cumulative rewards. Neither approach requires concatenating local data across agents, so both avoid the computational burden of increased data complexity. The proposed approaches were tested in an environment where multiple agents cooperate to clean as many non-regenerating dirty tiles as possible in an obstacle-filled grid within a set amount of time. In a final test on a randomized dynamic environment with varying grid sizes, one of our approaches proved very promising in applicability and scalability. The algorithm used most throughout this paper is proximal policy optimization (PPO), with brief mentions of the deep Q-network (DQN). Training was tracked and recorded for different implementations of these algorithms, including Stable-Baselines3 and a self-implementation in TensorFlow.
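To make the setup above concrete, the sketch below shows one plausible reading of it: a single agent's decentralized view of a tile-cleaning grid and a shared PPO policy trained on it with Stable-Baselines3. This is a hypothetical illustration, not the authors' environment or code; the class name TileCleaningEnv, the grid encoding, and all sizes and hyperparameters are assumptions made for the example.

```python
# Hypothetical sketch (not the authors' code): one agent's view of a grid with
# obstacles and non-regenerating dirty tiles, trained with a single shared PPO
# policy. Assumes Gymnasium and Stable-Baselines3; names and sizes are illustrative.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class TileCleaningEnv(gym.Env):
    """Simplified single-agent view of the cooperative tile-cleaning task."""

    def __init__(self, size=8, n_dirty=10, n_obstacles=6, max_steps=100):
        super().__init__()
        self.size, self.n_dirty = size, n_dirty
        self.n_obstacles, self.max_steps = n_obstacles, max_steps
        # Observation: flattened grid (0 empty, 1 obstacle, 2 dirty, 3 agent).
        self.observation_space = spaces.Box(0, 3, shape=(size * size,), dtype=np.float32)
        self.action_space = spaces.Discrete(4)  # up, down, left, right

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.grid = np.zeros((self.size, self.size), dtype=np.float32)
        cells = self.np_random.choice(
            self.size * self.size, self.n_dirty + self.n_obstacles + 1, replace=False
        )
        self.grid[np.unravel_index(cells[: self.n_obstacles], self.grid.shape)] = 1   # obstacles
        self.grid[np.unravel_index(cells[self.n_obstacles:-1], self.grid.shape)] = 2  # dirty tiles
        self.pos = np.array(np.unravel_index(cells[-1], self.grid.shape))
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        move = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        new = np.clip(self.pos + move, 0, self.size - 1)
        if self.grid[tuple(new)] != 1:           # obstacles block movement
            self.pos = new
        reward = 0.0
        if self.grid[tuple(self.pos)] == 2:      # cleaning is non-regenerative
            self.grid[tuple(self.pos)] = 0
            reward = 1.0
        self.steps += 1
        terminated = not (self.grid == 2).any()  # all tiles cleaned
        truncated = self.steps >= self.max_steps # fixed time budget
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        view = self.grid.copy()
        view[tuple(self.pos)] = 3                # mark the agent's own position
        return view.flatten()


if __name__ == "__main__":
    # One centralized model trained on decentralized per-agent observations:
    # every agent would reuse this same policy, so no concatenation of local
    # data across agents is needed.
    model = PPO("MlpPolicy", TileCleaningEnv(), verbose=1)
    model.learn(total_timesteps=50_000)
```

Under this kind of parameter sharing, each agent contributes only its own local transitions to a single optimizer, which is one plausible interpretation of the "decentralized data, centralized model" setup the abstract describes.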


Author Biographies

Fateme Golivand, College-level mentor

Computer science research assistant

Brian Wescott, College-level mentor

Assistant professor of instruction, computer science

References or Bibliography

Mahoney, Chris. “Reinforcement Learning.” Medium, 17 June 2021, towardsdatascience.com/reinforcement-learning-fda8ff535bb6#4b10.

Silver, David, et al. “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.” Science, vol. 362, no. 6419, 2018, pp. 1140–1144, https://doi.org/10.1126/science.aar6404.

Dickson, Ben, et al. “AI Defeated Human Champions at Dota 2. Here’s What We Learned.” TechTalks, 23 Nov. 2019, bdtechtalks.com/2019/04/17/openai-five-neural-networks-dota-2/.

Foead, Daniel, et al. “A systematic literature review of A* pathfinding.” Procedia Computer Science, vol. 179, 2021, pp. 507–514, https://doi.org/10.1016/j.procs.2021.01.034.

Stern, Roni, et al. “Multi-agent pathfinding: Definitions, variants, and benchmarks.” Proceedings of the International Symposium on Combinatorial Search, vol. 10, no. 1, 2021, pp. 151–158, https://doi.org/10.1609/socs.v10i1.18510.

Queiroz, Ana Carolina, et al. “Solving multi-agent pickup and delivery problems using a genetic algorithm.” Intelligent Systems, 2020, pp. 140–153, https://doi.org/10.1007/978-3-030-61380-8_10.

Simonini, Thomas. “Proximal Policy Optimization (PPO).” Hugging Face, huggingface.co/blog/deep-rl-ppo. Accessed 12 Aug. 2023.

Suran, Abhishek. “Proximal Policy Optimization (PPO) with TensorFlow 2.x.” Medium, 21 Sept. 2020, towardsdatascience.com/proximal-policy-optimization-ppo-with-tensorflow-2-x-89c9430ecc26.

Published

11-30-2023

How to Cite

Yi, Z., Golivand, F., & Wescott, B. (2023). Reducing Time Steps Needed of Multi-Agent Reinforcement Methods in a Dynamic Navigation Environment. Journal of Student Research, 12(4). https://doi.org/10.47611/jsrhs.v12i4.5683

Issue

Vol. 12 No. 4 (2023)

Section

HS Research Projects