Dynamic Swarm Navigation

This project was undertaken as part of Inter-IIT Tech Meet 13.0 at IIT Bombay, based on a problem statement provided by Kalyani Bharatforge. Titled “Centralized Intelligence for Dynamic Swarm Navigation,” the project aimed to develop an effective approach for swarm navigation in continuously evolving environments.

Figure 1: Small Swarm Navigation

Figure 2: Large Swarm Navigation

Introduction

Motion planning is one of the essential technologies for building intelligent robot swarms. Traditional path-planning methods in robotics require precise localization and complete maps, which limits their adaptability in dynamic environments. Reinforcement learning (RL), and in particular deep reinforcement learning (DRL), has emerged as a key strategy for addressing these limitations, since a policy can be learned directly from raw sensor readings without a prior map.

Simulation

The simulation framework is developed in Gazebo Classic and integrated with ROS 2 Humble. Each robot carries a 2D LiDAR sensor used for obstacle detection. Robots are deployed at random positions in the environment, and their initial configurations are defined in a YAML file.
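
As a minimal sketch, such a spawn configuration might be generated as follows; the file name spawn_config.yaml, the key layout, and the arena size are illustrative assumptions, not the project's actual schema:

```python
# Sketch: generate random spawn poses and write them to a YAML file.
# File name, keys, robot count, and arena size are all assumed.
import random
import yaml  # pip install pyyaml

NUM_ROBOTS = 8
ARENA = 10.0  # half-width of a square arena in metres (assumed)

config = {
    "robots": [
        {
            "name": f"robot_{i}",
            "x": round(random.uniform(-ARENA, ARENA), 2),
            "y": round(random.uniform(-ARENA, ARENA), 2),
            "yaw": round(random.uniform(-3.14159, 3.14159), 2),
        }
        for i in range(NUM_ROBOTS)
    ]
}

with open("spawn_config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```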

Figure 3: Process Flowchart

Reinforcement Learning

The navigation task is formulated as a Markov Decision Process (MDP) defined by the tuple (S, A, P, R), where S is the state space, A the action space, P(s′ | s, a) the probability of transitioning to state s′ when action a is taken in state s, and R(s, a) the reward function.

State Space
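
A plausible construction of the observation, assuming the 2D LiDAR scan is downsampled into a fixed number of sectors and augmented with the goal in polar form plus the previous action; the exact composition and dimensions used in the project may differ:

```python
import numpy as np

N_SECTORS = 20  # assumed LiDAR downsampling resolution

def build_state(scan_ranges, dist_to_goal, angle_to_goal, last_action):
    # Downsample the raw scan by taking the minimum range per sector,
    # so the closest obstacle in each direction is never missed.
    sectors = np.array_split(np.asarray(scan_ranges), N_SECTORS)
    lidar = np.array([s.min() for s in sectors])
    return np.concatenate([lidar, [dist_to_goal, angle_to_goal], last_action])
```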

Action Space
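
For a differential-drive robot the natural continuous action is the pair (linear velocity, angular velocity). A sketch of how the actor's tanh-bounded output could be mapped to a velocity command, with illustrative limits:

```python
# Assumed mapping from the actor's tanh output in [-1, 1]^2 to a
# differential-drive velocity command; the limits are illustrative.
V_MAX, W_MAX = 0.5, 1.0  # m/s, rad/s

def to_cmd_vel(action):
    v = (action[0] + 1.0) / 2.0 * V_MAX  # linear velocity kept non-negative
    w = action[1] * W_MAX                # angular velocity, both directions
    return v, w
```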

Reward Function
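
A common shaping for LiDAR-based goal navigation combines a terminal success bonus, a collision penalty, and a dense progress term; the sketch below is a plausible form only, and the project's actual coefficients and terms may differ:

```python
# Hedged sketch of a goal-navigation reward; all constants are assumed.
GOAL_REWARD, COLLISION_PENALTY = 100.0, -100.0
GOAL_RADIUS, COLLISION_DIST = 0.3, 0.2  # metres (assumed)

def reward(dist_to_goal, min_lidar, prev_dist):
    if dist_to_goal < GOAL_RADIUS:
        return GOAL_REWARD           # terminal success bonus
    if min_lidar < COLLISION_DIST:
        return COLLISION_PENALTY     # terminal collision penalty
    return prev_dist - dist_to_goal  # dense progress-toward-goal term
```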

RL Algorithm

The primary RL navigation framework is implemented using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, an off-policy actor-critic method for continuous action spaces. The key reasons for selecting TD3 over alternative baselines such as DDPG are its clipped double-Q learning (taking the minimum of two critics curbs value overestimation), its delayed policy updates (the actor changes more slowly than the critics), and its target policy smoothing (noise on the target action regularizes the value estimate), all of which make training markedly more stable, as the sketch below illustrates.
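
A condensed sketch of the TD3 update step in PyTorch, showing the three mechanisms named above; network sizes and hyperparameters are illustrative, not the project's tuned values:

```python
# Minimal TD3 update sketch (PyTorch). Batches are assumed to be flat
# tensors with r and done shaped (batch, 1).
import copy
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

class TD3:
    def __init__(self, state_dim, action_dim, max_action=1.0,
                 gamma=0.99, tau=0.005, policy_noise=0.2,
                 noise_clip=0.5, policy_delay=2):
        self.actor = nn.Sequential(mlp(state_dim, action_dim), nn.Tanh())
        self.critic1 = mlp(state_dim + action_dim, 1)
        self.critic2 = mlp(state_dim + action_dim, 1)  # twin critic
        self.actor_t = copy.deepcopy(self.actor)       # target networks
        self.critic1_t = copy.deepcopy(self.critic1)
        self.critic2_t = copy.deepcopy(self.critic2)
        self.actor_opt = torch.optim.Adam(self.actor.parameters(), lr=3e-4)
        self.critic_opt = torch.optim.Adam(
            list(self.critic1.parameters()) + list(self.critic2.parameters()),
            lr=3e-4)
        self.max_action, self.gamma, self.tau = max_action, gamma, tau
        self.policy_noise, self.noise_clip = policy_noise, noise_clip
        self.policy_delay, self.step = policy_delay, 0

    def update(self, s, a, r, s2, done):
        self.step += 1
        with torch.no_grad():
            # Target policy smoothing: perturb the target action.
            noise = (torch.randn_like(a) * self.policy_noise
                     ).clamp(-self.noise_clip, self.noise_clip)
            a2 = (self.actor_t(s2) * self.max_action + noise
                  ).clamp(-self.max_action, self.max_action)
            # Clipped double-Q: take the minimum of the twin targets.
            q_t = torch.min(self.critic1_t(torch.cat([s2, a2], 1)),
                            self.critic2_t(torch.cat([s2, a2], 1)))
            target = r + (1 - done) * self.gamma * q_t
        sa = torch.cat([s, a], 1)
        critic_loss = ((self.critic1(sa) - target) ** 2).mean() + \
                      ((self.critic2(sa) - target) ** 2).mean()
        self.critic_opt.zero_grad(); critic_loss.backward(); self.critic_opt.step()
        # Delayed policy update: refresh the actor less frequently.
        if self.step % self.policy_delay == 0:
            pi = self.actor(s) * self.max_action
            actor_loss = -self.critic1(torch.cat([s, pi], 1)).mean()
            self.actor_opt.zero_grad(); actor_loss.backward(); self.actor_opt.step()
            # Polyak-average all target networks toward the live ones.
            for net, tgt in ((self.actor, self.actor_t),
                             (self.critic1, self.critic1_t),
                             (self.critic2, self.critic2_t)):
                for p, p_t in zip(net.parameters(), tgt.parameters()):
                    p_t.data.mul_(1 - self.tau).add_(self.tau * p.data)
```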

Methodology

The task was implemented as Decentralized Training and Decentralized Execution (DTDE). A single agent (robot) is first trained within a dynamic environment; the trained networks are then replicated across multiple instances, one per robot, as sketched below.
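
A sketch of that deployment, assuming illustrative dimensions and a hypothetical checkpoint file td3_actor.pt: the trained actor weights are loaded into one frozen copy per robot, and each copy acts only on its own robot's observations:

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, NUM_ROBOTS = 24, 2, 8  # illustrative sizes

def make_actor():
    # Same architecture the single agent was trained with (assumed).
    return nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, ACTION_DIM), nn.Tanh())

weights = torch.load("td3_actor.pt")  # single-agent checkpoint (assumed name)
actors = []
for _ in range(NUM_ROBOTS):
    actor = make_actor()
    actor.load_state_dict(weights)    # every robot gets an identical frozen copy
    actor.eval()
    actors.append(actor)

def step(robot_id, state):
    # Purely local decision: each robot sees only its own observation.
    with torch.no_grad():
        s = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        return actors[robot_id](s).squeeze(0).numpy()
```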

Experiments and Simulation Results

Figure 4: DDPG vs. TD3 Reward Plot

Figure 5: DDPG vs. TD3 Episodic Status Plot

Figure 6: DDPG vs. TD3 Robot Distance Travelled Plot