Research on an Intelligent Multi-UAV Expulsion and Encirclement Algorithm for Group Targets

      Abstract: To meet the need for multiple UAVs to expel and encircle group targets in complex, dynamic environments, this paper proposes an intelligent decision-making algorithm based on the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm, a deep reinforcement learning method. A task scenario close to real combat and a UAV kinematic model are first constructed, and the artificial potential field method is used to design the maneuvering strategy of the attacking side. On this basis, a multi-dimensional reward function that combines guided rewards with sparse rewards is designed and coupled with safety-distance and boundary constraints, which improves the safety and robustness of cooperative UAV operations. The algorithm adopts a centralized training and decentralized execution framework, and uses twin Critic networks and a delayed update mechanism to overcome Q-value overestimation and training instability. Simulation results show that the method achieves efficient expulsion and encirclement in both 2-vs-1 and 3-vs-1 scenarios, and that it converges 13.3% faster than the MADDPG algorithm. These results indicate that the proposed method significantly improves the autonomous decision-making and cooperative effectiveness of UAV swarms, offering a feasible approach and technical support for future air defense and group-versus-group combat.
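The two stabilizing mechanisms named in the abstract can be illustrated with a minimal sketch. It assumes the standard TD3 formulation, in which the critic target takes the minimum of the two target-critic estimates and the actor is updated only once every few critic updates. The function names, the discount factor, and the delay of 2 are illustrative assumptions, not values taken from the paper, and the neural networks themselves are replaced by plain lists of numbers.

```python
from typing import List, Sequence


def clipped_double_q_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not from the paper
) -> List[float]:
    """Compute y = r + gamma * (1 - done) * min(Q1', Q2') per transition.

    q1_next and q2_next stand in for the two target critics' estimates of
    the next state-action value. Taking the smaller one counters the
    overestimation that a single critic tends to accumulate.
    """
    targets = []
    for r, done, q1, q2 in zip(rewards, dones, q1_next, q2_next):
        bootstrap = 0.0 if done else gamma * min(q1, q2)
        targets.append(r + bootstrap)
    return targets


def actor_update_steps(total_steps: int, policy_delay: int = 2) -> List[int]:
    """Return the critic-update steps (1-indexed) on which the actor and
    target networks would also be updated under a delayed schedule.

    policy_delay=2 is an assumed value used only for illustration.
    """
    return [t for t in range(1, total_steps + 1) if t % policy_delay == 0]


if __name__ == "__main__":
    # Hypothetical batch of two transitions; the second one is terminal.
    y = clipped_double_q_targets(
        rewards=[1.0, 0.5],
        dones=[False, True],
        q1_next=[2.0, 3.0],
        q2_next=[1.5, 4.0],
        gamma=0.9,
    )
    print(y)
    print(actor_update_steps(6, policy_delay=2))
```

In the first transition the target bootstraps from the smaller estimate (1.5 rather than 2.0), and in the terminal transition the target is the reward alone. With a delay of 2, the actor is refreshed on every second critic update, so the policy is trained against value estimates that have had more time to settle.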