ApexNav: An Adaptive Exploration Strategy for Zero-Shot Object Navigation with Target-centric Semantic Fusion

The paper is under review.

Mingjie Zhang1, Yuheng Du1, Chengkai Wu1, Jinni Zhou1, Zhenchao Qi1, Jun Ma1, Boyu Zhou2,†

1 The Hong Kong University of Science and Technology (Guangzhou).   
2 Southern University of Science and Technology.   
† Corresponding author

Abstract


Navigating unknown environments to find a target object is a significant challenge. While semantic information is crucial for navigation, relying solely on it for decision-making may not always be efficient, especially in environments with weak semantic cues. Additionally, many methods are susceptible to misdetections, especially in environments with visually similar objects. To address these limitations, we propose ApexNav, a zero-shot object navigation framework that is both more efficient and reliable. For efficiency, ApexNav adaptively utilizes semantic information by analyzing its distribution in the environment, guiding exploration through semantic reasoning when cues are strong, and switching to geometry-based exploration when they are weak. For reliability, we propose a target-centric semantic fusion method that preserves long-term memory of the target object and similar objects, reducing false detections and minimizing task failures. We evaluate ApexNav on the HM3Dv1, HM3Dv2, and MP3D datasets, where it outperforms state-of-the-art methods in both SR and SPL metrics. Comprehensive ablation studies further demonstrate the effectiveness of each module. Furthermore, real-world experiments validate the practicality of ApexNav in physical environments.
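The two metrics named in the abstract can be illustrated with a short sketch. It assumes the standard definitions used in embodied navigation benchmarks: SR (success rate) is the fraction of successful episodes, and SPL (success weighted by path length) scales each success by the ratio of the shortest-path length to the length of the path the agent actually travelled. The `Episode` record, its field names, and the sample values are hypothetical and are not taken from the ApexNav evaluation.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical per-episode record; field names are illustrative only.
    success: bool         # True if the agent stopped close enough to the target
    shortest_path: float  # shortest-path length from start to target (m), > 0
    agent_path: float     # length of the path the agent actually travelled (m)


def success_rate(episodes):
    """SR: fraction of episodes that ended in success."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes):
    """SPL: mean over all episodes of success * l / max(p, l),
    where l is the shortest-path length and p is the agent's path length."""
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


# Made-up episodes, for illustration only.
episodes = [
    Episode(success=True, shortest_path=5.0, agent_path=5.0),    # optimal path
    Episode(success=True, shortest_path=4.0, agent_path=8.0),    # twice as long
    Episode(success=False, shortest_path=6.0, agent_path=12.0),  # failure
    Episode(success=True, shortest_path=3.0, agent_path=4.0),
]

print(f"SR  = {success_rate(episodes):.4f}")
print(f"SPL = {spl(episodes):.4f}")
```

Because failed episodes contribute zero and detours shrink the ratio, SPL can never exceed SR. A method that raises both is succeeding more often and taking shorter paths when it does.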

System Overview


Simulation Experiments



Real-World Experiments


We evaluate ApexNav's performance in various real-world scenarios involving diverse target objects.


Benchmark Comparisons

