Projects – Smart Autonomous Robotics

PerchRL: Vision-Based Agile Perching

Submitted to CoRL 2026

We propose PerchRL, a reinforcement learning framework for vision-based agile perching on inclined platforms under rapid and irregular motion. PerchRL employs a two-stage learning strategy with state-based pre-training followed by vision-based fine-tuning, using randomized platform trajectories and temporal augmentation for motion generalization. A visibility-aware state augmentation and active perception rewards are introduced to maintain robustness under intermittent visual loss caused by limited FOV. The framework achieves stable real-time perching on diverse quadrotor platforms in both simulation and real-world experiments.

AD-Planner: In-flight Payload Delivery

T-RO 2026

We present a systematic solution for autonomous delivery using a quadrotor with a suspended payload, covering the complete process of picking, transportation, and casting toward dynamic targets. An error-bounded planning method accounts for uncertainty to achieve accurate picking and casting. A compact optimization formulation simultaneously considers safety, feasibility, and efficiency, enabling seamless and agile delivery operations. A hierarchical initialization strategy further improves computational efficiency and solution quality. [Paper]

Efficient and Reliable Mobile Manipulation

Submitted to IJRR 2026

We present a framework that synergizes efficiency and reliability for continuous mobile manipulation in complex environments. The system integrates whole-body motion planning with adaptive control, enabling seamless coordination between base mobility and arm manipulation for long-horizon tasks. A hierarchical optimization structure balances task efficiency with operational safety, allowing the robot to persistently execute manipulation tasks while adapting to environmental changes.

FlyCo: Autonomous 3D Structure Scanning

Submitted to IJRR 2026

We present FlyCo, a foundation model (FM)-empowered framework enabling fully autonomous, prompt-driven 3D target scanning in diverse open-world environments. FlyCo establishes a principled perception-prediction-planning loop: (1) perception fuses streaming sensor data with vision-language FMs for robust target grounding and tracking; (2) prediction distills FM knowledge with multi-modal cues to infer the complete geometry of partially observed targets; (3) planning leverages predictive foresight to generate efficient and safe scanning paths with comprehensive coverage. Real-world experiments demonstrate precise scene understanding with substantially lower human effort compared to existing paradigms.

C²-Explorer: Decentralized Multi-UAV Exploration

IROS 2026

We present C²-Explorer, a decentralized multi-UAV exploration framework addressing the bottleneck of inadequate task representation and allocation under limited communication. The method constructs a connectivity graph to decompose disconnected unknown components into independent task units, and introduces a contiguity-driven allocation formulation with a graph-based neighborhood penalty to discourage non-adjacent assignments, promoting more contiguous task sequences over time. Extensive simulations show C²-Explorer reduces average exploration time by 43.1% and path length by 33.3% compared to SOTA baselines.
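
As a rough illustration of splitting disconnected unknown regions into separate task units, the sketch below labels 4-connected unknown cells on a small 2D grid with a breadth-first flood fill. The grid encoding, the `UNKNOWN` label, and the function name are assumptions made for this example; it is not the connectivity graph construction used in C²-Explorer.

```python
from collections import deque

UNKNOWN = -1  # hypothetical label for unexplored cells; 0 marks known cells


def unknown_components(grid):
    """Group 4-connected UNKNOWN cells into components (candidate task units)."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != UNKNOWN or (r, c) in seen:
                continue
            # Breadth-first flood fill from this unvisited unknown cell.
            queue = deque([(r, c)])
            seen.add((r, c))
            component = []
            while queue:
                cr, cc = queue.popleft()
                component.append((cr, cc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and grid[nr][nc] == UNKNOWN and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            components.append(component)
    return components


grid = [
    [-1, -1, 0, -1],
    [0, 0, 0, -1],
    [-1, 0, 0, 0],
]
task_units = unknown_components(grid)
print(len(task_units))
```

Each returned component could then be treated as one unit when assigning work to UAVs; the allocation step with the neighborhood penalty is not modeled here.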

LAGO Policy: Smooth Robotic Manipulation

IROS 2026

We propose LAGO Policy, a unified asynchronous action-generation framework that integrates trajectory optimization with diffusion policy for smooth and safe manipulation. To address inter-chunk discontinuities and lack of obstacle-aware execution in prior diffusion-based policies, LAGO introduces latency-aware classifier-free guidance conditioned on future actions for improved inter-chunk consistency. It further enables goal-directed collision-free trajectory planning by predicting task-relevant interaction goals, and employs spatial-temporal trajectory optimization to refine actions for low-jerk, feasible motion. Real-world experiments demonstrate high task success across challenging manipulation tasks.

OnFly: Aerial Vision-Language Navigation

IROS 2026

We propose OnFly, a fully onboard, real-time framework for zero-shot aerial vision-language navigation (AVLN). OnFly adopts a shared-perception dual-agent architecture that decouples high-frequency target generation from low-frequency progress monitoring, stabilizing VLM-based decision-making. It further employs a hybrid keyframe-recent-frame memory to preserve global trajectory context while maintaining KV-cache prefix stability. A semantic-geometric verifier refines VLM-predicted targets using depth cues, and a receding-horizon planner generates optimized collision-free trajectories. In simulation, OnFly improves task success from 26.4% to 67.8% over the strongest SOTA baseline.

Palm-sized Omnidirectional UAV Exploration

Submitted to RA-L 2026

We propose a palm-sized omnidirectional vision-based UAV exploration system guided by sparse topological maps. The ultra-compact form factor enables agile flight in confined and cluttered environments where larger platforms cannot operate. Omnidirectional visual sensing provides full spherical perception, while the sparse topological map framework efficiently guides exploration by maintaining a lightweight global structure for real-time path planning. This combination allows the system to achieve efficient autonomous exploration with minimal onboard sensing and computation.

Multi-Floor Autonomous Exploration

Submitted to RA-L 2026

We present a multi-floor exploration framework for ground robots that addresses the challenge of navigating and exploring structured multi-level environments. The method constructs an incremental reachable graph that captures connectivity across floors, combined with structural priors (e.g., staircases, elevator shafts) to guide exploration decisions. This enables efficient traversal and complete coverage of multi-story buildings without relying on pre-existing floor plans, adapting online to the discovered environment topology.

AirHunt: VLM-Based Aerial Object Navigation

Submitted to TASE 2026

We present AirHunt, an aerial object navigation system that efficiently locates open-set objects with zero-shot generalization in outdoor environments. To bridge the orders-of-magnitude frequency mismatch between VLM inference and real-time planning, AirHunt features a dual-pathway asynchronous architecture enabling continuous flight with adaptive semantic guidance. An active dual-task reasoning module exploits geometric and semantic redundancy for selective VLM querying, while a semantic-geometric coherent planning module dynamically reconciles semantic priorities with motion efficiency in a unified framework for large-scale environments.

FLARE: Fast Aerial Exploration with Active LiDAR

TASE 2025

We present FLARE, a fast autonomous aerial exploration framework for large-scale 3D scenarios using an actively rotated LiDAR. Unlike fixed LiDAR configurations with limited FoV, the actively rotating mechanism provides a controllable wide Field-of-View that can be dynamically oriented toward unexplored regions, significantly improving sensing coverage during flight. FLARE leverages this novel sensing paradigm with efficient frontier-based planning to achieve rapid and thorough exploration in complex large-scale environments, substantially outperforming conventional fixed-LiDAR exploration methods.

ApexNav: Zero-Shot Object Navigation

RA-L 2025

Navigating unknown environments to find a target object is a significant challenge. While semantic information is crucial for navigation, relying solely on it for decision-making may not always be efficient, especially in environments with weak semantic cues. Additionally, many methods are susceptible to misdetections, especially in environments with visually similar objects. To address these limitations, we propose ApexNav, a zero-shot object navigation framework that is both more efficient and reliable. For efficiency, ApexNav adaptively utilizes semantic information by analyzing its distribution in the environment, guiding exploration through semantic reasoning when cues are strong, and switching to geometry-based exploration when they are weak. For reliability, we propose a target-centric semantic fusion method that preserves long-term memory of the target object and similar objects, reducing false detections and minimizing task failures. We evaluate ApexNav on the HM3Dv1, HM3Dv2, and MP3D datasets, where it outperforms state-of-the-art methods in both SR and SPL metrics. Comprehensive ablation studies further demonstrate the effectiveness of each module. Furthermore, real-world experiments validate the practicality of ApexNav in physical environments. [Video]
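
The switch between semantic and geometry-based exploration can be pictured with a toy rule: if semantic scores across frontiers are spread out enough to be informative, follow the highest score; otherwise go to the nearest frontier. The spread measure (population standard deviation), the threshold value, and the frontier fields below are all assumptions for illustration, not ApexNav's actual criterion.

```python
from statistics import pstdev


def choose_frontier(frontiers, spread_threshold=0.15):
    """Pick a frontier and report which mode was used.

    frontiers: list of dicts with a hypothetical 'semantic' score in [0, 1]
    and a 'distance' in meters. spread_threshold is an assumed tuning value.
    """
    scores = [f["semantic"] for f in frontiers]
    if pstdev(scores) >= spread_threshold:
        # Strong cues: scores differ clearly, so trust the semantic ranking.
        return "semantic", max(frontiers, key=lambda f: f["semantic"])
    # Weak cues: scores are nearly uniform, so fall back to the closest frontier.
    return "geometric", min(frontiers, key=lambda f: f["distance"])


strong = [{"semantic": 0.9, "distance": 5.0}, {"semantic": 0.2, "distance": 1.0}]
weak = [{"semantic": 0.31, "distance": 5.0}, {"semantic": 0.30, "distance": 1.0}]
print(choose_frontier(strong)[0])
print(choose_frontier(weak)[0])
```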

FERMI: Flexible Radio Mapping

RSS 2025

Communication is fundamental for multi-robot collaboration, with accurate radio mapping playing a crucial role in predicting signal strength between robots. However, modeling radio signal propagation in large and occluded environments is challenging due to complex interactions between signals and obstacles. Existing methods face two key limitations: they struggle to predict signal strength for transmitter-receiver pairs not present in the training set, while also requiring extensive manual data collection for modeling, making them impractical for large, obstacle-rich scenarios. To overcome these limitations, we propose FERMI, a flexible radio mapping framework. FERMI combines physics-based modeling of direct signal paths with a neural network to capture environmental interactions with radio signals. This hybrid model learns radio signal propagation more efficiently, requiring only sparse training data. Additionally, FERMI introduces a scalable planning method for autonomous data collection using a multi-robot team. By increasing parallelism in data collection and minimizing robot travel costs between regions, overall data collection efficiency is significantly improved.
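
To make the hybrid idea concrete, the sketch below computes a direct-path term from the standard free-space path loss formula and subtracts a learned correction supplied as a plain callable. Using free-space path loss for the direct path, the function names, and the `residual_fn` interface are assumptions for this example; the abstract does not specify FERMI's physics model or network.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def free_space_path_loss_db(distance_m, frequency_hz):
    """Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT)


def predict_rss_dbm(tx_power_dbm, tx_pos, rx_pos, frequency_hz, residual_fn):
    """Direct-path prediction minus an extra loss from a learned model.

    residual_fn is a hypothetical stand-in for a trained network that maps
    a transmitter-receiver pair to additional attenuation in dB.
    """
    distance = math.dist(tx_pos, rx_pos)
    direct = tx_power_dbm - free_space_path_loss_db(distance, frequency_hz)
    return direct - residual_fn(tx_pos, rx_pos)


# Stand-in residual: a constant 6 dB of obstacle loss for every pair.
rss = predict_rss_dbm(20.0, (0.0, 0.0, 1.0), (10.0, 0.0, 1.0), 2.4e9, lambda tx, rx: 6.0)
print(round(rss, 2))
```

Because the direct-path term depends only on geometry and frequency, it applies to transmitter-receiver pairs never seen in training; only the correction has to be learned from data.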

DynamicPose: 6D Object Pose Tracking

IROS 2025

We propose DynamicPose, a real-time and robust 6D object pose tracking framework that handles fast-moving cameras and objects without retraining. To ensure accurate translation initialization, we introduce an efficient translation compensation mechanism that corrects Region of Interest shifts caused by rapid camera or object motion. Additionally, we design a VIO-guided Kalman filter with dynamically scaled multi-candidate refinement, enabling robust 6D pose tracking even under extreme rotations. Extensive experiments show that DynamicPose outperforms existing state-of-the-art (SOTA) methods for 6D object pose tracking in fast-moving camera and object scenarios, where the relative motion between the target object and the camera exceeds 1.5 m/s and 3.0 rad/s. [Video]

Perception-aware Planning in Feature-limited Environments

IROS 2025

We propose a perception-aware planning method for quadrotor flight in unknown and feature-limited environments. Existing methods lack a systematic mechanism to allocate perception resources and efficiently integrate incrementally discovered features and unknown regions into planning, leading to collisions and high computation. We introduce a viewpoint transition graph to adaptively select local target viewpoints, guiding the UAV toward the goal while maintaining sufficient localizability and avoiding feature-limited regions. For trajectory generation, we construct localizable corridors via feature co-visibility evaluation as concise constraints, enabling efficient optimization that increases unknown information gain while preserving localization. Our method achieves faster and safer navigation with efficient replanning in unknown and feature-limited environments. [Video]

EPIC: Lightweight LiDAR-Based UAV Exploration

RA-L 2025

This paper presents EPIC, a lightweight LiDAR-based framework addressing challenges in UAV autonomous exploration. Traditional methods often require memory-heavy occupancy grids for frontier detection and struggle with computationally expensive path planning directly on point clouds. EPIC overcomes this by introducing a novel observation map based on point cloud quality, tracking well-observed versus poorly-observed areas using spatial hashing, thus eliminating global grids. It also features an incremental topological graph built directly on point clouds for efficient, real-time path planning. Combined in a hierarchical structure, these components enable agile, energy-efficient trajectories, achieving faster exploration with significantly reduced memory and computation compared to state-of-the-art methods in diverse environments. [Video:Bilibili]
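
A spatial hash can stand in for a dense global grid by storing only the voxels that have received points. The sketch below keys a Python dictionary by integer voxel index and uses the point count per voxel as a crude quality proxy; the class name, voxel size, and count threshold are assumptions for illustration and do not reflect EPIC's actual quality metric.

```python
import math
from collections import defaultdict


class ObservationMap:
    """Sparse voxel map backed by a hash table (hypothetical sketch)."""

    def __init__(self, voxel_size=0.5, min_points=3):
        self.voxel_size = voxel_size  # edge length in meters (assumed value)
        self.min_points = min_points  # assumed threshold for "well observed"
        self.counts = defaultdict(int)

    def _key(self, point):
        # Integer voxel index; floor keeps negative coordinates consistent.
        return tuple(math.floor(v / self.voxel_size) for v in point)

    def insert(self, points):
        for p in points:
            self.counts[self._key(p)] += 1

    def is_well_observed(self, point):
        # .get avoids creating entries for voxels that were never observed.
        return self.counts.get(self._key(point), 0) >= self.min_points


obs = ObservationMap()
obs.insert([(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (0.4, 0.4, 0.4), (1.2, 0.0, 0.0)])
print(obs.is_well_observed((0.25, 0.25, 0.25)))
print(obs.is_well_observed((1.2, 0.0, 0.0)))
```

Memory grows with the number of occupied voxels rather than with the bounding volume of the environment, which is the property that removes the need for a global grid.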

SOAR: Simultaneous Exploration and Photographing

IROS 2024

Unmanned Aerial Vehicles (UAVs) have gained significant popularity in scene reconstruction. This paper presents SOAR, a LiDAR-Visual heterogeneous multi-UAV system specifically designed for fast autonomous reconstruction of complex environments. Our system comprises a LiDAR-equipped explorer with a large field-of-view (FoV), alongside photographers equipped with cameras. To ensure rapid acquisition of the scene’s surface geometry, we employ a surface frontier-based exploration strategy for the explorer. As the surface is progressively explored, we identify the uncovered areas and generate viewpoints incrementally. These viewpoints are then assigned to photographers through solving a Consistent Multiple Depot Multiple Traveling Salesman Problem (Consistent-MDMTSP), which optimizes scanning efficiency while ensuring task consistency. Finally, photographers utilize the assigned viewpoints to determine optimal coverage paths for acquiring images. We present extensive benchmarks in a realistic simulator, which validate the performance of SOAR compared with classical and state-of-the-art methods. [Video:Bilibili]

Star-Searcher: Autonomous Target Search

RA-L 2024

This paper tackles the challenge of autonomous target search using unmanned aerial vehicles (UAVs) in complex unknown environments. To fill the gap in systematic approaches for this task, we introduce Star-Searcher, an aerial system featuring specialized sensor suites, mapping, and planning modules to optimize searching. Path planning challenges due to increased inspection requirements are addressed through a hierarchical planner with a visibility-based viewpoint clustering method. This simplifies planning by breaking it into global and local sub-problems, ensuring efficient global and local path coverage in real time. Furthermore, our global path planning employs a history-aware mechanism to reduce motion inconsistency from frequent map changes, significantly enhancing search efficiency. [Video:Bilibili]

APACE: Agile and Perception-aware Trajectory Generation

ICRA 2024

We present APACE, an Agile and Perception-Aware trajeCtory gEneration framework for aggressive quadrotor flight that takes into account feature matchability during trajectory planning. We seek to generate a perception-aware trajectory that reduces the error of the vision-based estimator while satisfying the constraints on smoothness, safety, agility, and the quadrotor dynamics. The perception objective is achieved by maximizing the number of covisible features while ensuring small enough parallax angles. Additionally, we propose a differentiable and accurate visibility model that allows the trajectory planning problem to be decomposed for efficient optimization (ICRA 2024 Submission). [Video:Bilibili]
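
For intuition about the parallax term, the sketch below computes the angle between the two viewing rays from a pair of camera positions to a feature, then counts features whose angle stays under a bound. The function names and the 0.2 rad bound are assumptions for this example, and field-of-view or occlusion checks (the visibility model) are deliberately left out.

```python
import math


def parallax_angle(cam_a, cam_b, feature):
    """Angle in radians between the rays cam_a->feature and cam_b->feature."""
    ray_a = [f - a for f, a in zip(feature, cam_a)]
    ray_b = [f - b for f, b in zip(feature, cam_b)]
    dot = sum(x * y for x, y in zip(ray_a, ray_b))
    norm = math.hypot(*ray_a) * math.hypot(*ray_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def count_small_parallax(cam_a, cam_b, features, max_parallax_rad=0.2):
    """Count features whose parallax is below an assumed bound."""
    return sum(parallax_angle(cam_a, cam_b, f) <= max_parallax_rad for f in features)


features = [(0.0, 0.0, 1.0), (0.0, 0.0, 10.0)]
print(count_small_parallax((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), features))
```

With a 1 m baseline, the feature 1 m away subtends 45 degrees while the one 10 m away subtends about 0.1 rad, so only the distant feature passes the bound.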

Real-time Whole-body Motion Planning for Mobile Manipulators

ICRA 2024

Mobile manipulators have recently gained significant attention in the robotics community due to their superior potential in industrial and service applications. However, the high degree of freedom associated with mobile manipulators poses challenges in achieving real-time whole-body motion planning. To bridge the gap, this paper presents a motion planning method capable of generating high-quality, safe, agile, and feasible trajectories for mobile manipulators in real time (ICRA 2024 Submission). [Video:Bilibili]

FC-Planner: Skeleton-guided Aerial Coverage

ICRA 2024

We propose FC-Planner, a skeleton-guided planning framework that can achieve fast aerial coverage of complex 3D scenes without pre-processing. We decompose the scene into several simple subspaces by a skeleton-based space decomposition (SSD). Additionally, the skeleton guides us to effortlessly determine free space. We utilize the skeleton to efficiently generate a minimal set of specialized and informative viewpoints for complete coverage. Based on SSD, a hierarchical planner effectively divides the large planning problem into independent sub-problems, enabling parallel planning for each subspace. The carefully designed global and local planning strategies are then incorporated to guarantee both high quality and efficiency in path generation. We conduct extensive benchmarks and real-world tests, where FC-Planner computes over 10 times faster than state-of-the-art methods with shorter paths and more complete coverage (ICRA 2024 Submission). [Video:Bilibili]

MASSTAR: Multi-Modal Scene Dataset

IROS 2024

We propose MASSTAR, a multi-modal large-scale scene dataset with a versatile toolchain for surface prediction and completion. We collect a large number of scene-level models, including some real-world captured data, from a wide range of open-source works. A toolchain is also developed to facilitate data processing: it segments the raw 3D data, selects the valuable models, and generates multi-modal data including RGB images, descriptive text, depth images, and partial point clouds. Additionally, we benchmark different algorithms trained on our dataset (ICRA 2024 Submission). [Video:Bilibili]

H2-Mapping: Real-time Dense Mapping

RA-L 2023

We propose a NeRF-based mapping method, by Chenxing JIANG and Hanwen ZHANG, that enables higher-quality reconstruction and real-time capability even on edge computers of handheld devices and quadrotors. Specifically, we propose a novel hierarchical hybrid representation and a coverage-maximizing keyframe selection strategy. Extensive experiments show our method achieves superior mapping results with less runtime compared to existing NeRF-based mapping methods. To the best of our knowledge, ours is the first NeRF-based mapping method to run onboard in real time. [Paper] [Video:Bilibili] [Video:Youtube] [Code]

AutoTrans: Autonomous UAV Payload Transportation

RA-L 2023

In this work by Haojia Li, we developed a real-time planning method for UAV-payload systems that considers the time-varying shape and nonlinear dynamics to ensure whole-body safety and dynamic feasibility. Additionally, an adaptive NMPC with a hierarchical disturbance compensation strategy is designed to overcome unknown external perturbations and inaccurate model parameters. Extensive experiments show that our method is capable of generating high-quality trajectories online, even in highly constrained environments, and tracking aggressive flight trajectories accurately, even under significant uncertainty. [Video]

RACER: Rapid Collaborative Multi-UAV Exploration

T-RO 2023

We further developed a fully decentralized approach for exploration tasks using a fleet of quadrotors. The quadrotor team operates with asynchronous and limited communication, and does not require any central control. The coverage paths and workload allocations of the team are optimized and balanced in order to fully realize the system’s potential. The associated paper has been published in IEEE T-RO. [Paper][Video][Code]

PredRecon: Prediction-boosted Aerial Reconstruction

ICRA 2023

Our recent work toward fully automated and highly efficient aerial reconstruction, published at ICRA 2023, by Chen Feng. [Paper][Code][Video]