Jiazhao Zhang | 张嘉曌
I am a Ph.D. student at the Center on Frontiers of Computing Studies at Peking University, where I have been advised by Prof. He Wang since 2022. Prior to this, I earned my M.S. degree from the National University of Defense Technology (NUDT) under the supervision of Prof. Kai Xu. I received my B.Eng. degree from Shandong University.
My research goal is to develop intelligent and practical robots to enhance people's daily lives. My current research focuses on building intelligent navigation robots based on vision-language models. I am also interested in scene reconstruction and understanding, including techniques such as SLAM and segmentation.
Email / Google Scholar / Github
Research
*: equal contribution; †: corresponding author(s)
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
Jiazhao Zhang*, Kunyu Wang*, Rongtao Xu*, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang†, He Wang†
RSS 2024
Paper / Webpage
NaVid makes the first endeavor to showcase the capability of VLMs to achieve state-of-the-art navigation performance without any maps, odometry, or depth inputs. Following a human instruction, NaVid requires only an on-the-fly video stream from a monocular RGB camera mounted on the robot to output the next-step action.
MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
Mi Yan, Jiazhao Zhang, Yan Zhu, He Wang†
CVPR 2024
Paper / Code / Webpage
We propose a robust zero-shot 3D instance segmentation method that leverages the 3D view consensus of 2D candidate masks. Our method can integrate with a 2D visual foundation model (e.g., CLIP) to achieve open-vocabulary 3D instance segmentation.
GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning based on Online Grasping Pose Fusion
Jiazhao Zhang*, Nandiraju Gireesh*, Jilong Wang, Xiaomeng Fang, Chaoyi Xu, Weiguang Chen, Liu Dai, He Wang†
ICRA 2024
Paper / Code / Webpage
We propose a graspability-aware mobile manipulation approach powered by an online grasping pose fusion framework that yields temporally consistent grasping observations.
MIPS-Fusion: Multi-Implicit-Submaps for Scalable and Robust Online Neural RGB-D Reconstruction
Yijie Tang*, Jiazhao Zhang*, Zhinan Yu, He Wang, Kai Xu†
ACM Transactions on Graphics (SIGGRAPH Asia 2023)
Paper / Code
We introduce MIPS-Fusion, a robust and scalable online RGB-D reconstruction method based on a novel neural implicit representation: the multi-implicit submap.
3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification
Jiazhao Zhang*, Liu Dai*, Fanpeng Meng, Qingnan Fan, Xuelin Chen, Kai Xu, He Wang†
CVPR 2023
Paper / Code / Webpage
We propose a framework for the challenging 3D-aware ObjectNav task based on two straightforward sub-policies: a corner-guided exploration policy and a category-aware identification policy.
GraspNeRF: Multiview-based 6-DoF Grasp Detection for Transparent and Specular Objects Using Generalizable NeRF
Qiyu Dai*, Yan Zhu*, Yiran Geng, Ciyu Ruan, Jiazhao Zhang, He Wang†
ICRA 2023
Paper / Code & Data / Webpage
Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild
Jiayi Chen*, Mi Yan*, Jiazhao Zhang, Yinzhen Xu, Xiaolong Li, Yijiang Weng, Li Yi, Shuran Song, He Wang†
AAAI 2023 (Oral Presentation)
Paper / Code & Data / Webpage
ASRO-DIO: Active Subspace Random Optimization Based Depth Inertial Odometry
Jiazhao Zhang, Yijie Tang, He Wang, Kai Xu†
IEEE Transactions on Robotics (T-RO 2022)
Paper / Contact me for code permission
To realize efficient random optimization in the 18-D state space of IMU tracking, we propose to identify and sample particles from the active subspace.
ROSEFusion: Random Optimization for Online Dense Reconstruction under Fast Camera Motion
Jiazhao Zhang, Chenyang Zhu, Lintao Zheng, Kai Xu†
ACM Transactions on Graphics (SIGGRAPH 2021)
Paper / Code & Data
We propose to tackle the difficulties of fast-motion camera tracking in the absence of inertial measurements using random optimization.
Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation
Jiazhao Zhang*, Chenyang Zhu*, Lintao Zheng, Kai Xu†
CVPR 2020
Paper / Code & Data
We propose a novel fusion-aware 3D point convolution that operates directly on the geometric surface being reconstructed and effectively exploits inter-frame correlation for high-quality 3D feature learning.
Active Scene Understanding via Online Semantic Reconstruction
Lintao Zheng, Chenyang Zhu, Jiazhao Zhang, Hang Zhao, Hui Huang, Matthias Niessner, Kai Xu†
Computer Graphics Forum (Pacific Graphics 2019)
Paper
We propose a novel approach to robot-operated active understanding of unknown indoor scenes, based on online RGB-D reconstruction with semantic segmentation.
Teaching
Peking University, Teaching Assistant, Computer Vision, Spring 2022
NUDT, Teaching Assistant, Computer Vision, Spring 2021
NUDT, Teaching Assistant, Computer Vision, Spring 2020