Before joining HUST, I worked in the Advanced Technology and Projects (ATAP) division at Google in Mountain View, USA. As a proud member of the Te'veren team, I collaborated with Rick Marks on advanced sensing and on-device intelligence using computer vision. Prior to Google, I served as a Principal Scientist at DGene, US, where I conducted research on real-time volumetric human capture systems.
I graduated from the University of Delaware in 2017, where I majored in Computer Science. At UDel, I worked with Professor Jingyi Yu on research problems in computational photography and scene understanding. In 2015, during my PhD, I interned at Adobe with the ACR team.
I am actively seeking creative and highly motivated MS and PhD students who are passionate about research.
We propose the Adaptable Motion Diffusion (AMD) model, which leverages a Large Language Model (LLM) to parse the input text into a sequence of concise and interpretable anatomical scripts that correspond to the target motion.
We propose to harness the capabilities of a Large Language Model (LLM) to decompose text descriptions into coherent directives adhering to stringent formats and progressively generate the target image.
We present a label-free white-box attack approach for ViT-based models that exhibits strong transferability to various black-box models by accelerating feature collapse.
We propose a Feature Pruning and Consolidation (FPC) framework to circumvent explicit human structure parsing, which consists of a sparse encoder, a global and local feature ranking module, and a feature consolidation decoder.
We formulate object detection and association jointly as a consistent denoising diffusion process from paired noisy boxes to paired ground-truth boxes.
We propose to conduct inverse volume rendering by representing a scene using a microflake volume, which assumes the space is filled with infinitely small flakes and light reflects or scatters at each spatial location according to microflake distributions.
We demonstrate that the basic vision transformer (ViT) architecture is sufficient for visual tracking when enhanced with correlative masked modeling for information aggregation.
We propose an Uncertainty Regulated Dual Memory Units (UR-DMU) model to learn both the representations of normal data and discriminative features of abnormal data.
CSWinTT is a new transformer architecture with multi-scale cyclic shifting window attention for visual object tracking, elevating the attention from pixel to window level.
ARTEMIS, the core of which is a neural-generated (NGI) animal engine, enables interactive motion control, real-time animation and photo-realistic rendering of furry animals.
SportsCap -- the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from challenging monocular sports video input.
We generate a global full-body template by registering all poses in the acquired motion sequence, and then construct a deformable graph by utilizing the rigid components in the global template.
We present a comprehensive theory on ray geometry transforms under light field pose variations, and derive the transforms of three typical ray manifolds.