Referring expression comprehension: A survey of methods and datasets Y Qiao, C Deng, Q Wu IEEE Transactions on Multimedia 23, 4426-4440, 2020 | 93 | 2020 |
HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation Y Qiao, Y Qi, Y Hong, Z Yu, P Wang, Q Wu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 78 | 2022 |
HOP+: History-enhanced and order-aware pre-training for vision-and-language navigation Y Qiao, Y Qi, Y Hong, Z Yu, P Wang, Q Wu IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (7), 8524-8537, 2023 | 44 | 2023 |
VL-Mamba: Exploring State Space Models for Multimodal Learning Y Qiao, Z Yu, Z Zhao, S Chen, M Sun, L Guo, Q Wu, J Liu NeurIPS Workshop on Efficient Natural Language and Speech Processing, 2024 | 40 | 2024 |
Improving visual question answering using dropout and enhanced question encoder Z Fang, J Liu, Y Li, Y Qiao, H Lu Pattern Recognition 90, 404-414, 2019 | 34 | 2019 |
March in Chat: Interactive Prompting for Remote Embodied Referring Expression Y Qiao, Y Qi, Z Yu, J Liu, Q Wu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 28 | 2023 |
R-GAN: Exploring Human-like Way for Reasonable Text-to-Image Synthesis via Generative Adversarial Networks Y Qiao, Q Chen, C Deng, N Ding, Y Qi, M Tan, X Ren, Q Wu Proceedings of the 29th ACM International Conference on Multimedia, 2085-2093, 2021 | 20 | 2021 |
VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation Y Qiao, Z Yu, Q Wu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 12 | 2023 |
RankVQA: Answer re-ranking for visual question answering Y Qiao, Z Yu, J Liu 2020 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2020 | 12 | 2020 |
VC-VQA: visual calibration mechanism for visual question answering Y Qiao, Z Yu, J Liu 2020 IEEE International Conference on Image Processing (ICIP), 1481-1485, 2020 | 8 | 2020 |
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models Y Zhang, Z Ma, J Li, Y Qiao, Z Wang, J Chai, Q Wu, M Bansal, ... Transactions on Machine Learning Research (TMLR), 2024 | 6 | 2024 |
Enhancing visual question answering using dropout Z Fang, J Liu, Y Qiao, Q Tang, Y Li, H Lu Proceedings of the 26th ACM International Conference on Multimedia, 1002-1010, 2018 | 5 | 2018 |
Multi-modal Adapter for Medical Vision-and-Language Learning Z Yu, Y Qiao, Y Xie, Q Wu International Workshop on Machine Learning in Medical Imaging, 393-402, 2023 | 1 | 2023 |
LLM as Copilot for Coarse-Grained Vision-and-Language Navigation Y Qiao, Q Liu, J Liu, J Liu, Q Wu European Conference on Computer Vision ECCV 2024 15063, 459-476, 2024 | | 2024 |
Effective Tuning Strategies for Generalist Robot Manipulation Policies W Zhang, Y Li, Y Qiao, S Huang, J Liu, F Dayoub, X Ma, L Liu arXiv preprint arXiv:2410.01220, 2024 | | 2024 |
Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs Y Qiao, W Lyu, H Wang, Z Wang, Z Li, Y Zhang, M Tan, Q Wu arXiv preprint arXiv:2409.18794, 2024 | | 2024 |
MiniVLN: Efficient Vision-and-Language Navigation by Progressive Knowledge Distillation J Zhu, Y Qiao, S Zhang, X He, Q Wu, J Liu arXiv preprint arXiv:2409.18800, 2024 | | 2024 |
MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation M Sun, W Wang, Y Qiao, J Sun, Z Qin, L Guo, X Zhu, J Liu ACM Multimedia 2024, 2024 | | 2024 |
Improving Online Source-free Domain Adaptation for Object Detection by Unsupervised Data Acquisition X Shi, Y Qiao, Q Wu, L Liu, F Dayoub ECCV Workshop on ROAM 2024, 2023 | | 2023 |
General Vision and Language Methods in Real Applications: A Focus on Vision-and-Language Navigation Y Qiao PhD Thesis, 2023 | | 2023 |