Difei Gao

Cited by

	All	Since 2019
Citations	770	767
h-index	15	15
i10-index	16	16

Citations per year

Year	Citations
2018	2
2019	3
2020	3
2021	28
2022	71
2023	259
2024	401

Public access

13 articles available
1 article not available

Based on funding mandates

Co-authors

Mike Z. SHOU, National U. of Singapore; Facebook AI; Columbia University (verified email at columbia.edu)
Qinghong Lin, National U. of Singapore (verified email at u.nus.edu)
Ruiping Wang, Professor, Institute of Computing Technology, Chinese Academy of Sciences (verified email at ict.ac.cn)
Xilin Chen, Institute of Computing Technology, Chinese Academy of Sciences (verified email at ict.ac.cn)
Joya Chen, National University of Singapore (verified email at u.nus.edu)
Shiguang Shan, Professor, Institute of Computing Technology, Chinese Academy of Sciences (verified email at ict.ac.cn)
Yuxuan Wang, Nanyang Technological University; National U. of Singapore (verified email at ntu.edu.sg)
Luowei Zhou, Research Scientist, Google Deepmind (verified email at google.com)
Mengmi Zhang, Assistant professor and PI of Deep NeuroCognition Lab, NTU and A*STAR (verified email at ntu.edu.sg)
Kenneth Li, Harvard University (verified email at g.harvard.edu)
Lili Pan, Associate Professor, University of Electronic Science and Technology of China (verified email at uestc.edu.cn)
Rui Chen, University of Cambridge (verified email at cam.ac.uk)

Difei Gao

National U. of Singapore; Institute of Computing Technology, Chinese Academy of Sciences

Verified email at nus.edu.sg

Artificial Intelligence; Vision and Language


Title	Cited by	Year
Egocentric video-language pretraining. KQ Lin, AJ Wang, M Soldan, M Wray, R Yan, EZ Xu, D Gao, R Tu, W Zhao. Neural Information Processing Systems (NeurIPS) 2 (3), 2022	131	2022
Multi-modal graph neural network for joint reasoning on vision and scene text. D Gao, K Li, R Wang, S Shan, X Chen. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12746 …, 2020	131	2020
Show-1: Marrying pixel and latent diffusion models for text-to-video generation. DJ Zhang, JZ Wu, JW Liu, R Zhao, L Ran, Y Gu, D Gao, MZ Shou. arXiv preprint arXiv:2309.15818, 2023	81	2023
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering. D Gao, L Zhou, L Ji, L Zhu, Y Yang, MZ Shou. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 14773 …, 2023	59	2023
UniVTG: Towards Unified Video-Language Temporal Grounding. KQ Lin, P Zhang, J Chen, S Pramanick, D Gao, AJ Wang, R Yan, MZ Shou. IEEE/CVF International Conference on Computer Vision (ICCV), 2023	46	2023
AssistGPT: A general multi-modal assistant that can plan, execute, inspect, and learn. D Gao, L Ji, L Zhou, KQ Lin, J Chen, Z Fan, MZ Shou. arXiv preprint arXiv:2306.08640, 2023	45	2023
CRIC: A VQA dataset for compositional reasoning on vision and commonsense. D Gao, R Wang, S Shan, X Chen. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022	28*	2022
Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments. D Gao, R Wang, Z Bai, X Chen. IEEE/CVF International Conference on Computer Vision (ICCV), 1675-1685, 2021	25	2021
Egocentric video-language pretraining. KQ Lin, AJ Wang, M Soldan, M Wray, R Yan, EZ Xu, D Gao, R Tu, W Zhao, W Kong, et al. NeurIPS 35, 7575-7586, 2022	22	2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant. B Wong, J Chen, Y Wu, SW Lei, D Mao, D Gao, MZ Shou. European Conference on Computer Vision (ECCV), 2022	22	2022
Symbolic replay: Scene graph as prompt for continual learning on VQA task. SW Lei, D Gao, JZ Wu, Y Wang, W Liu, M Zhang, MZ Shou. The AAAI Conference on Artificial Intelligence (AAAI), 2023	21	2023
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval. Y Wang, D Gao, L Yu, W Lei, M Feiszli, MZ Shou. European Conference on Computer Vision (ECCV), 2022	21	2022
CONE: An efficient coarse-to-fine alignment framework for long video temporal grounding. Z Hou, W Zhong, L Ji, D Gao, K Yan, WK Chan, CW Ngo, Z Shou, N Duan. Annual Meeting of the Association for Computational Linguistics (ACL), 2022	20	2022
Affordance grounding from demonstration video to target image. J Chen, D Gao, KQ Lin, MZ Shou. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 6799-6808, 2023	17	2023
CVPR 2023 text guided video editing competition. JZ Wu, X Li, D Gao, Z Dong, J Bai, A Singh, X Xiang, Y Li, Z Huang, Y Sun, ... arXiv preprint arXiv:2310.16003, 2023	16	2023
Learning to recognize visual concepts for visual question answering with structural label space. D Gao, R Wang, S Shan, X Chen. IEEE Journal of Selected Topics in Signal Processing (JSTSP) 14 (3), 494-505, 2020	12	2020
GroundNLQ @ Ego4D Natural Language Queries Challenge 2023. Z Hou, L Ji, D Gao, W Zhong, K Yan, C Li, WK Chan, CW Ngo, N Duan, ... arXiv preprint arXiv:2306.15255, 2023	9	2023
AssistGUI: Task-Oriented PC Graphical User Interface Automation. D Gao, L Ji, Z Bai, M Ouyang, P Li, D Mao, Q Wu, W Zhang, P Wang, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024	8*	2024
AssistSR: Task-oriented video segment retrieval for personal AI assistant. SW Lei, D Gao, Y Wang, D Mao, Z Liang, L Ran, MZ Shou. Findings of Empirical Methods in Natural Language Processing (EMNLP), 2021	8*	2021
An efficient coarse-to-fine alignment framework @ Ego4D natural language queries challenge 2022. Z Hou, W Zhong, L Ji, D Gao, K Yan, WK Chan, CW Ngo, Z Shou, N Duan. arXiv preprint arXiv:2211.08776, 2022	7	2022
