Zhengyuan Yang
Principal Researcher, Microsoft
Verified email at microsoft.com
Title
Cited by
Year
GIT: A generative image-to-text transformer for vision and language
J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu, C Liu, L Wang
Transactions on Machine Learning Research (TMLR), 2022
Cited by: 534; Year: 2022
The dawn of LMMs: Preliminary explorations with GPT-4V(ision)
Z Yang, L Li, K Lin, J Wang, CC Lin, Z Liu, L Wang
arXiv preprint arXiv:2309.17421, 2023
Cited by: 514; Year: 2023
An empirical study of GPT-3 for few-shot knowledge-based VQA
Z Yang, Z Gan, J Wang, X Hu, Y Lu, Z Liu, L Wang
Proceedings of the AAAI conference on artificial intelligence 36 (3), 3081-3089, 2022
Cited by: 409; Year: 2022
MM-Vet: Evaluating large multimodal models for integrated capabilities
W Yu, Z Yang, L Li, J Wang, K Lin, Z Liu, X Wang, L Wang
The 41st International Conference on Machine Learning (ICML), 2024
Cited by: 405; Year: 2024
A fast and accurate one-stage approach to visual grounding
Z Yang, B Gong, L Wang, W Huang, D Yu, J Luo
IEEE International Conference on Computer Vision (ICCV), 4683-4693, 2019
Cited by: 396; Year: 2019
TransVG: End-to-End Visual Grounding with Transformers
J Deng, Z Yang, T Chen, W Zhou, H Li
IEEE International Conference on Computer Vision (ICCV), 2021
Cited by: 349; Year: 2021
MM-REACT: Prompting ChatGPT for multimodal reasoning and action
Z Yang, L Li, J Wang, K Lin, E Azarnasab, F Ahmed, Z Liu, C Liu, M Zeng, ...
arXiv preprint arXiv:2303.11381, 2023
Cited by: 319; Year: 2023
Scaling up vision-language pre-training for image captioning
X Hu, Z Gan, J Wang, Z Yang, Z Liu, Y Lu, L Wang
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022
Cited by: 291; Year: 2022
Improving One-stage Visual Grounding by Recursive Sub-query Construction
Z Yang, T Chen, L Wang, J Luo
European Conference on Computer Vision (ECCV), 2020
Cited by: 254; Year: 2020
Prompting GPT-3 to be reliable
C Si, Z Gan, Z Yang, S Wang, J Wang, J Boyd-Graber, L Wang
International Conference on Learning Representations (ICLR 23), 2022
Cited by: 237; Year: 2022
End-to-end multi-modal multi-task vehicle control for self-driving cars with visual perceptions
Z Yang, Y Zhang, J Yu, J Cai, J Luo
2018 24th international conference on pattern recognition (ICPR), 2289-2294, 2018
Cited by: 198; Year: 2018
Action recognition with spatio–temporal visual attention on skeleton image sequences
Z Yang, Y Li, J Yang, J Luo
IEEE Transactions on Circuits and Systems for Video Technology 29 (8), 2405-2415, 2018
Cited by: 191; Year: 2018
Multimodal foundation models: From specialists to general-purpose assistants
C Li, Z Gan, Z Yang, J Yang, L Li, L Wang, J Gao
Foundations and Trends® in Computer Graphics and Vision 16 (1-2), 1-214, 2024
Cited by: 186; Year: 2024
Attentive relational networks for mapping images to scene graphs
M Qi, W Li, Z Yang, Y Wang, J Luo
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3957-3966, 2019
Cited by: 179; Year: 2019
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Z Yang, Y Lu, J Wang, X Yin, D Florencio, L Wang, C Zhang, L Zhang, ...
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Cited by: 173; Year: 2021
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
Y Yin, F Meng, J Su, C Zhou, Z Yang, J Zhou, J Luo
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Cited by: 154; Year: 2020
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Z Yang, Z Gan, J Wang, X Hu, F Ahmed, Z Liu, Y Lu, L Wang
European Conference on Computer Vision (ECCV), 521-539, 2022
Cited by: 151*; Year: 2022
ReCo: Region-Controlled Text-to-Image Generation
Z Yang, J Wang, Z Gan, L Li, K Lin, C Wu, N Duan, Z Liu, C Liu, M Zeng, ...
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Cited by: 130; Year: 2023
PromptCap: Prompt-guided image captioning for VQA with GPT-3
Y Hu, H Hua, Z Yang, W Shi, NA Smith, J Luo
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 126*; Year: 2023
SAT: 2D Semantics Assisted Training for 3D Visual Grounding
Z Yang, S Zhang, L Wang, J Luo
IEEE International Conference on Computer Vision (ICCV), 2021
Cited by: 112; Year: 2021