Pre-trained models: Past, present and future X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu, Y Yao, A Zhang, ... AI Open 2, 225-250, 2021 | 423 | 2021 |
Persistent B+-trees in non-volatile main memory S Chen, Q Jin Proceedings of the VLDB Endowment 8 (7), 786-797, 2015 | 389 | 2015 |
The SuperSID project: Exploiting high-level information for high-accuracy speaker recognition D Reynolds, W Andrews, J Campbell, J Navratil, B Peskin, A Adami, Q Jin, ... 2003 IEEE International Conference on Acoustics, Speech, and Signal …, 2003 | 352 | 2003 |
Fine-grained video-text retrieval with hierarchical graph reasoning S Chen, Y Zhao, Q Jin, Q Wu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 274 | 2020 |
Say as you wish: Fine-grained control of image caption generation with abstract scene graphs S Chen, Q Jin, P Wang, Q Wu Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 199 | 2020 |
Speech emotion recognition with acoustic and lexical features Q Jin, C Li, S Chen, H Wu 2015 IEEE international conference on acoustics, speech and signal …, 2015 | 195 | 2015 |
Multimodal multi-task learning for dimensional and continuous emotion recognition S Chen, Q Jin, J Zhao, S Wang Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, 19-26, 2017 | 157 | 2017 |
Multi-modal dimensional emotion recognition using recurrent neural networks S Chen, Q Jin Proceedings of the 5th International Workshop on Audio/Visual Emotion …, 2015 | 131 | 2015 |
Far-field speaker recognition Q Jin, T Schultz, A Waibel IEEE Transactions on Audio, Speech, and Language Processing 15 (7), 2023-2032, 2007 | 121 | 2007 |
Speaker segmentation and clustering in meetings. Q Jin, T Schultz INTERSPEECH 4, 597-600, 2004 | 117 | 2004 |
Describing videos using multi-modal fusion Q Jin, J Chen, S Chen, Y Xiong, A Hauptmann Proceedings of the 24th ACM international conference on Multimedia, 1087-1091, 2016 | 107 | 2016 |
WenLan: Bridging vision and language by large-scale multi-modal pre-training Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang, J Wen, H Zhang, B Xu, ... arXiv preprint arXiv:2103.06561, 2021 | 99 | 2021 |
MMGCN: Multimodal fusion via deep graph convolution network for emotion recognition in conversation J Hu, Y Liu, J Zhao, Q Jin arXiv preprint arXiv:2107.06779, 2021 | 89 | 2021 |
Speaker de-identification via voice transformation Q Jin, AR Toth, T Schultz, AW Black 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 529-533, 2009 | 81 | 2009 |
Is voice transformation a threat to speaker identification? Q Jin, AR Toth, AW Black, T Schultz 2008 IEEE International Conference on Acoustics, Speech and Signal …, 2008 | 78 | 2008 |
Multi-modal conditional attention fusion for dimensional emotion prediction S Chen, Q Jin Proceedings of the 24th ACM international conference on Multimedia, 571-575, 2016 | 73 | 2016 |
Video captioning with guidance of multimodal latent topics S Chen, J Chen, Q Jin, A Hauptmann Proceedings of the 25th ACM international conference on Multimedia, 1838-1846, 2017 | 69 | 2017 |
Phonetic speaker recognition using maximum-likelihood binary-decision tree models J Navrátil, Q Jin, WD Andrews, JP Campbell 2003 IEEE International Conference on Acoustics, Speech, and Signal …, 2003 | 69 | 2003 |
Event-based video retrieval using audio Q Jin, P Schulam, S Rawat, S Burger, D Ding, F Metze Thirteenth Annual Conference of the International Speech Communication …, 2012 | 67 | 2012 |
Application of LDA to speaker recognition. Q Jin, A Waibel Interspeech, 250-253, 2000 | 63 | 2000 |