Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 2146 | 2023 |
GPipe: Efficient training of giant neural networks using pipeline parallelism Y Huang, Y Cheng, A Bapna, O Firat, D Chen, M Chen, HJ Lee, J Ngiam, ... Advances in neural information processing systems 32, 2019 | 1779 | 2019 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 671* | 2024 |
The best of both worlds: Combining recent advances in neural machine translation MX Chen, O Firat, A Bapna, M Johnson, W Macherey, G Foster, L Jones, ... arXiv preprint arXiv:1804.09849, 2018 | 538 | 2018 |
Massively multilingual neural machine translation in the wild: Findings and challenges N Arivazhagan, A Bapna, O Firat, D Lepikhin, M Johnson, M Krikun, ... arXiv preprint arXiv:1907.05019, 2019 | 418 | 2019 |
Gmail Smart Compose: Real-Time Assisted Writing MX Chen, BN Lee, G Bansal, Y Cao, S Zhang, J Lu, J Tsay, Y Wang, ... Proceedings of the 25th ACM SIGKDD International Conference on Knowledge …, 2019 | 244 | 2019 |
Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... arXiv preprint arXiv:1902.08295, 2019 | 212 | 2019 |
Training deeper neural machine translation models with transparent attention A Bapna, MX Chen, O Firat, Y Cao, Y Wu arXiv preprint arXiv:1808.07561, 2018 | 124 | 2018 |
Building machine translation systems for the next thousand languages A Bapna, I Caswell, J Kreutzer, O Firat, D van Esch, A Siddhant, M Niu, ... arXiv preprint arXiv:2205.03983, 2022 | 81 | 2022 |
Leveraging monolingual data with self-supervision for multilingual neural machine translation A Siddhant, A Bapna, Y Cao, O Firat, M Chen, S Kudugunta, ... arXiv preprint arXiv:2005.04816, 2020 | 81 | 2020 |
Unsupervised deep Haar scattering on graphs X Chen, X Cheng, S Mallat Advances in Neural Information Processing Systems 27, 2014 | 64 | 2014 |
Predicting a user's next cell with supervised learning based on channel states X Chen, F Mériaux, S Valentin 2013 IEEE 14th workshop on signal processing advances in wireless …, 2013 | 57 | 2013 |
Deep Haar scattering networks X Cheng, X Chen, S Mallat Information and Inference: A Journal of the IMA 5 (2), 105-133, 2016 | 44 | 2016 |
Towards the next 1000 languages in multilingual machine translation: Exploring the synergy between supervised and self-supervised learning A Siddhant, A Bapna, O Firat, Y Cao, MX Chen, I Caswell, X Garcia arXiv preprint arXiv:2201.03110, 2022 | 32 | 2022 |
Music genre classification using multiscale scattering and sparse representations X Chen, PJ Ramadge 2013 47th Annual Conference on Information Sciences and Systems (CISS), 1-6, 2013 | 31 | 2013 |
Towards end-to-end in-image neural machine translation E Mansimov, M Stern, M Chen, O Firat, J Uszkoreit, P Jain arXiv preprint arXiv:2010.10648, 2020 | 24 | 2020 |
Faster transformer decoding: N-gram masked self-attention C Chelba, M Chen, A Bapna, N Shazeer arXiv preprint arXiv:2001.04589, 2020 | 17 | 2020 |
Collaborative representation, sparsity or nonlinearity: What is key to dictionary based classification? X Chen, PJ Ramadge 2014 IEEE International Conference on Acoustics, Speech and Signal …, 2014 | 14 | 2014 |
Rapid domain adaptation for machine translation with monolingual data M Mahdieh, MX Chen, Y Cao, O Firat arXiv preprint arXiv:2010.12652, 2020 | 8 | 2020 |
GPipe: Easy scaling with micro-batch pipeline parallelism Y Huang, Y Cheng, A Bapna, O Firat, MX Chen, D Chen, HJ Lee, J Ngiam, ... Proceedings of Computer Science > Computer Vision and Pattern Recognition, 2019 | 8 | 2019 |