Efficient memory management for large language model serving with PagedAttention W Kwon, Z Li, S Zhuang, Y Sheng, L Zheng, CH Yu, J Gonzalez, H Zhang, ... Proceedings of the 29th Symposium on Operating Systems Principles, 611-626, 2023 | 282 | 2023 |
Graphene: Strong yet lightweight row hammer protection Y Park, W Kwon, E Lee, TJ Ham, JH Ahn, JW Lee 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture …, 2020 | 94 | 2020 |
Learned Token Pruning for Transformers S Kim, S Shen, D Thorsley, A Gholami, W Kwon, J Hassoun, K Keutzer Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022 | 89 | 2022 |
A Fast Post-Training Pruning Framework for Transformers W Kwon, S Kim, MW Mahoney, J Hassoun, K Keutzer, A Gholami Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022 | 68 | 2022 |
Nimble: Lightweight and parallel GPU task scheduling for deep learning W Kwon, GI Yu, E Jeong, BG Chun Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020 | 58 | 2020 |
SkyPilot: An Intercloud Broker for Sky Computing Z Yang, Z Wu, M Luo, WL Chiang, R Bhardwaj, W Kwon, S Zhuang, ... 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023 | 33 | 2023 |
vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention W Kwon, Z Li, S Zhuang, Y Sheng, L Zheng, CH Yu, J Gonzalez, H Zhang, ... Blog post at https://vllm.ai, 2023 | 4 | 2023 |
Hammer refresh row address detector, and semiconductor memory device and memory module including the same H Shin, Y Park, J Lee, LEE Eojin, W Kwon, J Ahn, HAM Taejun US Patent 11,568,917, 2023 | | 2023 |
Learned threshold token pruning for transformer neural networks DPL Thorsley, S Shen, SH Kim, A Gholaminejad, W Kwon, J Hassoun, ... US Patent App. 17/578,435, 2022 | | 2022 |
Method and apparatus for lightweight and parallelization of accelerator task scheduling BG Chun, YU Gyeongin, W Kwon US Patent App. 17/524,869, 2022 | | 2022 |