A linear speedup analysis of distributed deep learning with sparse and quantized communication P Jiang, G Agrawal Advances in Neural Information Processing Systems 31, 2018 | 212 | 2018 |
A novel data transformation and execution strategy for accelerating sparse matrix multiplication on GPUs P Jiang, C Hong, G Agrawal Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of …, 2020 | 49 | 2020 |
Exploiting recent SIMD architectural advances for irregular applications L Chen, P Jiang, G Agrawal Proceedings of the 2016 International Symposium on Code Generation and …, 2016 | 49 | 2016 |
Accelerating sparse CNN inference on GPUs with performance-aware weight pruning MA Rumi, X Ma, Y Wang, P Jiang Proceedings of the ACM International Conference on Parallel Architectures …, 2020 | 29 | 2020 |
Combining SIMD and Many/Multi-core parallelism for finite state machines with enumerative speculation P Jiang, G Agrawal Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of …, 2017 | 28 | 2017 |
Reusing data reorganization for efficient SIMD parallelization of adaptive irregular applications P Jiang, L Chen, G Agrawal Proceedings of the 2016 International Conference on Supercomputing, 1-10, 2016 | 25 | 2016 |
Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation P Jiang, G Agrawal Proceedings of the International Conference on Supercomputing, 1-11, 2017 | 24 | 2017 |
Conflict-free vectorization of associative irregular applications with recent SIMD architectural advances P Jiang, G Agrawal Proceedings of the 2018 International Symposium on Code Generation and …, 2018 | 15 | 2018 |
Exposing and exploiting fine-grained block structures for fast and accurate sparse training P Jiang, L Hu, S Song Advances in Neural Information Processing Systems 35, 38345-38357, 2022 | 13 | 2022 |
Revealing parallel scans and reductions in recurrences through function reconstruction P Jiang, L Chen, G Agrawal Proceedings of the 27th International Conference on Parallel Architectures …, 2018 | 12 | 2018 |
Rethinking graph data placement for graph neural network training on multiple GPUs S Song, P Jiang Proceedings of the 36th ACM International Conference on Supercomputing, 1-10, 2022 | 10 | 2022 |
A methodology for characterizing sparse datasets and its application to SIMD performance prediction G Zhu, P Jiang, G Agrawal 2019 28th International Conference on Parallel Architectures and Compilation …, 2019 | 8 | 2019 |
Accelerating distributed stochastic gradient descent with adaptive periodic parameter averaging: Poster P Jiang, G Agrawal Proceedings of the 24th Symposium on Principles and Practice of Parallel …, 2019 | 8 | 2019 |
Communication-efficient sampling for distributed training of graph convolutional networks P Jiang, MA Rumi arXiv preprint arXiv:2101.07706, 2021 | 6 | 2021 |
Scaling out speculative execution of finite-state machines with parallel merge Y Xia, P Jiang, G Agrawal Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of …, 2020 | 6 | 2020 |
Enabling prefix sum parallelism pattern for recurrences with principled function reconstruction Y Xia, P Jiang, G Agrawal Proceedings of the 28th International Conference on Compiler Construction, 17-28, 2019 | 6 | 2019 |
Scaling and selecting GPU methods for all pairs shortest paths (APSP) computations Y Xia, P Jiang, G Agrawal, R Ramnath 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2022 | 4 | 2022 |
Exploring PIM architecture for high-performance graph pattern mining J Su, L He, P Jiang, R Wang IEEE Computer Architecture Letters 20 (2), 114-117, 2021 | 4 | 2021 |
Scaling sparse matrix multiplication on CPU-GPU nodes Y Xia, P Jiang, G Agrawal, R Ramnath 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021 | 4 | 2021 |
End-to-End LU Factorization of Large Matrices on GPUs Y Xia, P Jiang, G Agrawal, R Ramnath Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and …, 2023 | 3 | 2023 |