Dense linear algebra solvers for multicore with GPU accelerators S Tomov, R Nath, H Ltaief, J Dongarra 2010 IEEE International Symposium on Parallel & Distributed Processing …, 2010 | 324 | 2010 |
An improved MAGMA GEMM for Fermi graphics processing units R Nath, S Tomov, J Dongarra The International Journal of High Performance Computing Applications 24 (4 …, 2010 | 289 | 2010 |
Accelerating GPU kernels for dense linear algebra R Nath, S Tomov, J Dongarra High Performance Computing for Computational Science–VECPAR 2010, 83-92, 2011 | 100 | 2011 |
Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing S Tomov, R Nath, J Dongarra Parallel Computing 36 (12), 645-654, 2010 | 83 | 2010 |
Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs R Nath, S Tomov, J Dongarra Super Computing (SC), 2011 | 74 | 2011 |
A scalable high performant Cholesky factorization for multicore with GPU accelerators H Ltaief, S Tomov, R Nath, P Du, J Dongarra High Performance Computing for Computational Science–VECPAR 2010: 9th …, 2011 | 73 | 2011 |
MAGMA users’ guide S Tomov, R Nath, P Du, J Dongarra ICL, UTK (November 2009), 2011 | 50 | 2011 |
JETC: Joint energy thermal and cooling management for memory and CPU subsystems in servers R Ayoub, R Nath, T Rosing IEEE International Symposium on High-Performance Computer Architecture, 1-12, 2012 | 47 | 2012 |
MAGMA version 0.2 User Guide S Tomov, R Nath, P Du, J Dongarra | 44 | 2009 |
The CRISP performance model for dynamic voltage and frequency scaling in a GPGPU R Nath, D Tullsen Proceedings of the 48th international symposium on microarchitecture, 281-293, 2015 | 38 | 2015 |
An implementation of the tile QR factorization for a GPU and multiple CPUs J Kurzak, R Nath, P Du, J Dongarra PARA, 2010 | 26 | 2010 |
Hybrid multicore Cholesky factorization with multiple GPU accelerators H Ltaief, S Tomov, R Nath, J Dongarra IEEE Transactions on Parallel and Distributed Systems 48, 2010 | 26 | 2010 |
A fully empirical autotuned dense QR factorization for multicore architectures E Agullo, J Dongarra, R Nath, S Tomov Euro-Par 2011 Parallel Processing, 194-205, 2011 | 23 | 2011 |
Accelerating ML recommendation with over a thousand RISC-V/tensor processors on Esperanto’s ET-SoC-1 chip D Ditzel, R Espasa, N Aymerich, A Baum, T Berg, J Burr, E Hao, J Iyer, ... 2021 IEEE Hot Chips 33 Symposium (HCS), 1-23, 2021 | 21 | 2021 |
Temperature aware thread block scheduling in GPGPUs R Nath, R Ayoub, TS Rosing Proceedings of the 50th Annual Design Automation Conference, 1-6, 2013 | 11 | 2013 |
BLAS for GPUs R Nath, S Tomov, J Dongarra | 10 | 2010 |
CoMETC: Coordinated management of energy/thermal/cooling in servers R Ayoub, R Nath, TS Rosing ACM Transactions on Design Automation of Electronic Systems (TODAES) 19 (1 …, 2013 | 9 | 2013 |
MAGMA, matrix algebra on GPU and multicore architectures S Tomov, R Nath, P Du, J Dongarra | 7 | 2012 |
Power Modeling and Thermal Management Techniques for Manycores R Nath, D Carmean, T Rosing | 6 | 2013 |
Fully empirical autotuned QR factorization for multicore architectures E Agullo, J Dongarra, R Nath, S Tomov arXiv preprint arXiv:1102.5328, 2011 | 6 | 2011 |