Zhihao Zhang
Verified email at andrew.cmu.edu - Homepage
Title | Cited by | Year
SpecInfer: Accelerating large language model serving with tree-based speculative inference and verification
X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang, Z Zhang, RYY Wong, A Zhu, ...
Proceedings of the 29th ACM International Conference on Architectural …, 2024
207 | 2024
Spatio-temporal graph dual-attention network for multi-agent prediction and tracking
J Li, H Ma, Z Zhang, J Li, M Tomizuka
IEEE Transactions on Intelligent Transportation Systems 23 (8), 10556-10569, 2021
168* | 2021
Towards efficient generative large language model serving: A survey from algorithms to systems
X Miao, G Oliaro, Z Zhang, X Cheng, H Jin, T Chen, Z Jia
arXiv preprint arXiv:2312.15234, 2023
73 | 2023
GradSign: Model performance inference with theoretical insights
Z Zhang, Z Jia
arXiv preprint arXiv:2110.08616, 2021
32 | 2021
Accelerating retrieval-augmented language model serving with speculation
Z Zhang, A Zhu, L Yang, Y Xu, L Li, PM Phothilimthana, Z Jia
arXiv preprint arXiv:2401.14021, 2024
9 | 2024
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
Z Zhang, D Zhao, X Miao, G Oliaro, Q Li, Y Jiang, Z Jia
arXiv preprint arXiv:2401.07159, 2024
7 | 2024
TidalDecode: Fast and accurate LLM decoding with position persistent sparse attention
L Yang, Z Zhang, Z Chen, Z Li, Z Jia
arXiv preprint arXiv:2410.05076, 2024
2 | 2024
AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding
Z Li, Z Chen, R Delacourt, G Oliaro, Z Wang, Q Chen, S Lin, A Yang, ...
arXiv preprint arXiv:2501.12162, 2025
2025
Communication Bounds for the Distributed Experts Problem
Z Jia, Q Pang, T Tran, D Woodruff, Z Zhang, W Zheng
arXiv preprint arXiv:2501.03132, 2025
2025