Chengruidong Zhang
Research SDE, Microsoft
Verified email at microsoft.com
Title
Cited by
Year
Phi-3 technical report: A highly capable language model locally on your phone
M Abdin, J Aneja, H Awadalla, A Awadallah, AA Awan, N Bach, A Bahree, ...
arXiv preprint arXiv:2404.14219, 2024
Cited by: 1001; Year: 2024
LongRoPE: Extending LLM context window beyond 2 million tokens
Y Ding, LL Zhang, C Zhang, Y Xu, N Shang, J Xu, F Yang, M Yang
arXiv preprint arXiv:2402.13753, 2024
Cited by: 132; Year: 2024
MInference 1.0: Accelerating pre-filling for long-context LLMs via dynamic sparse attention
H Jiang, Y Li, C Zhang, Q Wu, X Luo, S Ahn, Z Han, A Abdi, D Li, CY Lin, ...
Advances in Neural Information Processing Systems 37, 52481-52515, 2024
Cited by: 62; Year: 2024
PIT: Optimization of dynamic sparse deep learning models via permutation invariant transformation
N Zheng, H Jiang, Q Zhang, Z Han, L Ma, Y Yang, F Yang, C Zhang, L Qiu, ...
Proceedings of the 29th Symposium on Operating Systems Principles, 331-347, 2023
Cited by: 26; Year: 2023
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
C Lin, Z Han, C Zhang, Y Yang, F Yang, C Chen, L Qiu
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024
Cited by: 21; Year: 2024
LongRoPE: Extending LLM context window beyond 2 million tokens, 2024
Y Ding, LL Zhang, C Zhang, Y Xu, N Shang, J Xu, F Yang, M Yang
URL https://arxiv.org/abs/2402.13753, 0
Cited by: 7
GRIN: Gradient-informed MoE
L Liu, YJ Kim, S Wang, C Liang, Y Shen, H Cheng, X Liu, M Tanaka, X Wu, ...
arXiv preprint arXiv:2409.12136, 2024
Cited by: 4; Year: 2024
LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search
C Feng, LL Zhang, Y Liu, J Xu, C Zhang, Z Wang, T Cao, M Yang, H Tan
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024
Cited by: 1; Year: 2024