Xiaoxia (Shirley) Wu 吴晓霞
Other names: Xiaoxia Wu
Verified email at microsoft.com - Homepage
Title
Cited by
Year
Phi-3 technical report: A highly capable language model locally on your phone
M Abdin, J Aneja, H Awadalla, A Awadallah, AA Awan, N Bach, A Bahree, ...
arXiv preprint arXiv:2404.14219, 2024
675 · 2024
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
R Ward, X Wu, L Bottou
Journal of Machine Learning Research 21 (219), 1-30, 2020
370 · 2020
ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers
Z Yao, R Yazdani Aminabadi, M Zhang, X Wu, C Li, Y He
Advances in Neural Information Processing Systems 35, 27168-27183, 2022
369 · 2022
When do curricula work?
X Wu, E Dyer, B Neyshabur
arXiv preprint arXiv:2012.03107, 2020
140 · 2020
WNGrad: Learn the learning rate in gradient descent
X Wu, R Ward, L Bottou
arXiv preprint arXiv:1803.02865, 2018
97 · 2018
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization
R Ward, X Wu, L Bottou
arXiv preprint arXiv:1806.01811, 2018
89 · 2018
ZeroQuant-V2: Exploring post-training quantization in LLMs from comprehensive study to low rank compensation
Z Yao, X Wu, C Li, S Youn, Y He
arXiv preprint arXiv:2303.08302, 2023
79* · 2023
Global convergence of adaptive gradient methods for an over-parameterized neural network
X Wu, SS Du, R Ward
arXiv preprint arXiv:1902.07111, 2019
71 · 2019
Hierarchical learning for generation with long source sequences
T Rohde, X Wu, Y Liu
arXiv preprint arXiv:2104.07545, 2021
62 · 2021
Linear convergence of adaptive stochastic gradient descent
Y Xie, X Wu, R Ward
International Conference on Artificial Intelligence and Statistics, 1475-1485, 2020
61 · 2020
DeepSpeed-Chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales
Z Yao, RY Aminabadi, O Ruwase, S Rajbhandari, X Wu, AA Awan, ...
arXiv preprint arXiv:2308.01320, 2023
54 · 2023
Choosing the sample with lowest loss makes SGD robust
V Shah, X Wu, S Sanghavi
International Conference on Artificial Intelligence and Statistics, 2120-2130, 2020
51 · 2020
ZeRO++: Extremely efficient collective communication for giant model training
G Wang, H Qin, SA Jacobs, C Holmes, S Rajbhandari, O Ruwase, F Yan, ...
arXiv preprint arXiv:2306.10209, 2023
40 · 2023
Understanding INT4 quantization for transformer models: Latency speedup, composability, and failure cases
X Wu, C Li, RY Aminabadi, Z Yao, Y He
arXiv preprint arXiv:2301.12017, 2023
40* · 2023
Implicit regularization and convergence for weight normalization
X Wu, E Dobriban, T Ren, S Wu, Z Li, S Gunasekar, R Ward, Q Liu
Advances in Neural Information Processing Systems 33, 2835-2847, 2020
31* · 2020
Value-at-Risk estimation with stochastic interest rate models for option-bond portfolios
X Wang, D Xie, J Jiang, X Wu, J He
Finance Research Letters 21, 10-20, 2017
31 · 2017
XTC: Extreme compression for pre-trained transformers made simple and efficient
X Wu, Z Yao, M Zhang, C Li, Y He
Advances in Neural Information Processing Systems 35, 3217-3231, 2022
27 · 2022
ZeroQuant-FP: A leap forward in LLMs post-training W4A8 quantization using floating-point formats
X Wu, Z Yao, Y He
arXiv preprint arXiv:2307.09782, 2023
25 · 2023
RenAIssance: A survey into AI text-to-image generation in the era of large model
F Bie, Y Yang, Z Zhou, A Ghanem, M Zhang, Z Yao, X Wu, C Holmes, ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
20 · 2024
Exploring post-training quantization in LLMs from comprehensive study to low rank compensation
Z Yao, X Wu, C Li, S Youn, Y He
Proceedings of the AAAI Conference on Artificial Intelligence 38 (17), 19377 …, 2024
19 · 2024