HuggingFace's Transformers: State-of-the-art natural language processing | T Wolf, L Debut, V Sanh, J Chaumond, C Delangue, A Moi, P Cistac, ... | arXiv preprint arXiv:1910.03771, 2019 | 3369 | 2019 |
Multitask prompted training enables zero-shot task generalization | V Sanh, A Webson, C Raffel, SH Bach, L Sutawika, Z Alyafeai, A Chaffin, ... | arXiv preprint arXiv:2110.08207, 2021 | 1658 | 2021 |
BLOOM: A 176B-parameter open-access multilingual language model | TL Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... | arXiv preprint arXiv:2211.05100, 2022 | 1628 | 2022 |
Crosslingual generalization through multitask finetuning | N Muennighoff, T Wang, L Sutawika, A Roberts, S Biderman, TL Scao, ... | arXiv preprint arXiv:2211.01786, 2022 | 617 | 2022 |
How Many Data Points is a Prompt Worth? | T Le Scao, AM Rush | arXiv e-prints, arXiv:2103.08493, 2021 | 291* | 2021 |
Datasets: A community library for natural language processing | Q Lhoest, AV del Moral, Y Jernite, A Thakur, P von Platen, S Patil, ... | arXiv preprint arXiv:2109.02846, 2021 | 273 | 2021 |
Scaling Data-Constrained Language Models | N Muennighoff, AM Rush, B Barak, TL Scao, A Piktus, N Tazi, S Pyysalo, ... | arXiv preprint arXiv:2305.16264, 2023 | 202 | 2023 |
The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset | H Laurençon, L Saulnier, T Wang, C Akiki, A Villanova del Moral, ... | Advances in Neural Information Processing Systems 35, 31809-31826, 2022 | 168 | 2022 |
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? | T Wang, A Roberts, D Hesslow, TL Scao, HW Chung, I Beltagy, J Launay, ... | arXiv preprint arXiv:2204.05832, 2022 | 167 | 2022 |
What Language Model to Train if You Have One Million GPU Hours? | T Le Scao, T Wang, D Hesslow, L Saulnier, S Bekman, MS Bari, ... | Challenges & …, 2022 | 104 | 2022 |
Neural Differential Equations for Single Image Super-Resolution | T Le Scao | ICLR 2020 Workshop on Integration of Deep Neural Models and Differential …, 2020 | 3 | 2020 |
Joint Representations of Text and Knowledge Graphs for Retrieval and Evaluation | TL Scao, C Gardent | arXiv preprint arXiv:2302.14785, 2023 | 2 | 2023 |
In-training Matrix Factorization for Parameter-frugal Neural Machine Translation | Z Kaden, TL Scao, R Olivier | arXiv preprint arXiv:1910.06393, 2019 | 1 | 2019 |