Longformer: The long-document transformer. I Beltagy, ME Peters, A Cohan. arXiv preprint arXiv:2004.05150, 2020. Cited by 4273.
SciBERT: A pretrained language model for scientific text. I Beltagy, K Lo, A Cohan. Proceedings of the 2019 Conference on Empirical Methods in Natural Language …, 2019. Cited by 3917.
Don't stop pretraining: Adapt language models to domains and tasks. S Gururangan, A Marasović, S Swayamdipta, K Lo, I Beltagy, D Downey, … arXiv preprint arXiv:2004.10964, 2020. Cited by 2361.
BLOOM: A 176B-parameter open-access multilingual language model. T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, … 2023. Cited by 1589.
ScispaCy: Fast and robust models for biomedical natural language processing. M Neumann, D King, I Beltagy, W Ammar. arXiv preprint arXiv:1902.07669, 2019. Cited by 833.
SPECTER: Document-level representation learning using citation-informed transformers. A Cohan, S Feldman, I Beltagy, D Downey, DS Weld. arXiv preprint arXiv:2004.07180, 2020. Cited by 535.
Construction of the literature graph in Semantic Scholar. W Ammar, D Groeneveld, C Bhagavatula, I Beltagy, M Crawford, … arXiv preprint arXiv:1805.02262, 2018. Cited by 499.
How far can camels go? Exploring the state of instruction tuning on open resources. Y Wang, H Ivison, P Dasigi, J Hessel, T Khot, K Chandu, D Wadden, … Advances in Neural Information Processing Systems 36, 74764-74786, 2023. Cited by 239.
A dataset of information-seeking questions and answers anchored in research papers. P Dasigi, K Lo, I Beltagy, A Cohan, NA Smith, M Gardner. arXiv preprint arXiv:2105.03011, 2021. Cited by 212.
PRIMERA: Pyramid-based masked sentence pre-training for multi-document summarization. W Xiao, I Beltagy, G Carenini, A Cohan. arXiv preprint arXiv:2110.08499, 2021. Cited by 182.
What language model architecture and pretraining objective work best for zero-shot generalization? T Wang, A Roberts, D Hesslow, T Le Scao, HW Chung, I Beltagy, … International Conference on Machine Learning, 22964-22984, 2022. Cited by 161.
Machine learning for reliable mmWave systems: Blockage prediction and proactive handoff. A Alkhateeb, I Beltagy, S Alex. 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP …, 2018. Cited by 159.
SciREX: A challenge dataset for document-level information extraction. S Jain, M Van Zuylen, H Hajishirzi, I Beltagy. arXiv preprint arXiv:2005.00512, 2020. Cited by 158.
Pretrained language models for sequential sentence classification. A Cohan, I Beltagy, D King, B Dalvi, DS Weld. arXiv preprint arXiv:1909.04054, 2019. Cited by 153.
Camels in a changing climate: Enhancing LM adaptation with Tulu 2. H Ivison, Y Wang, V Pyatkin, N Lambert, M Peters, P Dasigi, J Jang, … arXiv preprint arXiv:2311.10702, 2023. Cited by 125.
MS²: Multi-document summarization of medical studies. J DeYoung, I Beltagy, M van Zuylen, B Kuehl, LL Wang. arXiv preprint arXiv:2104.06486, 2021. Cited by 115.
FLEX: Unifying evaluation for few-shot NLP. J Bragg, A Cohan, K Lo, I Beltagy. Advances in Neural Information Processing Systems 34, 15787-15800, 2021. Cited by 112.
Montague meets Markov: Deep semantics with probabilistic logical form. I Beltagy, C Chau, G Boleda, D Garrette, K Erk, R Mooney. 2nd Joint Conference on Lexical and Computational Semantics (*SEM 2013), 11-21, 2013. Cited by 111.
What language model to train if you have one million GPU hours? TL Scao, T Wang, D Hesslow, L Saulnier, S Bekman, MS Bari, … arXiv preprint arXiv:2210.15424, 2022. Cited by 103.
Few-shot self-rationalization with natural language prompts. A Marasović, I Beltagy, D Downey, ME Peters. arXiv preprint arXiv:2111.08284, 2021. Cited by 101.