Tomek Korbak
Other names: Tomasz Korbak
UK AI Safety Institute
Verified email at dsit.gov.uk
Title
Cited by
Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by: 396; Year: 2023
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
L Berglund, M Tong, M Kaufmann, M Balesni, AC Stickland, T Korbak, ...
arXiv preprint arXiv:2309.12288, 2023
Cited by: 204*; Year: 2023
Pretraining language models with human preferences
T Korbak, K Shi, A Chen, RV Bhalerao, C Buckley, J Phang, SR Bowman, ...
International Conference on Machine Learning, 17506-17533, 2023
Cited by: 179; Year: 2023
Towards understanding sycophancy in language models
M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, ...
arXiv preprint arXiv:2310.13548, 2023
Cited by: 144*; Year: 2023
Inverse scaling: When bigger isn't better
IR McKenzie, A Lyzhov, M Pieler, A Parrish, A Mueller, A Prabhu, ...
arXiv preprint arXiv:2306.09479, 2023
Cited by: 118*; Year: 2023
Training language models with language feedback at scale
J Scheurer, JA Campos, T Korbak, JS Chan, A Chen, K Cho, E Perez
arXiv preprint arXiv:2303.16755, 2023
Cited by: 97; Year: 2023
Foundational challenges in assuring alignment and safety of large language models
U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...
arXiv preprint arXiv:2404.09932, 2024
Cited by: 84; Year: 2024
Many-shot jailbreaking
C Anil, E Durmus, N Rimsky, M Sharma, J Benton, S Kundu, J Batson, ...
The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
Cited by: 78*; Year: 2024
Aligning language models with preferences through f-divergence minimization
D Go, T Korbak, G Kruszewski, J Rozen, N Ryu, M Dymetman
arXiv preprint arXiv:2302.08215, 2023
Cited by: 61; Year: 2023
Improving code generation by training with natural language feedback
A Chen, J Scheurer, T Korbak, JA Campos, JS Chan, SR Bowman, K Cho, ...
arXiv preprint arXiv:2303.16749, 2023
Cited by: 57; Year: 2023
RL with KL penalties is better viewed as Bayesian inference
T Korbak, E Perez, CL Buckley
arXiv preprint arXiv:2205.11275, 2022
Cited by: 51; Year: 2022
On reinforcement learning and distribution matching for fine-tuning language models with no catastrophic forgetting
T Korbak, H Elsahar, G Kruszewski, M Dymetman
Advances in Neural Information Processing Systems 35, 16203-16220, 2022
Cited by: 50; Year: 2022
Taken out of context: On measuring situational awareness in LLMs
L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, ...
arXiv preprint arXiv:2309.00667, 2023
Cited by: 48*; Year: 2023
Controlling conditional language models without catastrophic forgetting
T Korbak, H Elsahar, G Kruszewski, M Dymetman
International Conference on Machine Learning, 11499-11528, 2022
Cited by: 35; Year: 2022
Computational enactivism under the free energy principle
T Korbak
Synthese 198 (3), 2743-2763, 2021
Cited by: 34; Year: 2021
Is model collapse inevitable? Breaking the curse of recursion by accumulating real and synthetic data
M Gerstgrasser, R Schaeffer, A Dey, R Rafailov, H Sleight, J Hughes, ...
arXiv preprint arXiv:2404.01413, 2024
Cited by: 27*; Year: 2024
Interaction history as a source of compositionality in emergent communication
T Korbak, J Zubek, Ł Kuciński, P Miłoś, J Rączaszek-Leonardi
Interaction Studies 22 (2), 212-243, 2021
Cited by: 21*; Year: 2021
Catalytic role of noise and necessity of inductive biases in the emergence of compositional communication
Ł Kuciński, T Korbak, P Kołodziej, P Miłoś
Advances in Neural Information Processing Systems 34, 23075-23088, 2021
Cited by: 18*; Year: 2021
Measuring non-trivial compositionality in emergent communication
T Korbak, J Zubek, J Rączaszek-Leonardi
arXiv preprint arXiv:2010.15058, 2020
Cited by: 12; Year: 2020
Compositional preference models for aligning LMs
D Go, T Korbak, G Kruszewski, J Rozen, M Dymetman
arXiv preprint arXiv:2310.13011, 2023
Cited by: 10; Year: 2023