Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Maribeth Rauh,
John Mellor,
Jonathan Uesato,
Po-Sen Huang,
Johannes Welbl,
Laura Weidinger,
Sumanth Dathathri,
Mia Glaese,
Geoffrey Irving,
Iason Gabriel,
William Isaac,
Lisa Anne Hendricks
NeurIPS Datasets and Benchmarks Track
2022-11-29