Challenges in Detoxifying Language Models
Johannes Welbl, Mia Glaese, Jonathan Uesato, Sumanth Dathathri, John Mellor, Lisa Anne Hendricks, Kirsty Anderson*, Pushmeet Kohli, Ben Coppin, Po-Sen Huang
Findings of EMNLP
2021-09-15
Verification-fairness-interpretability