Publications: Safety
Fair Normalizing Flows
Mislav Balunovic*, Anian Ruoss, Martin Vechev*
ICLR, 2022-04-25
Topics: Verification, fairness & interpretability; Safety

Your Policy Regularizer is Secretly an Adversary
Rob Brekelmans*, Tim Genewein, Jordi Grau, Grégoire Delétang, Markus Kunesch, Shane Legg, Pedro Ortega
arXiv, 2022-03-23
Topics: Safety; Reinforcement learning; Theory & foundations; Control & robotics

Defending Against Image Corruptions Through Adversarial Augmentations
Dan A. Calian, Florian Stimberg, Olivia Wiles, Sylvestre-Alvise Rebuffi, Andras Gyorgy, Timothy Mann, Sven Gowal
arXiv, 2021-04-02
Topics: Deep learning; Safety

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice
James Fox*, Lewis Hammond*, Tom Everitt, Alessandro Abate*, Michael Wooldridge*
AAMAS, 2021-02-09
Topics: Theory & foundations; Games; Safety

Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
Dj Dvijotham, Jonathan Uesato, Sumanth Dathathri, Rudy Bunel, Pushmeet Kohli
NeurIPS, 2020-10-22
Topics: Verification, fairness & interpretability; Optimisation; Safety

Avoiding Side Effects By Considering Future Tasks
Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg
NeurIPS, 2020-10-15
Topics: Safety; Reinforcement learning; Theory & foundations

Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
Sven Gowal, Chongli Qin, Jonathan Uesato, Timothy Mann, Pushmeet Kohli
arXiv, 2020-10-07
Topics: Safety; Deep learning

Pessimism About Unknown Unknowns Inspires Conservatism
Michael Cohen*, Marcus Hutter
arXiv, 2020-06-15
Topics: Safety; Reinforcement learning; Planning

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors
Mike Dusenberry*, G Jerfel*, Y Wen*, Y Ma*, Jasper Snoek*, Katherine Heller*, Balaji Lakshminarayanan, D Tran*
ICML, 2020-05-14
Topics: Probabilistic learning; Deep learning; Safety

Pitfalls of learning a reward function online
Stuart Armstrong*, Jan Leike, Laurent Orseau, Shane Legg
arXiv, 2020-04-28
Topics: Safety; Reinforcement learning; Theory & foundations

An empirical investigation of the challenges of real-world reinforcement learning
Gabriel Dulac-Arnold*, Nir Levine, Daniel Mankowitz, Jerry Li, Cosmin Paduraru, Sven Gowal, Todd Hester
arXiv, 2020-03-24
Topics: Reinforcement learning; Environments; Safety; Control & robotics; Open Source Software

The Incentives that Shape Behaviour
Ryan Carey*, Eric Langlois, Tom Everitt, Shane Legg
arXiv, 2020-01-20
Topics: Safety; Causal inference; Theory & foundations; Verification, fairness & interpretability

Artificial Intelligence, Values and Alignment
Iason Gabriel
arXiv, 2020-01-13
Topics: Safety

Deep Ensembles: A Loss Landscape Perspective
Stanislav Fort*, Huiyi Hu*, Balaji Lakshminarayanan
arXiv, 2019-12-05
Topics: Deep learning; Probabilistic learning; Safety

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
Dan Hendrycks, Norman Mu*, E Cubuk*, Barret Zoph*, Justin Gilmer*, Balaji Lakshminarayanan
arXiv, 2019-12-05
Topics: Safety; Deep learning

Towards Robust Image Classification Using Sequential Attention Models
Daniel Zoran, M Chrzanowski*, Po-Sen Huang, Sven Gowal, Alex Mott, Pushmeet Kohli
arXiv, 2019-12-04
Topics: Vision; Representation learning; Safety

An Alternative Surrogate Loss for PGD-based Adversarial Testing
Sven Gowal, Jonathan Uesato, Chongli Qin, Po-Sen Huang, Timothy Mann, Pushmeet Kohli
arXiv, 2019-10-21
Topics: Verification, fairness & interpretability; Safety; Optimisation; Deep learning

Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Po-Sen Huang, Robert Stanforth, Johannes Welbl*, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy (Dj) Dvijotham, Pushmeet Kohli
EMNLP, 2019-09-03
Topics: Language; Verification, fairness & interpretability; Deep learning; Safety

Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective
Tom Everitt, Marcus Hutter
arXiv, 2019-08-13
Topics: Safety; Theory & foundations; Reinforcement learning; Causal inference

Adversarial Robustness through Local Linearization
Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan*, Krishnamurthy (Dj) Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, Pushmeet Kohli
arXiv, 2019-07-04
Topics: Safety; Vision; Deep learning; Representation learning

Modeling AGI Safety Frameworks with Causal Influence Diagrams
Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg
IJCAI, 2019-06-20
Topics: Safety; Theory & foundations; Reinforcement learning

Likelihood Ratios for Out-of-Distribution Detection
J Ren*, P Liu*, Emily Fertig*, Jasper Snoek*, Ryan Poplin*, M DePristo*, J Dillon*, Balaji Lakshminarayanan
arXiv, 2019-06-07
Topics: Deep learning; Probabilistic learning; Safety; Unsupervised learning & generative models

Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
Yaniv Ovadia*, Emily Fertig*, Z Nado*, N Nowozin*, J Dillon*, Balaji Lakshminarayanan, Jasper Snoek*
arXiv, 2019-06-06
Topics: Deep learning; Safety; Probabilistic learning

Training verified learners with learned verifiers
Krishnamurthy (Dj) Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, Pushmeet Kohli
arXiv, 2019-05-29
Topics: Safety

Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications
Chenglong Wang, Rudy Bunel*, Krishnamurthy (Dj) Dvijotham, Po-Sen Huang, Edward Grefenstette, Pushmeet Kohli
arXiv, 2019-04-26
Topics: Verification, fairness & interpretability; Optimisation; Deep learning; Safety; Theory & foundations

Page 1 of 3