Paper
Publication
Evaluating the Adversarial Robustness of Adaptive Test-time Defenses
Fair Normalizing Flows
Your Policy Regularizer is Secretly an Adversary
Defending Against Image Corruptions Through Adversarial Augmentations
Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice
Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
Avoiding Side Effects By Considering Future Tasks
Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
Pessimism About Unknown Unknowns Inspires Conservatism
Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors
Pitfalls of learning a reward function online
An empirical investigation of the challenges of real-world reinforcement learning
The Incentives that Shape Behaviour
Artificial Intelligence, Values and Alignment
Deep Ensembles: A Loss Landscape Perspective
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
Towards Robust Image Classification Using Sequential Attention Models
An Alternative Surrogate Loss for PGD-based Adversarial Testing
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective
Adversarial Robustness through Local Linearization
Modeling AGI Safety Frameworks with Causal Influence Diagrams
Likelihood Ratios for Out-of-Distribution Detection
Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
Training verified learners with learned verifiers