Paper
Publication
Fair Normalizing Flows
Your Policy Regularizer is Secretly an Adversary
Defending Against Image Corruptions Through Adversarial Augmentations
Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice
Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
Avoiding Side Effects By Considering Future Tasks
Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
Pessimism About Unknown Unknowns Inspires Conservatism
Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors
Pitfalls of learning a reward function online
An empirical investigation of the challenges of real-world reinforcement learning
The Incentives that Shape Behaviour
Artificial Intelligence, Values and Alignment
Deep Ensembles: A Loss Landscape Perspective
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
Towards Robust Image Classification Using Sequential Attention Models
An Alternative Surrogate Loss for PGD-based Adversarial Testing
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective
Adversarial Robustness through Local Linearization
Modeling AGI Safety Frameworks with Causal Influence Diagrams
Likelihood Ratios for Out-of-Distribution Detection
Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
Training verified learners with learned verifiers
Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications