Paper
Publication
Distilling Internet-Scale Vision-Language Models into Embodied Agents
Latent Space Smoothing for Individually Fair Representations
Self-supervised video pretraining yields strong image representations
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods
Object discovery and representation networks
Evaluating the Adversarial Robustness of Adaptive Test-time Defenses
BYOL-Explore: Exploration with Bootstrapped Prediction
Unlocking High-Accuracy Differentially Private Image Classification through Scale
Perceiver AR: general-purpose, long-context autoregressive generation
Measuring CLEVRness: Black-box Testing of Visual Reasoning Models
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Efficient Visual Pretraining with Contrastive Detection
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Data, Architecture, or Losses: What Contributes Most to Multimodal Transformer Success?
BYOL works even without batch statistics
Bootstrap Your Own Latent: A new approach to self-supervised learning
S3K: Self-Supervised Semantic Keypoints for Robotic Manipulation via Multi-View Consistency
Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions
Evolving Normalization-Activation Layers
Hybrid Models for Open Set Recognition
Batch Normalization Biases Deep Residual Networks Towards Shallow Paths
Sideways: Depth-Parallel Training of Video Models
International evaluation of an AI system for breast cancer screening
End-to-End Learning of Visual Representations from Uncurated Instructional Videos