Paper
Publication
Pretraining the Noisy Channel Model for Task-Oriented Dialogue
Data, Architecture, or Losses: What Contributes Most to Multimodal Transformer Success?
Adaptive Semiparametric Language Models