We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We show that exact end-to-end training of the answer generator (i.e., the reader) and the retriever in such a system is intractable, since it requires marginalizing over all possible sets of retrieved documents. We propose an alternative training method that can be seen as an instance of the expectation-maximization algorithm: we iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow more effectively first to the reader and then to the retriever. This yields an improved retriever that selects more relevant documents for a question, and a reader that is trained on more accurate evidence documents and thus generates better answers. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all competing baselines of comparable size, achieving new state-of-the-art results. While our experimental focus is on open-domain question answering, we believe that our method can be applied to other text generation tasks as well.
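The EM-style estimation step described above can be sketched as follows. This is a toy illustration under assumed inputs, not the paper's actual models: `retriever_scores` stands in for unnormalized retriever scores s(q, d_k) over the top-K retrieved documents, and `reader_loglikes` for the reader's log-likelihoods log p(a | q, d_k) of the gold answer given each document. The E-step posterior over documents is then used as the target distribution when updating the retriever.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def em_style_posterior(retriever_scores, reader_loglikes):
    """E-step: posterior over retrieved documents given the gold answer.

    p(d_k | q, a) ∝ p(d_k | q) * p(a | q, d_k), where the prior comes from
    the retriever scores and the likelihood from the reader. The returned
    posterior serves as the (soft) relevance target for the retriever update.
    """
    prior = softmax(np.asarray(retriever_scores, dtype=float))
    likelihood = np.exp(np.asarray(reader_loglikes, dtype=float))
    joint = prior * likelihood        # p(d_k | q) * p(a | q, d_k)
    return joint / joint.sum()        # normalize to a distribution

# Documents where the reader assigns the answer higher likelihood
# receive more posterior mass than the retriever prior alone gave them.
posterior = em_style_posterior([1.0, 1.0, 1.0], [-0.1, -2.0, -2.0])
```

In an actual training loop, this posterior would be recomputed each iteration (the E-step) and both the retriever and reader parameters updated against it (the M-step), so the two components co-adapt without marginalizing over all possible document sets.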