We address the problem of Federated Learning where the users can be partitioned into clusters. We propose the Iterative Federated Clustering Algorithm (IFCA), which alternately estimates the cluster identities of the users and optimizes the model parameters for the user clusters via gradient descent. We analyze the convergence rate of this algorithm, first in a linear model with squared loss and then for generic strongly convex and smooth loss functions. We show that in both settings, with good initialization, IFCA converges at an exponential rate, and for linear models we obtain a near-optimal statistical error rate. In the experiments, we show that our algorithm can succeed even when the requirements on initialization are relaxed to random initialization with multiple restarts.
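To make the alternating structure concrete, here is a minimal sketch of one IFCA round for the linear-model/squared-loss setting described above. The function name `ifca_step`, the learning rate, and the data layout are illustrative assumptions, not the authors' reference implementation: each client assigns itself to the cluster model with the lowest local loss, and each cluster model then takes one gradient step averaged over the clients assigned to it.

```python
import numpy as np

def ifca_step(models, client_data, lr=0.1):
    """One illustrative IFCA round (linear model, squared loss).

    models:      list of k weight vectors, one per cluster (hypothetical layout).
    client_data: list of (X, y) pairs, one per client.
    """
    k = len(models)
    # Step 1: cluster-identity estimation -- each client picks the model
    # with the lowest squared loss on its own local data.
    assignments = []
    for X, y in client_data:
        losses = [np.mean((X @ w - y) ** 2) for w in models]
        assignments.append(int(np.argmin(losses)))
    # Step 2: model update -- each cluster model takes one gradient-descent
    # step using the average gradient over its currently assigned clients.
    new_models = [w.copy() for w in models]
    for j in range(k):
        members = [i for i, a in enumerate(assignments) if a == j]
        if not members:
            continue  # no client estimated itself into cluster j this round
        grads = []
        for i in members:
            X, y = client_data[i]
            grads.append(2 * X.T @ (X @ models[j] - y) / len(y))
        new_models[j] = models[j] - lr * np.mean(grads, axis=0)
    return new_models, assignments
```

Iterating `ifca_step` from a sufficiently good initialization plays the role of the alternating estimation/optimization scheme the abstract refers to; with random initialization, one would restart from several random starting points and keep the best run.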