Leveraging Non-uniformity in First-order Non-convex Optimization

Classical global convergence results for first order methods rely on global smoothness and the Łojasiewicz inequality. Motivated by properties of objective functions arising in machine learning, we propose a non-uniform refinement of these notions, leading to Non-uniform Smoothness (NS) and Non-uniform Łojasiewicz inequality (NŁ). The new definitions inspire new geometry-aware first-order methods that are able to surpass the classical $\Omega(1/t^2)$ lower bounds. To illustrate the power of these geometry-aware methods and their corresponding non-uniform analysis, we consider two important problems in machine learning: policy gradient optimization in reinforcement learning (PG), and generalized linear model training in supervised learning (GLM). For policy gradient optimization, we find that normalizing the gradient ascent method can accelerate convergence to $O(e^{−t})$ while incurring less overhead than existing algorithms. For generalized linear model training, we show that normalized gradient descent can also achieve a linear convergence rate, which significantly improves the best known results. We additionally show that the proposed geometry aware descent methods escape landscape plateaus faster than standard gradient descent. Experimental results are used to illustrate and complement the theoretical findings.

Authors' notes