Practical Real Time Recurrent Learning for Sparse RNNs

Current methods for training neural networks on sequences are based on Back-Propagation Through Time (BPTT), which requires storing the network state at each timestep. Sparse Recurrent Neural Networks (RNNs) show significant parameter and FLOP efficiency gains over dense RNNs, but training them with the large state sizes necessary for maximum performance runs into memory limitations due to the storage requirements of BPTT. Real Time Recurrent Learning (RTRL) is an alternative to BPTT that eliminates this storage at the expense of a fixed but massive per-step cost: maintaining an influence matrix whose size is cubic in the state size, and performing forward-mode updates that require a quartic amount of computation. In this work, we observe that sparsity in the parameters and dynamics of an RNN significantly alleviates the cost of RTRL. We propose a natural sparse approximation to the influence matrix that makes RTRL tractable even for large state sizes. This algorithm outperforms other approximations to RTRL with comparable costs, such as Unbiased Online Recurrent Optimization (UORO), and closely matches the performance of the unapproximated gradient.
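
To make the cubic storage and quartic compute concrete, the following is a minimal NumPy sketch of dense (unapproximated) RTRL. It is an illustration under stated assumptions, not the paper's implementation: the vanilla tanh cell, the per-step loss 0.5 * ||h_t||^2, and the names `rtrl_step`, `rtrl_grad`, and `loss` are all hypothetical choices, and only the recurrent weight matrix W is tracked. The sparse approximation proposed in the paper is not shown.

```python
import numpy as np

def rtrl_step(W, U, h_prev, x, J_prev):
    """One dense RTRL step for the assumed cell h_t = tanh(W h_{t-1} + U x_t).

    J_prev is the influence matrix dh_{t-1}/dW, stored as (k, k*k): O(k^3) memory.
    """
    k = h_prev.shape[0]
    h = np.tanh(W @ h_prev + U @ x)
    d = 1.0 - h ** 2                       # derivative of tanh at the pre-activation
    D = d[:, None] * W                     # state Jacobian dh_t/dh_{t-1}, shape (k, k)
    # Immediate Jacobian: dh_t[i]/dW[j, l] = d[i] * h_prev[l] if i == j, else 0.
    imm = np.zeros((k, k, k))
    idx = np.arange(k)
    imm[idx, idx, :] = d[:, None] * h_prev[None, :]
    # (k, k) @ (k, k*k) product: O(k^4) work per step, the dominant cost.
    J = D @ J_prev + imm.reshape(k, k * k)
    return h, J

def rtrl_grad(W, U, xs):
    """Accumulate dL/dW online for the assumed loss L = sum_t 0.5 * ||h_t||^2."""
    k = W.shape[0]
    h = np.zeros(k)
    J = np.zeros((k, k * k))
    grad = np.zeros(k * k)
    for x in xs:
        h, J = rtrl_step(W, U, h, x, J)
        grad += h @ J                      # dL_t/dh_t = h_t for this loss
    return grad.reshape(k, k)

def loss(W, U, xs):
    """Same assumed loss, computed by a plain forward pass (for checking)."""
    h = np.zeros(W.shape[0])
    total = 0.0
    for x in xs:
        h = np.tanh(W @ h + U @ x)
        total += 0.5 * float(h @ h)
    return total
```

Note that no past states are kept: the gradient is available at every step from the current `h` and `J` alone, which is the trade the abstract describes. Comparing `rtrl_grad` against finite differences of `loss` on a small random problem is a quick way to check the recursion.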
