Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are designed to work best over a wide range of domains. Machine learning approaches in the literature either depend on these traditional provers to bootstrap themselves with heuristics or struggle due to limited proof data. More importantly, gaps in the difficulties of the theorems within a dataset hinder progress, because solved theorems are often quite easy, and therefore do not provide sufficient information to help in learning to solve hard theorems. In this paper, we propose to leverage recent ideas from reinforcement learning to the domain of automated theorem proving by making use of intermediate data generated during the proof process, based on a modification of hindsight experience replay. We built a first order logic prover by disabling all the smart heuristics of state-of-the-art E prover and replacing them with a clause-scoring function learned by the proposed approach. Clauses are represented as graphs and presented to transformer networks with spectral features. We show that provers trained in this way can match and sometimes surpass state-of-the-art traditional provers, as well as machine learning provers, on the popular benchmarks MPTP-2078 and M2k, by both proving more theorems and generating more concise proofs.