A plethora of problems in AI, engineering, and the sciences are naturally formalized as probabilistic inference. Exact solutions are, however, often prohibitively expensive to compute, necessitating approximations. Approximate inference methods avoid high computational cost by evaluating the (unnormalized) target density only on a small part of its domain. They therefore necessarily operate under partial observations, which effectively results in epistemic uncertainty about the target distribution. Following this reasoning, we cast approximate inference in discrete distributions as sequential decision making under uncertainty, allowing us to leverage methods from reinforcement learning. In particular, we propose the TreeSample algorithm, an adaptation of Monte Carlo Tree Search for approximate inference in discrete distributions. The algorithm caches all previous queries to the target density in an explicit search tree and dynamically allocates new queries according to a "best-first" exploration heuristic based on established upper confidence bound methods. We show that this non-parametric inference method can be combined with neural networks that compile approximate conditionals of the target; these networks guide the inference search and enable generalization of inference computations across multiple target distributions. Empirically, TreeSample outperforms standard approximate inference methods on synthetic factor graphs.
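To make the core idea concrete, the following is a minimal, hypothetical sketch of a UCB-guided tree search over discrete assignments: each query to the unnormalized log target density is cached in an explicit tree keyed by partial assignments, and new queries are allocated best-first via an upper confidence bound rule. All names and parameters here (`ucb_tree_search`, `c`, the backup scheme) are illustrative assumptions, not the paper's exact TreeSample algorithm.

```python
import math

def ucb_tree_search(log_density, num_vars, num_values, n_queries=200, c=1.0):
    """Illustrative sketch: best-first search over discrete assignments.

    Caches query results in a tree keyed by partial assignments and
    descends by an upper confidence bound rule. NOT the paper's exact
    TreeSample algorithm; a simplified stand-in for exposition.
    """
    stats = {}  # prefix (tuple of values) -> [visit_count, value_sum]
    best, best_lp = None, -math.inf

    for _ in range(n_queries):
        prefix = ()
        # Descend: at each level, pick the child maximizing the UCB score.
        for _ in range(num_vars):
            n_parent = stats.get(prefix, [1, 0.0])[0]

            def ucb(v):
                n, s = stats.get(prefix + (v,), [0, 0.0])
                if n == 0:
                    return math.inf  # try unvisited children first
                return s / n + c * math.sqrt(math.log(n_parent) / n)

            prefix = prefix + (max(range(num_values), key=ucb),)

        lp = log_density(prefix)  # one query to the unnormalized target
        if lp > best_lp:
            best, best_lp = prefix, lp
        # Back up the observed log-density along the visited path.
        for i in range(num_vars + 1):
            node = stats.setdefault(prefix[:i], [0, 0.0])
            node[0] += 1
            node[1] += lp
    return best, best_lp
```

On a toy target whose log density is the number of ones in a binary assignment, the search quickly concentrates queries on the all-ones mode, since the unvisited-child rule guarantees local exploration while the UCB value estimates steer the descent toward high-density regions.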