Skip Context Tree Switching

Context Tree Weighting (CTW) is a powerful probabilistic sequence prediction technique that efficiently performs Bayesian model averaging over the class of all prediction suffix trees of bounded depth. In this paper we show how to generalize this technique to the class of k-skip prediction suffix trees. Unlike regular prediction suffix trees, k-skip prediction suffix trees may ignore up to k contiguous portions of the context. This can significantly improve predictive accuracy when irrelevant variables are present, a situation that often arises in record-aligned data and images. We provide a regret-based analysis of our approach and empirically evaluate it on the Calgary corpus and on a set of Atari 2600 screen-prediction tasks.
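As a point of reference for the averaging that CTW performs, the following is a minimal sketch of standard binary CTW. It is not the skip-based method proposed in this paper. It assumes binary symbols, a fixed depth, and the Krichevsky-Trofimov (KT) estimator at every node. Each node mixes its own KT estimate with the product of its children's weighted probabilities using weight 1/2. Doing this recursively averages over every prediction suffix tree of the given depth. The class and method names (`CTW`, `update`, `predict`) are hypothetical and chosen only for illustration.

```python
import copy
import math


class Node:
    """One context-tree node: symbol counts plus log-probabilities."""

    def __init__(self):
        self.counts = [0, 0]   # number of 0s and 1s seen in this context
        self.log_kt = 0.0      # log KT probability of those symbols
        self.log_w = 0.0       # log weighted (mixture) probability
        self.children = {}     # next-older context bit -> Node


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without underflow."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


class CTW:
    """Sketch of binary Context Tree Weighting with a fixed depth."""

    def __init__(self, depth):
        self.depth = depth
        self.root = Node()

    def update(self, context, symbol):
        """Observe `symbol`; `context` lists past bits, most recent last."""
        # Walk from the root down the path selected by the most recent bits.
        path = [self.root]
        node = self.root
        for i in range(self.depth):
            node = node.children.setdefault(context[-1 - i], Node())
            path.append(node)
        # Update bottom-up so that children are current before their parents.
        for d in range(self.depth, -1, -1):
            n = path[d]
            total = n.counts[0] + n.counts[1]
            n.log_kt += math.log((n.counts[symbol] + 0.5) / (total + 1))
            n.counts[symbol] += 1
            if d == self.depth:
                n.log_w = n.log_kt  # leaf: KT estimate only
            else:
                # Unseen children have probability 1, i.e. log 0, so skip them.
                split = sum(c.log_w for c in n.children.values())
                n.log_w = math.log(0.5) + log_add(n.log_kt, split)

    def predict(self, context, symbol):
        """Conditional probability of `symbol` as a ratio of block probabilities."""
        trial = copy.deepcopy(self)
        trial.update(context, symbol)
        return math.exp(trial.root.log_w - self.root.log_w)


if __name__ == "__main__":
    bits = [0, 1] * 20
    model = CTW(depth=2)
    for t in range(2, len(bits)):
        model.update(bits[:t], bits[t])
    # The sequence ends in 1, so a 0 should be strongly predicted next.
    print(model.predict(bits, 0))
```

`predict` copies the whole tree, which keeps the sketch short but is slow. A practical implementation would instead undo the single root-to-leaf path update. The skip extension described above, which lets a tree ignore up to k contiguous runs of context, is not shown here.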