Zelda Mariet, Joshua Robinson, Jamie Smith, Suvrit Sra, Stefanie Jegelka
ICML 2020 Workshop on Real World Experiment Design and Active Learning
Publication year: 2020

Obtaining unbiased, low-variance estimates of the mean of a ground set of points by sampling a small subset is a crucial machine learning task, arising in stochastic learning procedures, experimental design, and active learning. In the purely stochastic case, it is well known that importance sampling achieves unbiased estimates of minimal variance; finding optimal distributions over batches of size greater than one, however, has proven difficult. For batches of arbitrary size, we show (a) that importance sampling achieves the lowest variance when sampling with replacement, and (b) that, when sampling without replacement, parametric distributions over batches can be optimized via a quadratic form in the distribution's first- and second-order marginals. We verify that such learned distributions outperform other batch-selection methods and achieve faster SGD convergence in downstream experiments. As a side effect of our analysis, we show that distributions over fixed-size subsets cannot be characterized by their first- and second-order marginals in polynomial time.
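To make the single-sample baseline concrete, the sketch below illustrates the classical importance-sampling result the abstract refers to: for estimating the mean of scalar points, drawing one index with probability proportional to |x_i| yields an unbiased estimator with lower variance than uniform sampling. This is a minimal illustration with synthetic data, not the paper's batch-level method; all variable names and the data-generating choices are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground set of N scalar points with heterogeneous magnitudes.
N = 1000
x = rng.standard_normal(N) * rng.uniform(0.1, 10.0, size=N)
mu = x.mean()  # the quantity we want to estimate

def single_sample_estimate(p, trials=100_000):
    """Unbiased one-point estimator: draw i ~ p, return x_i / (N * p_i)."""
    idx = rng.choice(N, size=trials, p=p)
    est = x[idx] / (N * p[idx])
    return est.mean(), est.var()

uniform = np.full(N, 1.0 / N)
# Classical optimum for scalars: sample i with probability proportional to |x_i|.
importance = np.abs(x) / np.abs(x).sum()

mean_u, var_u = single_sample_estimate(uniform)
mean_i, var_i = single_sample_estimate(importance)

print(f"true mean {mu:.4f}")
print(f"uniform:    estimate {mean_u:.4f}, variance {var_u:.2f}")
print(f"importance: estimate {mean_i:.4f}, variance {var_i:.2f}")
```

Both estimators concentrate around the true mean, but the importance-sampled one has visibly lower variance; the paper's contribution is extending this kind of variance-optimal design from single samples to batches.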