Discussion Lead: David Wolpert [CCM]
Topic: Want to use cross-validation? Use stacking instead
Abstract: Cross-validation is perhaps the most commonly used technique in
machine learning and statistics. Indeed, it can be viewed as a formalization of
the scientific method. Cross-validation is, at its core, a winner-take-all
meta-supervised learning algorithm, run over a meta-data set whose input space is the set of
predictions by all candidate algorithms on held-out points, and whose output is the
associated truths at those held-out points. Stacking is the simple idea of replacing
cross-validation’s winner-take-all algorithm with a more sophisticated learning algorithm.
In this talk I review some of the experimental demonstrations of stacking’s power,
in domains ranging from supervised learning to unsupervised learning to Monte Carlo
integral estimation to community detection in networks.
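
The meta-data-set view described in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation. The toy data, the two one-parameter candidate algorithms, the helper name make_candidate, and the choice of unconstrained linear least squares as the "more sophisticated" meta-learner are all assumptions made for illustration. The sketch builds the matrix Z of held-out predictions, lets cross-validation pick the single best column, and lets stacking fit weights over all columns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: the target depends on both x and x**2,
# so neither candidate below can fit it alone.
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = x + 0.5 * x**2 + 0.1 * rng.normal(size=n)


def make_candidate(feature):
    """Candidate algorithm: one-parameter least squares on a single feature."""
    def fit(x_train, y_train):
        f = feature(x_train)
        coef = np.dot(f, y_train) / np.dot(f, f)
        return lambda x_new: coef * feature(x_new)
    return fit


candidates = [make_candidate(lambda v: v), make_candidate(lambda v: v**2)]

# Meta-data set: Z[i, j] is candidate j's prediction at point i, made by a
# model trained on folds that exclude point i. The meta-targets are y.
k = 5
folds = np.array_split(rng.permutation(n), k)
Z = np.empty((n, len(candidates)))
for held_out in folds:
    train = np.setdiff1d(np.arange(n), held_out)
    for j, fit in enumerate(candidates):
        predict = fit(x[train], y[train])
        Z[held_out, j] = predict(x[held_out])

# Cross-validation as a winner-take-all meta-learner: keep the one column
# of Z with the lowest held-out squared error.
cv_errors = np.mean((Z - y[:, None]) ** 2, axis=0)
winner = int(np.argmin(cv_errors))
cv_mse = float(cv_errors[winner])

# Stacking (assumed meta-learner: linear least squares over all columns).
weights, *_ = np.linalg.lstsq(Z, y, rcond=None)
stack_mse = float(np.mean((Z @ weights - y) ** 2))

print("CV winner:", winner, "held-out MSE:", cv_mse)
print("Stacking weights:", weights, "meta-level MSE:", stack_mse)
```

Picking a single winner amounts to restricting the weight vector to a one-hot vector, so on this meta-data set the least-squares fit can never do worse than the cross-validation winner. The stacking error here is measured on the same meta-data used to fit the weights. An honest comparison would evaluate both combiners on fresh data.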