The Pareto k diagnostic, the difference between pre-asymptotic and asymptotic behavior, and the high-dimensional examples are from Pareto Smoothed Importance Sampling: https://arxiv.org/abs/1507.02646
Some ideas on why stochastic optimization for other divergences is more difficult come from