Sunday, July 4, 2010

NIPS 2009 highlights

Reading Tea Leaves: How Humans Interpret Topic Models, Jonathan Chang, Jordan Boyd-Graber, Sean Gerrish, Chong Wang and David Blei, Princeton University. A really interesting idea for evaluating an unsupervised method. They run controlled experiments with human subjects on Amazon Mechanical Turk to measure the interpretability of topic model outputs, and find that better (lower) perplexity doesn't always correspond to higher interpretability for humans. A model with better perplexity may have found more structure in the data, but that structure may or may not correspond to our intuitive notion of a "topic". After talking to them, there are some big open questions:

1) Do correlated topic models have better perplexity than LDA but lower interpretability because people don't think of words as being assigned to multiple topics, or because there are unknown models that explain the data better and score higher on both measures?

2) Is it possible to use feedback from human subjects to do semi-supervised or active learning to improve topic models?

I really liked this paper, because many machine learning researchers would settle for a contrived automatic measure of interpretability, which may not reflect the judgments of human subjects, and those judgments are what you really care about. This is especially true in the empirical risk minimization (ERM) community, where a contrived but automatic measure is preferred because it is easier to optimize; semi-supervised learning using feedback from humans in complex experiments makes it much harder to operate in terms of risk. The sketch below shows the kind of perplexity-versus-top-words comparison the paper calls into question.
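To make the perplexity-versus-interpretability tension concrete, here is a minimal sketch (my own illustration, not the authors' code) that fits LDA with scikit-learn and prints held-out perplexity next to each topic's top words. The paper's point is that the first number does not reliably predict how sensible the second list looks to a person; the dataset and parameter choices here are placeholders.

```python
# Minimal LDA sketch: held-out perplexity vs. top words per topic.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data
vec = CountVectorizer(max_features=5000, stop_words="english")
X = vec.fit_transform(docs)
X_train, X_test = X[:8000], X[8000:]

lda = LatentDirichletAllocation(n_components=20, random_state=0)
lda.fit(X_train)

# Lower perplexity means a better predictive fit, but the paper shows
# it need not mean more human-interpretable topics.
print("held-out perplexity:", lda.perplexity(X_test))

words = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[-8:][::-1]  # indices of the 8 highest-weight words
    print(f"topic {k}:", " ".join(words[i] for i in top))
```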

Making Very Large-Scale Linear Algebraic Computations Possible Via Randomization (Tutorial), Gunnar Martinsson. Encouraging results on doing large-scale matrix computations; a sketch of the core trick follows.
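The core idea, as I understood it, is to capture the range of a big matrix with a random projection and then do the expensive factorization on a much smaller matrix. A hedged sketch of a randomized SVD in this spirit (the sizes and oversampling amount are illustrative choices of mine, not prescriptions from the tutorial):

```python
# Randomized SVD sketch: random projection -> QR -> small exact SVD.
import numpy as np

def randomized_svd(A, rank, n_oversample=10, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + n_oversample))  # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)     # orthonormal basis for the sampled range
    B = Q.T @ A                        # small (rank + p) x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U_small[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 50)) @ rng.standard_normal((50, 500))  # rank 50
U, s, Vt = randomized_svd(A, rank=50)
print("relative error:", np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```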

Sequential Monte-Carlo Methods (Tutorial), Arnaud Doucet and Nando de Freitas. Really clear description of bootstrap particle filters; a toy version is sketched below.
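A minimal bootstrap particle filter on a toy 1-D linear-Gaussian model (the model and all constants are my own illustration, not from the tutorial slides): propagate particles through the transition prior, weight by the observation likelihood, and resample.

```python
# Bootstrap particle filter sketch for x_t = 0.9 x_{t-1} + noise, y_t = x_t + noise.
import numpy as np

rng = np.random.default_rng(0)
T, N = 100, 1000                       # time steps, particles

# Simulate data from the model.
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal(scale=1.0)
y = x_true + rng.normal(scale=0.5, size=T)

particles = rng.normal(size=N)
estimates = np.zeros(T)
for t in range(T):
    # Propagate through the transition prior (the "bootstrap" proposal).
    particles = 0.9 * particles + rng.normal(scale=1.0, size=N)
    # Weight by the observation likelihood and normalize.
    w = np.exp(-0.5 * ((y[t] - particles) / 0.5) ** 2)
    w /= w.sum()
    estimates[t] = w @ particles
    # Multinomial resampling to fight weight degeneracy.
    particles = particles[rng.choice(N, size=N, p=w)]

print("RMSE:", np.sqrt(np.mean((estimates - x_true) ** 2)))
```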

Gaussian process regression with Student-t likelihood, Jarno Vanhatalo, Pasi Jylanki and Aki Vehtari. They claim their Laplace approximation is better than EP and MCMC.
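The appeal of a Student-t likelihood is robustness: its heavy tails mean an outlying observation is penalized far less than under a Gaussian, so a single bad point can't drag the whole GP posterior around. A quick sketch of just that contrast (nu=4 is an arbitrary choice of mine, not the paper's):

```python
# Outliers under Gaussian vs. Student-t observation models.
import numpy as np
from scipy.stats import norm, t

residuals = np.array([0.1, 0.5, 1.0, 5.0, 10.0])  # distance from the GP mean
print("residual  gaussian_logpdf  student_t_logpdf(nu=4)")
for r in residuals:
    # The Gaussian log-density falls off quadratically; the Student-t
    # only logarithmically, so big residuals cost far less.
    print(f"{r:8.1f}  {norm.logpdf(r):15.2f}  {t.logpdf(r, df=4):21.2f}")
```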

On Stochastic and Worst-case Models for Investing, Elad Hazan, IBM, and Satyen Kale, Yahoo! Research. They derive bounds on worst-case scenarios for portfolio selection. An interesting idea, but after talking to the authors, it seems they made the implicit assumption, which they didn't mention, that the market can't drop more than 50% in a day.
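For flavor, here is a hedged sketch of a classic online portfolio update in this general family, the exponentiated-gradient rule of Helmbold et al.; it is related to, but emphatically not, the algorithm in this paper.

```python
# Exponentiated-gradient portfolio sketch (Helmbold et al. style).
# x[t, i] is the price relative of asset i on day t (today / yesterday).
import numpy as np

def eg_portfolio(x, eta=0.05):
    T, n = x.shape
    w = np.full(n, 1.0 / n)            # start with the uniform portfolio
    wealth = 1.0
    for t in range(T):
        ret = w @ x[t]                 # day-t growth factor of the portfolio
        wealth *= ret
        # Multiplicative update toward assets that did well today.
        w = w * np.exp(eta * x[t] / ret)
        w /= w.sum()
    return wealth

rng = np.random.default_rng(0)
x = 1.0 + 0.01 * rng.standard_normal((250, 5))  # synthetic price relatives
print("final wealth:", eg_portfolio(x))
```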

Fast subtree kernels on graphs, Nino Shervashidze and Karsten Borgwardt, MPIs Tuebingen. They use a kernel for regression/classification when the inputs are graphs (such as molecular structures). They used SVMs in the paper, but it could easily be used in the GP context (which of course excites me more). A simplified sketch of the kernel follows.
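The idea, heavily simplified from the paper: repeatedly relabel each node with its own label plus its sorted neighbor labels, count how often each compressed label appears, and take a dot product of those counts between two graphs. A hedged toy version (my simplification; the paper compresses labels for speed):

```python
# Weisfeiler-Lehman-style subtree kernel sketch on labeled graphs.
from collections import Counter

def wl_features(adj, labels, n_iter=3):
    """adj: {node: [neighbors]}, labels: {node: initial label}."""
    feats = Counter(labels.values())
    for _ in range(n_iter):
        # Relabel each node by its label plus its sorted neighbor labels.
        labels = {
            v: (labels[v],) + tuple(sorted(labels[u] for u in adj[v]))
            for v in adj
        }
        feats.update(labels.values())
    return feats

def wl_kernel(g1, g2, n_iter=3):
    f1, f2 = wl_features(*g1, n_iter), wl_features(*g2, n_iter)
    # Dot product of the label-count vectors.
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())

# Two tiny labeled graphs: a C-O-C path and a C-O-C triangle.
path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "O", 2: "C"})
tri  = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "C", 1: "O", 2: "C"})
print(wl_kernel(path, tri))
```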

Invited Talk: Bayesian Analysis of Markov Chains, Persi Diaconis, Stanford. I was excited to finally see the much-talked-of Diaconis in the flesh.

Machine Learning for Sustainability, J. Zico Kolter, Stanford University, Thomas Dietterich, Oregon State University, and Andrew Ng, Stanford University (mini-symposium). They really put their money where their mouth is: one of the speakers gave his talk over Skype to avoid the CO2 emissions of flying from San Francisco to Vancouver.

Improving Existing Fault Recovery Policies, Guy Shani and Christopher Meek, Microsoft Research. Interesting, since it's similar to what I do ;)
