## Monday, December 27, 2010

### NIPS 2010 highlights

This was NIPS's last year in Vancouver/Whistler, and luckily the snow conditions were good ;)

On the technical side:

Switched Latent Force Models for Movement Segmentation

Mauricio Alvarez, Jan Peters, Bernhard Schoelkopf, Neil Lawrence

They modeled an input/output system governed by a linear differential equation where the input was distributed according to a switching GP. They took advantage of the fact that the derivative of a GP-distributed function is also GP distributed, along with other linearity properties, so the output of the system was also distributed according to a switching GP model. They used the model to segment human motion. I liked it since it was closely related to my ICML paper on GP change point models. They claimed the advantage of their method is that it enforces continuity in the time series across segment switches. Although this can easily be done in my setup, I am glad my paper got a citation ;)
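The linearity argument can be checked numerically: for a GP with a differentiable kernel, f and f' are jointly Gaussian, and their cross-covariances are derivatives of the kernel. A minimal sketch, assuming a squared-exponential kernel with arbitrary hyper-parameters `s` and `ell` (the paper's latent force kernels are derived from the differential equation itself and may differ):

```python
import numpy as np

def se_kernel(x1, x2, s=1.0, ell=0.5):
    """Squared-exponential covariance k(x, x') between two 1-D input arrays."""
    r = x1[:, None] - x2[None, :]
    return s**2 * np.exp(-0.5 * r**2 / ell**2)

def se_dk_dx(x1, x2, s=1.0, ell=0.5):
    """cov(f'(x), f(x')): derivative of k with respect to its first argument."""
    r = x1[:, None] - x2[None, :]
    return -r / ell**2 * se_kernel(x1, x2, s, ell)

def se_d2k_dxdx(x1, x2, s=1.0, ell=0.5):
    """cov(f'(x), f'(x')): mixed second derivative of k."""
    r = x1[:, None] - x2[None, :]
    return (1.0 / ell**2 - r**2 / ell**4) * se_kernel(x1, x2, s, ell)

def joint_covariance(x, s=1.0, ell=0.5):
    """Covariance of the stacked Gaussian vector [f(x); f'(x)]."""
    K = se_kernel(x, x, s, ell)
    Kdf = se_dk_dx(x, x, s, ell)        # rows index f', columns index f
    Kdd = se_d2k_dxdx(x, x, s, ell)
    return np.block([[K, Kdf.T], [Kdf, Kdd]])

# Draw one joint sample of a function and its derivative on a grid.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
C = joint_covariance(x)
L = np.linalg.cholesky(C + 1e-6 * np.eye(2 * len(x)))   # small jitter for stability
sample = L @ rng.standard_normal(2 * len(x))
f, df = sample[:len(x)], sample[len(x):]
```

The same idea extends to any linear operator (such as the solution map of a linear ODE), which is why the output of such a system stays Gaussian-process distributed.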

Global seismic monitoring as probabilistic inference

Nimar Arora, Stuart Russell, Paul Kidwell, Erik Sudderth

They used graphical models to infer whether a detected seismic signal is noise (from local events near a single sensor) or comes from a genuine event, such as an earthquake or a nuclear test, which should be detected by multiple seismic sensors.

A Bayesian Approach to Concept Drift

Stephen Bach, Mark Maloof

This paper is also similar to the Adams & MacKay change point framework. They replaced the base model (the underlying predictive model, or UPM) with a discriminative classifier such as Bayesian logistic regression. They admitted to fitting some of the hyper-parameters to the test set, which is cheating. They tried to justify this by arguing that it is inappropriate to learn the frequency of concept drifts (change points) from training data. I don't think the argument is coherent.
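For reference, the Adams & MacKay recursion tracks a posterior over the current run length (time since the last change point). A minimal sketch, assuming a Gaussian UPM with known noise and a conjugate prior on the mean rather than the paper's discriminative classifier; `hazard` is the prior change point frequency, the kind of hyper-parameter discussed above, and all numeric settings are arbitrary:

```python
import numpy as np

def bocpd_gaussian(data, hazard=1 / 100, mu0=0.0, tau0=3.0, sigma=1.0):
    """Run-length posterior of Bayesian online change point detection.

    UPM (an assumption for this sketch): x ~ N(mu, sigma^2) with known sigma and a
    conjugate N(mu0, tau0^2) prior on mu. Returns R with R[t, r] = p(r_t = r | x_1..x_t).
    """
    T = len(data)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    # Posterior over mu for every candidate run length, stored as natural parameters.
    prec = np.array([1.0 / tau0**2])
    prec_mean = np.array([mu0 / tau0**2])
    for t, x in enumerate(data):
        # UPM predictive density of x under each run length's posterior.
        mean, var = prec_mean / prec, 1.0 / prec + sigma**2
        pred = np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
        # Either the current run grows by one, or a change point resets it to zero.
        R[t + 1, 1 : t + 2] = R[t, : t + 1] * pred * (1.0 - hazard)
        R[t + 1, 0] = np.sum(R[t, : t + 1] * pred * hazard)
        R[t + 1] /= R[t + 1].sum()
        # Run length 0 restarts from the prior; every other run absorbs x.
        prec = np.concatenate([[1.0 / tau0**2], prec + 1.0 / sigma**2])
        prec_mean = np.concatenate([[mu0 / tau0**2], prec_mean + x / sigma**2])
    return R

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(5.0, 1.0, 60)])
R = bocpd_gaussian(data)
map_run_length = R[1:].argmax(axis=1)   # most probable run length after each point
```

Swapping the UPM for a classifier only changes how `pred` is computed and how each run's statistics are updated; the run-length bookkeeping stays the same.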

Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression

Ling Huang, Jinzhu Jia, Bin Yu, Byung-Gon Chun, Petros Maniatis, Mayur Naik

They analyzed programs to predict their execution time. The novelty of the paper is how they created features: by "slicing" the program, they found small snippets of it that could be executed quickly, and used the outputs of these snippets as features for LASSO regression over polynomial basis functions. Polynomial basis functions are sensible since the run-time of a program is usually approximately linear, quadratic, or cubic in some aspect of its input. I pointed them to Zoubin's polybayes.m demo as a way of selecting the order of a polynomial from data. Symbolic regression using Eureqa might also be illuminating.

Slice sampling covariance hyperparameters of latent Gaussian models

Iain Murray, Ryan Adams

Iain presented some tricks for transforming the sample space in GP classification that drastically improve the convergence of sampling GP hyper-parameters. Iain is a fan of re-parameterizing models into spaces that make sampling easier. He claims the naive sampling method gets stuck at an "entropic barrier," which he describes as a third, often ignored but common, failure mode of MC methods. The other two are the sampler getting stuck in one mode of the posterior and highly correlated dimensions.
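A minimal sketch of one re-parameterization in this spirit, not necessarily the exact scheme from the paper: instead of holding the latent function f fixed while moving the lengthscale (which pins the lengthscale to f), write f = L(ell) @ nu with whitened latents nu and hold nu fixed. It alternates Neal's univariate slice sampler on log(ell) with elliptical slice sampling on nu. The kernel, logistic likelihood, N(0, 1) prior on log(ell), and toy labels are all assumptions:

```python
import numpy as np

def slice_sample_1d(x0, log_density, rng, width=1.0, max_steps=50):
    """One univariate slice-sampling update (stepping out + shrinkage, after Neal 2003)."""
    log_y = log_density(x0) + np.log(rng.uniform())  # random height under the density
    left = x0 - width * rng.uniform()                  # randomly positioned initial bracket
    right = left + width
    j = int(np.floor(max_steps * rng.uniform()))       # split the step budget randomly
    k = max_steps - 1 - j
    while j > 0 and log_density(left) > log_y:
        left -= width
        j -= 1
    while k > 0 and log_density(right) > log_y:
        right += width
        k -= 1
    while True:                                        # shrink the bracket until accepted
        x1 = rng.uniform(left, right)
        if log_density(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

def elliptical_slice(nu, log_lik, rng):
    """One elliptical slice sampling update for a N(0, I) prior on nu."""
    eta = rng.standard_normal(nu.shape)
    log_y = log_lik(nu) + np.log(rng.uniform())
    phi = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = phi - 2.0 * np.pi, phi
    while True:
        proposal = nu * np.cos(phi) + eta * np.sin(phi)
        if log_lik(proposal) > log_y:
            return proposal
        if phi < 0.0:
            lo = phi
        else:
            hi = phi
        phi = rng.uniform(lo, hi)

def se_chol(x, ell, jitter=1e-6):
    """Cholesky factor of a squared-exponential covariance with lengthscale ell."""
    r = x[:, None] - x[None, :]
    return np.linalg.cholesky(np.exp(-0.5 * (r / ell) ** 2) + jitter * np.eye(len(x)))

def log_lik(f, y):
    """Logistic likelihood: sum of log sigmoid(y * f) for labels y in {-1, +1}."""
    return -np.sum(np.logaddexp(0.0, -y * f))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 30)
y = np.where(np.sin(2.0 * x) > 0.0, 1.0, -1.0)    # toy labels
log_ell = 0.0                                      # hyperparameter: log lengthscale
nu = rng.standard_normal(len(x))                   # whitened latents, f = L(ell) @ nu
samples = []
for _ in range(200):
    chol = se_chol(x, np.exp(log_ell))
    # Update nu with ell fixed; the N(0, I) prior on nu suits elliptical slice sampling.
    nu = elliptical_slice(nu, lambda v: log_lik(chol @ v, y), rng)
    # Update log(ell) with nu fixed; f moves with ell, so ell is not pinned by f.
    def log_post(t):
        return -0.5 * t**2 + log_lik(se_chol(x, np.exp(t)) @ nu, y)
    log_ell = slice_sample_1d(log_ell, log_post, rng)
    samples.append(log_ell)
```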

Heavy-Tailed Process Priors for Selective Shrinkage

Fabian Wauthier, Michael Jordan

Fabian did GP classification while applying heavy-tailed noise to the latent GP before squashing the function through a sigmoid/probit. They claim GPC often gives overconfident predictions in sparsely sampled areas of the input space, and this method claims to alleviate the problem. Since the problem does not occur on synthetic data, I asked him which underlying model assumption he thought was being violated. He believes the root cause is that the stationarity assumption made by most GP kernels is inappropriate in many cases.
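One way to see why heavy-tailed noise before the squashing function tempers confidence: integrating the noise out gives an effective link p(y = +1 | f) = E[Phi(f + eps)], which approaches 1 much more slowly than Phi(f). A Monte Carlo sketch, assuming Student-t noise with 2 degrees of freedom and a probit link (the paper's actual noise process may differ):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
eps_t = rng.standard_t(df=2.0, size=200_000)   # heavy-tailed noise on the latent value

def effective_link(f, eps):
    """Monte Carlo estimate of p(y = +1 | f) = E[Phi(f + eps)] for a probit link."""
    return float(np.mean(norm.cdf(f + eps)))

for f in (0.0, 2.0, 5.0):
    print(f, norm.cdf(f), effective_link(f, eps_t))
```

Because the t tails decay polynomially, even a large latent value leaves a noticeable probability of the opposite label.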

Copula Processes

Andrew Wilson, Zoubin Ghahramani

It was nice to see that Andrew attracted quite a crowd at his poster.

At the workshops I liked:

Natively probabilistic computation: principles and applications

Vikash Mansinghka, Navia Systems

Vikash argued that his accelerated hardware could draw millions of samples per second when Gibbs sampling an MRF (a 1000x improvement). The hardware restricted the flexibility of what kind of sampling you could do, but the performance lost to that inflexibility was compensated for many times over by the hardware acceleration. He argues that the best approach may be to run simple samplers on his accelerated hardware rather than sophisticated samplers in software.
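To make "Gibbs sampling an MRF" concrete, here is a minimal software sketch, assuming an Ising model on a periodic grid (the model choice is an assumption; it says nothing about how Vikash's hardware works). A checkerboard ordering exposes the kind of simple, massively parallel update that suits dedicated hardware:

```python
import numpy as np

def gibbs_ising(shape=(32, 32), coupling=0.3, sweeps=100, seed=0):
    """Checkerboard Gibbs sampler for a 2-D Ising MRF with spins in {-1, +1}.

    Each conditional is p(s = +1 | neighbours) = sigmoid(2 * coupling * neighbour_sum).
    With periodic boundaries and even side lengths, same-coloured sites are conditionally
    independent given the other colour, so half the grid is resampled at once.
    """
    rng = np.random.default_rng(seed)
    s = rng.choice([-1, 1], size=shape)
    ii, jj = np.indices(shape)
    colours = [(ii + jj) % 2 == 0, (ii + jj) % 2 == 1]
    for _ in range(sweeps):
        for mask in colours:
            nb = (np.roll(s, 1, axis=0) + np.roll(s, -1, axis=0)
                  + np.roll(s, 1, axis=1) + np.roll(s, -1, axis=1))
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * coupling * nb))
            draws = np.where(rng.uniform(size=shape) < p_plus, 1, -1)
            s = np.where(mask, draws, s)
    return s

spins = gibbs_ising(coupling=0.3)
print(spins.mean())
```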

There was talk about the prospect of moving to analog computation for sampling. CPUs spend a lot of energy making digital computation completely deterministic, but then in MC methods we artificially introduce randomness, so maybe it is better to do MC computations in analog. But Vikash said that the analog computation must be limited to very small accelerated units within a digital processor in order for it to be manageable. The analog elements would require custom ICs, which require more funding than he currently has. However, he has selectively reduced the bit precision of many of his computations, which he says can be done when the quantities are random. This saves chip real estate and power.
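A small illustration of why low precision can be acceptable for random quantities, as a sketch under my own assumptions rather than a description of Vikash's chips: a Bernoulli draw needs only a comparison between a quantized probability and a few random bits, and the bias this introduces is bounded by the quantization step:

```python
import numpy as np

def low_precision_bernoulli(p, bits, rng, size):
    """Bernoulli(p) draws that use only `bits`-bit integers.

    p is rounded to a `bits`-bit threshold and compared with a uniform `bits`-bit
    random integer, so each draw succeeds with probability round(p * 2**bits) / 2**bits,
    which is within 2**-(bits + 1) of p.
    """
    scale = 2**bits
    threshold = int(round(p * scale))
    u = rng.integers(0, scale, size=size)   # uniform on {0, ..., scale - 1}
    return u < threshold

rng = np.random.default_rng(0)
for bits in (4, 8, 16):
    draws = low_precision_bernoulli(0.3, bits, rng, 100_000)
    print(bits, draws.mean())
```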

