Saturday, December 27, 2008

Teamwork and Machine Learning

In many engineering programs there is a focus on how to work in teams and how to divide projects into parts. In embedded systems it may start with the division between hardware and software. Then it may be further divided by different subsystems and software libraries. I haven't seen much emphasis on the different roles in machine learning. I see the work falling into these categories:

  1. Acquiring the data and getting it into a database.
  2. Extracting the data from the database into .csv and .mat files, in a form that can be fed directly into an algorithm.
  3. Designing new models, coding up the inference methods, and testing the algorithms on synthetic data.
  4. Creating a test bed to divide the data into training and test sets, evaluate different methods, and report results.
  5. Deciding which feature matrices and models to use, and putting everything together.
  6. Implementing libraries that can be used in actual applications.
  7. Testing the real-world libraries.
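To make the split concrete, here is a minimal sketch of how stages 1 through 4 might be kept apart so that different people could own them. It assumes Python's standard `sqlite3`, `csv`, and `random` modules; every function and class name (`load_database`, `export_csv`, `read_csv`, `ThresholdModel`, `train_test_split`, `evaluate`) is hypothetical, the data is synthetic, and the "model" is a deliberately trivial placeholder rather than a real inference method.

```python
import csv
import io
import random
import sqlite3


# Stage 1 (acquisition): load raw rows into a database.
def load_database(rows):
    """Store (x, y) pairs in an in-memory SQLite table named 'samples'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (x REAL, y INTEGER)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    conn.commit()
    return conn


# Stage 2 (extraction): dump the table to CSV text and parse it back into
# the plain list of (feature, label) tuples that the algorithms consume.
def export_csv(conn):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["x", "y"])
    writer.writerows(conn.execute("SELECT x, y FROM samples ORDER BY rowid"))
    return buf.getvalue()


def read_csv(text):
    reader = csv.DictReader(io.StringIO(text))
    return [(float(r["x"]), int(r["y"])) for r in reader]


# Stage 3 (modeling): a hypothetical placeholder model.
class ThresholdModel:
    """Predicts 1 when x exceeds the midpoint of the two class means.

    Assumes class 1 tends to have larger x than class 0.
    """

    def fit(self, data):
        pos = [x for x, y in data if y == 1]
        neg = [x for x, y in data if y == 0]
        if not pos or not neg:
            raise ValueError("training data must contain both classes")
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, x):
        return 1 if x > self.threshold else 0


# Stage 4 (test bed): shuffle and split, train, then score on held-out data.
# It only needs a model factory with fit/predict, not the model's internals.
def train_test_split(data, test_fraction=0.25, seed=0):
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(model_factory, data, test_fraction=0.25, seed=0):
    train, test = train_test_split(data, test_fraction, seed)
    model = model_factory().fit(train)
    correct = sum(model.predict(x) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    # Synthetic data: label is 1 exactly when x >= 50.
    rows = [(float(i), 1 if i >= 50 else 0) for i in range(100)]
    data = read_csv(export_csv(load_database(rows)))
    print(f"test accuracy: {evaluate(ThresholdModel, data):.2f}")
```

The stages only communicate through plain data (CSV text, a list of tuples) and a fit/predict convention, so the person writing inference code can swap in a new model without touching the database or the test bed, and vice versa.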

From what I've seen, not enough emphasis is placed on dividing these tasks, either academically or industrially. I think it is most efficient to divide them among different people who can specialize. It is somewhat wasteful to take a person who is an expert in designing inference algorithms and have them spend most of their time setting up a database.
