Hacker Notes: Machine Learning

2011, Dec 18    

I just finished the Machine Learning online course provided by Stanford University! This is the official class page at Stanford at the time of this writing. I took the course via enrollment at .

In the class, I studied machine learning. As the definition goes, a machine can be said to learn if its performance on a task T, as measured by some performance measure P, improves with experience E.

Machine learning works like this: in “supervised machine learning”, you “train” a learning algorithm with labeled examples. The trained algorithm (i.e. one that has “learned” a certain set of parameters) can then be used to predict or process new data.
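To make that concrete, here is a tiny sketch of the train-then-predict loop, using logistic regression (one of the supervised algorithms from the class). I'm writing it in Python with numpy rather than the Octave the class exercises used, and the toy data and learning rate are made up purely for illustration.

```python
# Minimal sketch of supervised learning: fit parameters to labeled
# examples, then use the trained model to label new data.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Labeled training examples: one feature per row, label 0 or 1.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Add an intercept column, then "train" by gradient descent
# on the logistic-regression cost.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
theta = np.zeros(Xb.shape[1])
alpha = 0.1                      # learning rate (arbitrary choice)
for _ in range(5000):
    grad = Xb.T @ (sigmoid(Xb @ theta) - y) / len(y)
    theta -= alpha * grad

# The learned parameters can now label data the model has never seen.
new_x = np.array([[1.0, 2.6]])   # intercept term plus one new feature value
print(sigmoid(new_x @ theta))    # probability that the label is 1
```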

An example of a supervised learning task is recognizing handwriting. This is used extensively by the postal service to help automate the flow of mail. The implementation we worked on in the class was a neural network (beyond scope here) which recognized handwritten digits. By providing the neural network with labeled input (an image of a handwritten number 9) and the correct output (the digit 9), it learned to identify new handwriting it had never seen.

What’s going on behind the scenes is that the labeled examples modify parameters on a statistical model. The way machine learning algorithms are written, they can be used to create really complex (and powerful) models. Then, new input is run through the parametrized statistical model (the “trained” neural network) and out pops, for each digit 0-9, the probability that the image is that digit.
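Here's a rough sketch of that forward pass. The layer sizes are roughly those of the digit exercise from the class (a flattened 20x20 image in, a small hidden layer, ten outputs), but the random weights are only stand-ins for parameters that training would normally have fixed, and I'm using a softmax at the output as one way to turn scores into probabilities.

```python
# Sketch of running new input through a parametrized model: a flattened
# image goes in, and out comes a probability for each digit 0-9.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(25, 400))   # hidden layer: 400 pixels -> 25 units
W2 = rng.normal(size=(10, 25))    # output layer: 25 units -> 10 digits

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(image_pixels):
    hidden = sigmoid(W1 @ image_pixels)   # activations of the hidden layer
    return softmax(W2 @ hidden)           # one probability per digit 0-9

probs = predict(rng.random(400))          # a fake 20x20 image, flattened
print(probs.argmax(), probs)              # most likely digit, plus all ten probabilities
```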

Supervised learning includes linear regression, logistic regression, neural networks, and others. The other type is unsupervised learning. In this case, the algorithm just looks at unlabeled data, and attempts to find groups or patterns in the data. This might include market segmentation, anomaly detection, or other grouping operations.
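Here's what the unsupervised case can look like: a bare-bones sketch of k-means clustering (one of the grouping algorithms covered in the class), which finds two groups in unlabeled points. The two-blob toy data is invented for illustration.

```python
# Minimal k-means sketch: no labels, the algorithm just finds clusters.
import numpy as np

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 1, (50, 2)),      # one blob of points
                  rng.normal(5, 1, (50, 2))])     # a second, separate blob

k = 2
centroids = data[rng.choice(len(data), k, replace=False)]
for _ in range(20):
    # assign each point to its nearest centroid...
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # ...then move each centroid to the mean of its assigned points
    centroids = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                          else centroids[j] for j in range(k)])

print(centroids)   # two cluster centers, found without any labels
```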

An example is anomaly detection on a production line. An airplane engine might be tested for its heat vs. its vibration as part of production. An anomaly detection system, given plenty of examples of normal engines, could read the test results on new engines and raise a red flag if a combination of heat and vibration comes up that is statistically very improbable.
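A rough sketch of that idea, assuming a simple Gaussian model over the two measurements (the readings and the threshold here are invented):

```python
# Sketch of anomaly detection: fit a Gaussian to measurements from
# normal engines, then flag new engines whose (heat, vibration)
# readings are statistically very improbable.
import numpy as np

rng = np.random.default_rng(2)
normal_engines = rng.normal([80.0, 0.5], [5.0, 0.1], size=(1000, 2))  # (heat, vibration)

mu = normal_engines.mean(axis=0)
sigma2 = normal_engines.var(axis=0)

def probability(x):
    # product of independent per-feature Gaussian densities
    return np.prod(np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))

epsilon = 1e-4                      # flag anything less probable than this
new_engine = np.array([95.0, 0.9])  # unusually hot and shaky
if probability(new_engine) < epsilon:
    print("anomaly: flag this engine for inspection")
```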

At a really high level, that’s about the whole of the class. Most of the time is spent constructing and understanding the implementation of these learning algorithms. At the end, you get a look at using individual learning algorithms as building blocks to create pipelines.

An example pipeline covered is a program to recognize text in photos. This is a pipeline of three algorithms (all supervised, i.e. “trained” on labeled examples). The first detects what parts of the photo are text, the second breaks up the text into letters, and the last identifies those letters.
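As a sketch, the pipeline is really just function composition. The three stage functions below are stubs standing in for the three trained algorithms, with made-up outputs just to show how the pieces chain together.

```python
# Sketch of the photo-to-text pipeline: three separately trained stages
# chained together. Each stub stands in for a supervised learning algorithm.

def detect_text_regions(photo):
    # stage 1: return the patches of the photo that contain text
    return ["patch with 'STOP'"]          # stand-in output

def segment_characters(patch):
    # stage 2: split one text patch into individual character images
    return ["S", "T", "O", "P"]           # stand-in output

def classify_character(char_image):
    # stage 3: identify a single character image
    return char_image                     # stand-in: already a letter

def read_text(photo):
    words = []
    for patch in detect_text_regions(photo):
        letters = [classify_character(c) for c in segment_characters(patch)]
        words.append("".join(letters))
    return words

print(read_text("street_photo.jpg"))      # -> ['STOP']
```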

It’s interesting to think about learning algorithms as building blocks. Since each one is just a really complex statistical model, combining them only gives rise to even more complex models. And yet you can break an “action” like reading the text in a photo into smaller steps: recognizing areas of text, splitting them into letters, and identifying those letters.

It’s interesting to think how algorithms and entire pipelines could be lined up to create complex processing on input. I imagine this is how researchers accomplish things like machine vision, or how they attempt to model the human brain.

(For reference, I’ve added the exercises (and my solutions) from the class to my personal GitHub account here. If you’re planning on taking the class, please ignore.)