Machine-learning tools for big-data analytics include programs that, in effect, learn from experience. Such programs can be used to process natural language, make recommendations, or uncover how biological systems work, according to MIT Professor Tommi Jaakkola, delivering a lecture in the online course “Tackling the Challenges of Big Data.” They can also be used for predictive user modeling or for solving large-scale inverse problems.
Machine learning is useful, Jaakkola said, because modern engineering problems are often hard to specify and solve directly. For instance, it may be hard to write an algorithm to detect credit-card fraud, but a machine can be presented with examples of legitimate and fraudulent credit-card transactions and learn to identify the latter.
Mapping an example to a label (for example, “fraudulent”) is a classification problem, Jaakkola said. Classifying news articles or biomedical samples, mapping genotype signatures to phenotypes, and predicting the success of financial strategies are all simple classification problems.
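The fraud example above can be sketched as a minimal learned classifier. The sketch below uses a nearest-centroid rule on invented transaction features (dollar amount, distance from home); the features, data, and method are illustrative assumptions, not the lecture's actual algorithm.

```python
def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def classify(x, legit_centroid, fraud_centroid):
    """Label x by whichever class centroid it is closer to (squared distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return "fraudulent" if dist2(x, fraud_centroid) < dist2(x, legit_centroid) else "legitimate"

# Toy labeled examples: [amount in dollars, miles from home]
legit = [[25, 2], [40, 5], [12, 1], [60, 8]]
fraud = [[900, 450], [1200, 600], [700, 300]]

c_legit, c_fraud = centroid(legit), centroid(fraud)
print(classify([1000, 500], c_legit, c_fraud))  # a far-away, high-value charge
```

The point is the workflow Jaakkola describes: rather than hand-writing fraud rules, the program derives its decision boundary from labeled examples.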
Beyond simple classification lie problems such as transcribing speech or processing natural-language sentences—often in the presence of incomplete or erroneous data. Jaakkola presented an example of mapping a sentence to dependency parses, noting that parsing a single sentence is computationally hard. But, he said, adaptively decomposing sentences into loosely coupled pieces that are easy to solve individually results in a close approximation for most languages.
He then discussed recommendation problems—for example, predicting what movie I might like. The concept is simple: If you like movies A, B, and C, and I like movies A and C but have not seen B, it’s probably safe to recommend movie B to me.
The challenge arises at scale, with thousands of movies and millions of users. The problem can be represented as a matrix of users vs. movies, populated with users’ ratings. The matrix is sparsely populated, because the average user will have seen very few of the available movies, and the task essentially becomes a matrix-completion problem: finding the simplest matrix consistent with the limited data. It can be addressed via factorization—based on “user features” (the ratings a user assigned to a limited number of movies) and “item features” (the users who rated the movie).
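The factorization idea can be sketched as follows: approximate the sparse ratings matrix as a product of a low-rank user-feature matrix and item-feature matrix, fit only to the observed entries, then read predictions off the product. The ratings, rank, learning rate, and training loop below are all illustrative assumptions.

```python
import random

random.seed(0)

# (user, movie) -> rating; most of the matrix is missing
R = {
    (0, 0): 5, (0, 1): 4, (1, 0): 5, (1, 2): 1,
    (2, 1): 4, (2, 2): 1, (3, 0): 1, (3, 2): 5,
}
n_users, n_movies, rank = 4, 3, 2

# Small random init for user factors U and movie factors V
U = [[random.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
V = [[random.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_movies)]

def predict(u, m):
    """Predicted rating = dot product of user and movie feature vectors."""
    return sum(U[u][k] * V[m][k] for k in range(rank))

# Stochastic gradient descent on the observed entries only
lr = 0.05
for _ in range(3000):
    for (u, m), r in R.items():
        err = r - predict(u, m)
        for k in range(rank):
            U[u][k], V[m][k] = (U[u][k] + lr * err * V[m][k],
                                V[m][k] + lr * err * U[u][k])

# Fill in an unseen entry, e.g. user 0's rating of movie 2
print(round(predict(0, 2), 1))
```

Because the learned factors are low-rank, they generalize beyond the observed cells: the model predicts user 0's missing rating for movie 2 from the taste pattern user 0 shares with similar users.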
Jaakkola concluded by saying, “There are lots of machine-learning algorithms available out there that solve all kinds of problems where we need to learn from experience. And I hope and strongly encourage you to learn more about them and try to apply them to problems that you are interested in.”
See these related posts on the online course “Tackling the Challenges of Big Data”:
- Big data challenges: volume, velocity, variety
- Commuting via rapidly mixing Markov chains
- GPUs drive MIT’s MapD Twitter big-data application
- Emerging tools address data curation
- What is cloud computing?
- Big data impels database evolution
- Distributed computing platforms serve big-data applications
- NewSQL takes advantage of main-memory databases like H-Store
- Onions of encryption boost big-data security
- Lecture addresses scalability in multicore systems
- User interfaces can clarify, or obscure, big data’s meaning
- It’s a small world, but with big data
- Sampling, streaming cut big data down to size
- Coresets offer a path to understanding big data