
What Is Machine Learning?

Before we take a look at the details of various machine learning methods, let's start by looking at what machine learning is.

Fundamentally, machine learning involves building mathematical models to help understand data. "Learning" enters the fray when we give these models tunable parameters that can be adapted to observed data; in this way the program can be considered to be "learning" from the data. Once these models have been fit to previously seen data, they can be used to predict and understand aspects of newly observed data.
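As a minimal sketch of this idea, the model below is a straight line with two tunable parameters, a slope and an intercept. "Learning" means choosing those parameters by least squares so that the line matches the observed points. The data is synthetic and invented for illustration, and the use of NumPy's `np.polyfit` is an assumption; the text does not name a library here.

```python
import numpy as np

# Synthetic "observed" data (illustrative assumption): noisy points around y = 2x + 1
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)

# The model y = slope * x + intercept has two tunable parameters.
# np.polyfit adapts them to the data by least squares (highest degree first).
slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # close to 2 and 1

# Once fit, the model can predict outputs for newly observed inputs
x_new = np.array([11.0, 12.0])
y_pred = slope * x_new + intercept
print(y_pred)
```

The fitted parameters come out close to the values used to generate the data, and the same two numbers are then reused to make predictions for inputs the model never saw.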

Categories of Machine Learning

At the most fundamental level, machine learning can be categorized into two main types: supervised learning and unsupervised learning.

Supervised learning involves modeling the relationship between measured features of data and a label associated with the data; once this model is determined, it can be used to apply labels to new, unlabeled data. It is further subdivided into classification tasks and regression tasks: in classification, the labels are discrete categories, while in regression, the labels are continuous quantities.
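A short sketch of the two supervised tasks, assuming scikit-learn (the text does not name a library) and synthetic data made up for illustration. The same single feature is paired once with a discrete label (classification) and once with a continuous label (regression); in both cases the model is fit on labeled data and then applied to new inputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))  # one measured feature per sample

# Classification: the label is a discrete category (1 if the feature exceeds 5, else 0)
y_class = (X[:, 0] > 5).astype(int)
clf = LogisticRegression().fit(X, y_class)
class_pred = clf.predict([[2.0], [8.0]])
print(class_pred)  # [0 1]

# Regression: the label is a continuous quantity (about 3 times the feature, plus noise)
y_reg = 3.0 * X[:, 0] + rng.normal(scale=1.0, size=100)
reg = LinearRegression().fit(X, y_reg)
reg_pred = reg.predict([[2.0], [8.0]])
print(reg_pred)  # close to [6, 24]
```

The only structural difference between the two tasks here is the type of label; the fit-then-predict workflow is the same.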

Unsupervised learning involves modeling the features of a dataset without reference to any label, and is often described as "letting the dataset speak for itself." It includes tasks such as clustering and dimensionality reduction. Clustering algorithms identify distinct groups of data, while dimensionality reduction algorithms search for more succinct representations of the data.
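To make the two unsupervised tasks concrete, here is a minimal sketch assuming scikit-learn's `KMeans` and `PCA` (library choice is an assumption) on synthetic, unlabeled 2D points. Clustering assigns each point to a group; dimensionality reduction compresses each point to a single coordinate along the direction of greatest variance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Unlabeled data (illustrative): two blobs of 2D points; no labels are given to either model
blob_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Clustering: find 2 distinct groups and return a group index for each point
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Dimensionality reduction: project the 2D points onto one principal direction
pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)
print(X_1d.shape)  # (100, 1)
print(pca.explained_variance_ratio_)  # fraction of variance kept by that one direction
```

Because the blobs lie along a diagonal, a single direction retains most of the variance, which is the kind of "more succinct representation" described above.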

In addition, there are so-called semi-supervised learning methods, which fall somewhere between supervised learning and unsupervised learning. Semi-supervised learning methods are often useful when only incomplete labels are available.
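As one illustration of learning from incomplete labels, the sketch below assumes scikit-learn's `LabelSpreading`, one semi-supervised technique among several, which treats the label `-1` as "unlabeled". Only two of the 100 synthetic points keep a label; the model spreads those labels to the rest through the structure of the data.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(1)
# Synthetic data (illustrative): two well-separated blobs
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
])
true_labels = np.array([0] * 50 + [1] * 50)

# Incomplete labels: keep one label per blob, mark everything else as -1 (unlabeled)
y_partial = np.full(100, -1)
y_partial[0] = 0
y_partial[50] = 1

model = LabelSpreading(max_iter=100)
model.fit(X, y_partial)

# transduction_ holds the label the model inferred for every training point
inferred = model.transduction_
accuracy = (inferred == true_labels).mean()
print(accuracy)
```

The two labeled points alone could not train a useful classifier, but combined with the unlabeled points' structure they are enough to label both groups.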

To summarize:

  • Supervised learning: Models that can predict labels based on labeled training data

    • Classification: Models that predict labels as two or more discrete categories (e.g. whether a picture shows a dog or a cat)
    • Regression: Models that predict continuous labels
  • Unsupervised learning: Models that identify structure in unlabeled data

    • Clustering: Models that detect and identify distinct groups in the data
    • Dimensionality reduction: Models that detect and identify lower-dimensional structure in higher-dimensional data

While the theoretical aspects of these methods are summarized in the physical booklet, the following chapters show how to use machine learning tools in practice.