Friday, July 24, 2015

What is Machine Learning? - Part II

Linear Regression with Multiple Variables

Multiple Features

Linear regression with multiple variables is also known as "multivariable linear regression." We now introduce notation for equations where we can have any number of input variables.
$$ \begin{align*} x_j^{(i)} &= \text{value of feature } j \text{ in the }i^{th}\text{ training example} \newline x^{(i)}& = \text{the column vector of all the feature inputs of the }i^{th}\text{ training example} \newline m &= \text{the number of training examples} \newline n &= \left| x^{(i)} \right| \; \text{(the number of features)} \end{align*} $$
Now define the multivariable form of the hypothesis function as follows, accomodating these multiple features:
$$ h_\theta (x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n $$
Using the definition of matrix multiplication, our multivariable hypothesis function can be concisely represented as:
$$ \begin{align*} h_\theta(x) = \begin{bmatrix} \theta_0 \hspace{2em} \theta_1 \hspace{2em} ... \hspace{2em} \theta_n \end{bmatrix} \begin{bmatrix} x_0 \newline x_1 \newline \vdots \newline x_n \end{bmatrix} = \theta^T x \end{align*} $$
This is a vectorization of our hypothesis function for one training example; see the lessons on vectorization to learn more. [Note: So that we can do matrix operations with theta and x, we will set $x^{(i)}_0 = 1$, for all values of $i$. This makes the two vectors 'theta' and $x^{(i)}$ match each other element-wise (that is, have the same number of elements: $n + 1$).]

Now we can collect all $m$ training examples each with $n$ features and record them in an $n+1$ by $m$ matrix. In this matrix we let the value of the subscript (feature) also represent the row number (except the initial row is the "zeroth" row), and the value of the superscript (the training example) also represent the column number, as shown here:

Monday, July 20, 2015

What is Machine Learning?

Two definitions of Machine Learning are offered. Arthur Samuel described it as: "the field of study that gives computers the ability to learn without being explicitly programmed." This is an older, informal definition.

Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Example: playing checkers.

E = the experience of playing many games of checkers
T = the task of playing checkers.
P = the probability that the program will win the next game.

Supervised Learning

In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

Supervised learning problems are categorized into "regression" and "classification" problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

Example:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.