Analysis Scale: 3

Motivation for Machine Learning

The final goal is for machines to learn, which is a sign of intelligence.

A Machine Learning algorithm basically consists of: data (inputs), a model with parameters (e.g. weights), an error function that measures how wrong the model's answers are, and a learning method that updates the parameters to reduce the error.

Types of ML

Supervised Learning

A training set of examples that comes with the correct answers (targets) is provided; based on these, the algorithm learns to generalise so that it responds correctly to all possible inputs. The key requirement is the ability to generalise and withstand noise.

Regression (aka: function approximation, interpolation)

Trying to predict a value between values we already know, i.e. fitting a mathematical function (a curve) that passes through (or close to) our datapoints.
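A minimal sketch of this (assuming NumPy; the toy data are made up): fit a line to noisy datapoints with least squares, then predict a value in between the ones we know.

```python
import numpy as np

# Toy dataset: noisy samples of y = 2x + 1
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
t = 2 * x + 1 + rng.normal(0, 0.1, size=x.shape)

# Least-squares fit of a degree-1 polynomial (a line) to the datapoints
a, b = np.polyfit(x, t, deg=1)

# Predict a value *between* values we know (interpolation)
print(a * 0.55 + b)  # close to 2 * 0.55 + 1 = 2.1
```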

Classification (the output is discrete)

Given an input vector, decide which of the N classes it belongs to, based on the training examples.
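As a tiny illustration (my own sketch, not from the course material), a nearest-neighbour classifier assigns an input vector the class of its closest training example:

```python
import numpy as np

def nearest_neighbour(x, train_X, train_y):
    """Assign x the class of its closest training example."""
    distances = np.linalg.norm(train_X - x, axis=1)
    return train_y[np.argmin(distances)]

# Two classes in 2D: class 0 around (0, 0), class 1 around (3, 3)
train_X = np.array([[0.0, 0.1], [0.2, -0.1], [3.0, 2.9], [2.8, 3.1]])
train_y = np.array([0, 0, 1, 1])

print(nearest_neighbour(np.array([0.1, 0.0]), train_X, train_y))  # 0
print(nearest_neighbour(np.array([2.9, 3.0]), train_X, train_y))  # 1
```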

Unsupervised Learning

The goal here is for the algorithm to identify similarities between the inputs. Correct responses (targets) are thus not provided, and the algorithm tries to find what the inputs have in common in order to categorise them together.
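As a sketch of the idea (assuming NumPy; k-means is one classic example, not the only one), the algorithm groups inputs purely by similarity, without ever seeing targets:

```python
import numpy as np

def k_means(X, k, n_iters=10, seed=0):
    """Group inputs by similarity alone -- no targets are ever used."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest centre ...
        dists = np.linalg.norm(X[:, None] - centres[None, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # ... then move each centre to the mean of its assigned points
        centres = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    return labels, centres

X = np.array([[0.0, 0.2], [0.1, 0.0], [4.0, 4.1], [3.9, 4.0]])
labels, centres = k_means(X, k=2)
print(labels)  # two clusters, found without any targets
```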

Reinforcement Learning

The algorithm gets told when the answer it provides is wrong, but not how to correct it. It then explores and tries different possibilities until it finds the correct answer.

A distinction between two types of models

Generative models

This type of classifier focuses on how the data were generated in order to classify them correctly. It finds the category the data most likely belong to, based on how they were generated. Given an observable variable X and a target variable Y, a generative model is a statistical model of the joint probability distribution on X × Y, i.e. P(X, Y).

Discriminative models

Discriminative models, on the other hand, do not care about how the data were generated; they simply predict a target given a signal. A discriminative model is a model of the conditional probability of the target Y given an observation X, i.e. P(Y | X).
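A toy numeric sketch of the distinction (the numbers are made up for illustration): a generative model stores the joint P(X, Y), a discriminative one only the conditional P(Y | X), yet both classify by picking the most probable class given x.

```python
import numpy as np

# A toy joint distribution P(X, Y) over 3 values of X and 2 classes Y
# (rows: x = 0, 1, 2; columns: y = 0, 1). A generative model learns this.
joint = np.array([[0.20, 0.05],
                  [0.10, 0.15],
                  [0.05, 0.45]])
assert np.isclose(joint.sum(), 1.0)

# A discriminative model learns only the conditional P(Y | X):
conditional = joint / joint.sum(axis=1, keepdims=True)

# Both classify the same way -- pick the most probable class given x:
x = 2
print(joint[x].argmax())        # generative: argmax_y P(x, y)
print(conditional[x].argmax())  # discriminative: argmax_y P(y | x)
```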

Definitions of words that you will hear often

Overfitting: the model learns the training data too closely (including its noise) and fails to generalise to new inputs. Underfitting: the model is too simple, or trained too little, to capture the structure of the data. Both are equally dangerous and harmful when training a model.

A glimpse into Neural Networks

Neural networks are based on neurons that form synapses and are able to change/adapt.

The input nodes correspond to the number of features fed to the neurons. The lines are the synapses, where each line represents a weight from a specific input value to a specific neuron. If you have M features and N neurons, then you have M × N weights. All neurons are independent of each other; even the weights of a single neuron are independent of each other.
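A quick sketch of the shapes involved (assuming NumPy):

```python
import numpy as np

M, N = 3, 2           # M input features, N neurons
x = np.ones(M)        # one input vector
W = np.zeros((M, N))  # one weight per (input, neuron) pair: M x N weights

# Each neuron's output depends only on its own column of weights,
# so the neurons (and their weights) are independent of each other.
outputs = x @ W
print(W.size, outputs.shape)  # 6 weights, (2,) outputs
```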

How the weights change is determined by the learning method, which ultimately comes down to how you calculate the error.

Learning Method:

The update of the weights: w_ij = w_ij + Δw_ij

Perceptron: find the weights for which the outputs exactly match the targets (y = t)

Δw_ij = -η * (y_j - t_j) * x_i, where y are the predictions of the model and t the targets (ground truth)
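A minimal sketch of perceptron training under these formulas (assuming NumPy and a threshold activation; the OR example is my own):

```python
import numpy as np

def train_perceptron(X, T, eta=0.25, n_epochs=10):
    """Perceptron learning: w_ij = w_ij + Delta_w_ij,
    with Delta_w_ij = -eta * (y_j - t_j) * x_i."""
    w = np.zeros(X.shape[1])
    b = 0.0  # bias (equivalently, a weight on a constant extra input)
    for _ in range(n_epochs):
        for x, t in zip(X, T):
            y = 1.0 if x @ w + b > 0 else 0.0  # threshold activation
            w += -eta * (y - t) * x
            b += -eta * (y - t)
    return w, b

# Learn the logical OR function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 1], dtype=float)
w, b = train_perceptron(X, T)
print([1.0 if x @ w + b > 0 else 0.0 for x in X])  # [0, 1, 1, 1]
```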

Delta rule: try to minimise the error based on its gradient (this is a hint at the famous gradient descent used in multilayer networks!)

Use the output of the neuron before the activation function

e = t - wx

Set the error function as follows

ε = e^2 / 2

Take the derivative of the error function

∂ε/∂w = e * ∂e/∂w = e * ∂(t - wx)/∂w = -e * x

Δw = -η * ∂ε/∂w = η * e * x
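Putting the delta rule together in code (a sketch assuming NumPy and a linear neuron, i.e. the pre-activation output wx; the toy data are made up):

```python
import numpy as np

def train_delta(X, T, eta=0.1, n_epochs=100):
    """Delta rule: use the pre-activation output wx, set e = t - wx,
    and step against the gradient: Delta_w = eta * e * x."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x, t in zip(X, T):
            e = t - x @ w      # error on the pre-activation output
            w += eta * e * x   # Delta_w = -eta * d(eps)/dw = eta * e * x
    return w

# Recover a linear target t = 2*x1 - 1*x2 from examples
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
T = X @ np.array([2.0, -1.0])
print(train_delta(X, T))  # approximately [2, -1]
```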

Some notes regarding the gradient:

It is always possible to separate two classes with a linear function, provided you project the data into the correct number of dimensions (for this, check out SVMs).
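A concrete sketch (XOR, my own example): the classes are not linearly separable in 2D, but adding a third dimension x1*x2 makes them separable.

```python
import numpy as np

# XOR is not linearly separable in 2D ...
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 0])

# ... but after projecting to 3D with an extra feature x1*x2 it is:
X3 = np.c_[X, X[:, 0] * X[:, 1]]
w, b = np.array([1.0, 1.0, -2.0]), -0.5  # a separating plane, found by hand
print((X3 @ w + b > 0).astype(int))      # [0 1 1 0] -- matches the targets
```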

It is common to turn classification problems into regression problems. Two ways to do it: (1) introduce an indicator variable that encodes which class each datapoint belongs to, and regress on it; (2) perform repeated regression, once for each class, with the target being 1 for datapoints in that class and 0 for the rest.
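A sketch of the indicator approach (assuming NumPy; the data are made up): regress one output per class against 0/1 indicator targets, then classify by the largest output.

```python
import numpy as np

# Turn a 3-class problem into regression: one indicator target per class
X = np.array([[0.0, 0.1], [0.1, 0.0],   # class 0
              [3.0, 3.1], [2.9, 3.0],   # class 1
              [0.0, 3.0], [0.1, 2.9]])  # class 2
labels = np.array([0, 0, 1, 1, 2, 2])
T = np.eye(3)[labels]  # indicator variables: 1 for the class, 0 otherwise

# One linear regression per class (solved jointly with least squares)
W, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], T, rcond=None)

# Classify a new point by the class whose regression output is largest
x_new = np.array([2.95, 3.05, 1.0])  # with the bias term appended
print((x_new @ W).argmax())  # 1
```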

Nice Quotes

The question of which features to choose is not always an easy one.

Computers don’t care about number of dimensions.

Under-training and over-training are equally dangerous.

Sources

  1. Marsland, Stephen. Machine learning: an algorithmic perspective. Chapman and Hall/CRC, 2011.

  2. Lectures 1-2, Fundamentals. Slides from Artificial Neural Networks and Deep Architectures (DD2437) @ KTH, Spring 2019.