
A few notes on unsupervised learning

The goal of unsupervised learning is to find similarities between the inputs. We can't do regression, since we don't know the targets, but in some cases we can still do a kind of classification through clustering.

Motivation for k-means

The algorithm:

Given a dataset, we want to divide it into k categories / clusters. The goal is to adjust the position of each of the k centres so that it best represents its cluster. The problem is that we don't know where the clusters are (there are no labels), and we certainly don't know where their middles are. That is exactly what we want to find out: compute the mean point (μ) of each cluster and put the centre there.

Minimising the Euclidean distance of each point to its centre is then not that different from minimising the sum-of-squares error.
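Written out (in my own notation, not taken from the notes), the quantity k-means minimises is the within-cluster sum of squares:

```latex
J(\mu_1, \dots, \mu_k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
```

where C_j is the set of datapoints currently assigned to centre μ_j. Both the assignment step and the mean-update step below can only decrease (or keep equal) this sum.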

We randomly place the cluster centres in the input space.

Until convergence:
    For each datapoint:
        Compute the distance to each cluster centre
        Assign the datapoint to the closest cluster centre
    
    For each cluster centre:
        Calculate the mean of the points that have been assigned to that cluster
        Move the centre to that point

At test time, only the first phase is used: each new datapoint is simply assigned to its closest cluster centre.
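A minimal NumPy sketch of the two phases above (function and variable names are my own, not from the lecture notes):

```python
import numpy as np

def k_means(data, k, n_iter=100, seed=0):
    """Plain batch k-means: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # Randomly place the cluster centres (here: on randomly chosen datapoints)
    centres = data[rng.choice(len(data), size=k, replace=False)]

    for _ in range(n_iter):
        # Phase 1: assign each datapoint to its closest centre
        distances = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        assignments = distances.argmin(axis=1)

        # Phase 2: move each centre to the mean of the points assigned to it
        new_centres = np.array([
            data[assignments == j].mean(axis=0) if np.any(assignments == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):  # converged: centres stopped moving
            break
        centres = new_centres
    return centres, assignments

# Example usage on two well-separated blobs
X = np.vstack([np.random.randn(50, 2) + 5, np.random.randn(50, 2) - 5])
centres, labels = k_means(X, k=2)
```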

Some notes on k-means

Competitive learning

We can then train a neural network to do k-means! Use a single layer of competitive neurons, with no bias node, that compete over which one gets activated. Only one neuron finally fires: the one with the highest activation. This is called winner-takes-all activation.
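A sketch of such a competitive layer, assuming the activation of each neuron is the dot product of its weight vector with the input (the names here are illustrative, not from the notes):

```python
import numpy as np

def winner_takes_all(weights, x):
    """Competitive layer: only the neuron with the highest activation fires.

    weights: (n_neurons, n_inputs) weight matrix, no bias.
    x: a single input vector of length n_inputs.
    Returns the index of the winning neuron and the one-hot output vector.
    """
    activations = weights @ x          # one activation per competitive neuron
    winner = int(np.argmax(activations))
    output = np.zeros(len(weights))
    output[winner] = 1.0               # winner-takes-all: only one neuron is active
    return winner, output
```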

On-line k-means Neural Net

First, let's think about something. We cannot use the batch k-means algorithm to update the weights, because on-line we feed the network one input at a time instead of the whole dataset, so we cannot compute the mean of a cluster.

An abstract approach: for every input, pick the winning (closest) neuron and move its weight vector a small step towards that input, so over time each weight vector tracks the mean of the inputs it wins.
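A minimal sketch of that on-line update; the learning rate eta and the function name are my own choices:

```python
import numpy as np

def online_k_means_step(weights, x, eta=0.1):
    """One on-line k-means / competitive learning step for a single input x.

    Instead of recomputing a mean over the whole dataset, the winning
    weight vector is nudged a small step towards the current input.
    """
    distances = np.linalg.norm(weights - x, axis=1)
    winner = int(np.argmin(distances))
    weights[winner] += eta * (x - weights[winner])   # move the winner towards x
    return winner
```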

Vector quantisation

When we compress data we have to accept that we may lose a bit of accuracy. If we want to send a vector but also compress it, we send a close representation of it (a prototype) instead, which means not every datapoint can be reconstructed exactly: some information is lost. This is called vector quantisation and is what happens in lossy compression.

To minimise the loss introduced by the compression, we need to find prototype vectors that are as close as possible to all of our possible inputs.
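A sketch of how such a codebook of prototype vectors could be used for lossy compression (here I assume the codebook comes from k-means or competitive learning; the function names are mine):

```python
import numpy as np

def encode(data, codebook):
    """Replace each vector by the index of its nearest prototype (lossy)."""
    distances = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return distances.argmin(axis=1)   # one small integer per input vector

def decode(indices, codebook):
    """Reconstruct an approximation of the data from the indices."""
    return codebook[indices]          # the original detail is not recoverable
```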

Self Organising Feature Maps (SOM)

The self-organising feature map is the most popular competitive learning algorithm. It demonstrates relative ordering preservation, also known as topology preservation. However, it isn't always possible to preserve this topology: when the input dimensionality is higher than that of the (usually 2D) map, the ordering cannot be fully preserved.

As we said, when one neuron fires we want its neighbouring neurons to be affected. A winning neuron should therefore pull the neurons that are close to it towards itself in weight space (positive connections), repel neurons that are a bit further away (negative connections), and simply ignore neurons that are very far away, since they represent different features. This leads to the Mexican hat form of lateral connections.
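One common way to get this shape is a difference of two Gaussians; a small sketch (the widths and the 0.5 scaling are arbitrary choices of mine, not values from the lectures):

```python
import numpy as np

def mexican_hat(distance, sigma_excite=1.0, sigma_inhibit=3.0):
    """Lateral connection strength as a function of distance between neurons.

    Close neighbours get a positive (pulling) weight, neurons a bit further
    away get a negative (repelling) weight, and very distant neurons get ~0.
    """
    excite = np.exp(-distance**2 / (2 * sigma_excite**2))
    inhibit = 0.5 * np.exp(-distance**2 / (2 * sigma_inhibit**2))
    return excite - inhibit
```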

The algorithm:

Phase 1: Initialisation

Choose a size (number of neurons) and a dimensionality for the map. Then either choose random values for the weight vectors (all different from each other), OR set the weight values to increase along the directions of the first principal components of the dataset.

Phase 2: Learning

For each datapoint

  1. Select the neuron whose weight vector is closest to the input (minimum Euclidean distance): the best-matching neuron

  2. Update the weight vector of the best-matching neuron

  3. Update the weight vectors of all other neurons, scaled down by the neighbourhood function so that neurons closer to the winner move more

  4. Reduce the learning rate and adjust the neighbourhood function

At test time, only the selection step is used: each input is simply mapped to its best-matching neuron.
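A sketch of one pass of the learning phase above, using a 1-D map and a Gaussian neighbourhood (the map layout, the Gaussian neighbourhood and the decay schedule are common choices, not necessarily the ones from the lectures):

```python
import numpy as np

def som_epoch(weights, data, eta, sigma):
    """One pass of SOM learning on a 1-D map of neurons.

    weights: (n_neurons, n_inputs) weight matrix; neuron i sits at map position i.
    eta: learning rate, sigma: neighbourhood width (both shrink between epochs).
    """
    positions = np.arange(len(weights))
    for x in data:
        # 1. Select the best-matching neuron (minimum Euclidean distance)
        bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
        # 2./3. Update the BMU and its neighbours, scaled by the neighbourhood function
        h = np.exp(-((positions - bmu) ** 2) / (2 * sigma ** 2))
        weights += eta * h[:, None] * (x - weights)
    return weights

# 4. Between epochs, reduce the learning rate and shrink the neighbourhood, e.g.:
# eta *= 0.99; sigma *= 0.99
```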

3 steps of initialization:

General notes

If the number of clusters / the size of the map is too small, the model underfits (too simple, too general); if it is too big, it overfits.

Sources:

  1. Lectures 5-6, “Competitive learning, self-organising maps”, slides from Artificial Neural Networks and Deep Architectures (DD2437) @ KTH, Spring 2019.

  2. Marsland, Stephen. Machine Learning: An Algorithmic Perspective. Chapman and Hall/CRC, 2011.