
Motivation for RBF

In a Multi-Layer Perceptron the activity in the hidden layer is distributed over multiple neurons. Instead, we can follow another approach where each neuron is responsible for a particular part of the input space (local neurons). This is based on the idea that if two inputs are similar, then the responses of the local neurons to those inputs will also be similar: inputs that are close together should generate similar outputs.

Inspiration from Receptive Fields: Imagine that the weight space and the input space have the same dimensionality, so the locations of the nodes can be adjusted by changing the values of the weights. In that space, it doesn’t matter where a point is relative to a node, only how far away from it it is. The important thing about these nodes is the radius of their circle (radial functions). This means they only depend on the two-norm ||x_i - x||_2, which is the Euclidean distance between the point and the centre of the circle. Another thing we need to define is the drop-off when moving away from the centre. In order to make it decrease smoothly we can use a differentiable mathematical function that decreases symmetrically (radially). The most common one is the Gaussian, defined by a mean μ and a width σ. Selecting a good width (σ) for a node is crucial: make it too big and it activates on every input, make it too small and it overfits to only one stimulus (in the limit it degenerates into an indicator / delta function that responds to a single point).
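As a minimal sketch (the function name `gaussian_rbf` is mine, not from the lecture), the activation of a single Gaussian node could look like this:

```python
import numpy as np

def gaussian_rbf(x, centre, sigma):
    """Activation of one RBF node: it depends only on the Euclidean
    distance between the input x and the node's centre."""
    distance = np.linalg.norm(x - centre)             # ||x - centre||_2
    return np.exp(-distance ** 2 / (2 * sigma ** 2))  # Gaussian drop-off

centre = np.array([0.0, 0.0])
print(gaussian_rbf(np.array([0.0, 0.0]), centre, sigma=1.0))  # 1.0 (on the centre)
print(gaussian_rbf(np.array([3.0, 0.0]), centre, sigma=1.0))  # ~0.011 (far away)
```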

Radial Basis Function (RBF) Network: A set of nodes (a different name for neurons in this context) that represent radial bases in the weight space. These nodes are also called centres since each has its own circle (Gaussian) around it. An RBF network consists of an input layer, a hidden layer of RBF nodes and a linear output layer.

Note: The RBF layer in the network doesn’t have a bias term but the output layer does! The bias covers the case where none of the RBF nodes fire.

The purpose of the RBF nodes is to find a non-linear representation of the inputs. The purpose of the output nodes is to compute a linear combination of the hidden activations in order to do classification.
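A sketch of the full forward pass under these assumptions (Gaussian activations, no bias in the hidden layer, a bias in the output layer; `rbf_forward` and its arguments are illustrative names, not from the lecture):

```python
import numpy as np

def rbf_forward(X, centres, sigma, W, b):
    """Forward pass: a Gaussian RBF hidden layer (no bias) followed by
    a linear output layer (with bias b, in case no RBF node fires)."""
    # squared Euclidean distance between every input and every centre
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    G = np.exp(-d2 / (2 * sigma ** 2))  # hidden activations, shape (n_inputs, M)
    return G @ W + b                    # linear combination for classification
```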

Training a RBF Network (A Hybrid Algorithm!):

Part 1: Position RBF nodes

For an RBF Network we need to make sure that the node positions provide a good representation of the whole dataset (and are not randomly picked). So we need to initialise the position of the nodes so that they are representative of a typical input. This can be done efficiently using unsupervised learning methods. The most common one for RBFs is the k-means algorithm, but we can also use Kohonen Feature Maps (SOM), Vector Quantization (VQ) or Gaussian Mixture Models (trained with EM).
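A minimal sketch of this positioning step, assuming scikit-learn’s KMeans (any of the unsupervised methods above would work equally well):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)  # toy training inputs
M = 10                      # number of RBF nodes

# k-means places each centre at the mean of a cluster of inputs,
# so every node ends up representative of a typical input.
centres = KMeans(n_clusters=M, n_init=10).fit(X).cluster_centers_
```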

Part 2: Train the output layer

Use the activations of those nodes to train the output layer. There are two ways to do that.

  1. A simple but slow way is to train it the same way as a Perceptron. This requires iterations to fine-tune the weights of the output nodes.

OR

  2. A bit more complex but instant method (meaning no iterations at all!) is using the pseudo-inverse. Define a matrix G as the activation of all the hidden nodes for each input (e.g., G_ij is the activation of RBF node j for input i). Then the output of the network is y = GW, where W is the weight matrix. The goal is that y = t (meaning our outputs match our targets). To find the weight matrix we solve t = GW for W, the only unknown. If G were square we could simply invert it, W = G^(-1) t. But since the number of nodes is not the same as the number of inputs (we said we don’t want that, due to a high risk of overfitting), the matrix is not square. To bypass this we use the pseudo-inverse G+ = (G^T G)^(-1) G^T, giving W = G+ t.
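A minimal numpy sketch of the pseudo-inverse method, assuming Gaussian activations and centres found as in Part 1 (`train_output_layer` is an illustrative name; `np.linalg.pinv` computes G+):

```python
import numpy as np

def train_output_layer(X, T, centres, sigma):
    """One-shot training of the output weights: solve t = GW via W = G+ t."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    G = np.exp(-d2 / (2 * sigma ** 2))  # G_ij: activation of node j for input i
    return np.linalg.pinv(G) @ T        # pseudo-inverse handles non-square G
```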

Selecting the right width (σ) for the nodes.

This is a crucial part when working with RBF networks and there are a few ways to go about it.

  1. Select the same width for all the nodes, test different values on a validation set and pick the best one.

  2. If we have many nodes covering the whole space, then the width can be set from the maximum distance d between the locations of the hidden nodes and their number M, as σ = d / sqrt(2M), during training with k-means.

  3. In order to deal with the problem of outliers (an input so far away that none of the nodes are activated) we can use normalised Gaussians, which basically means that the closest node will fire even if the input is very far away (see the sketch after this list).
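A minimal sketch of normalised Gaussian activations (illustrative names; the max-shift is a standard numerical trick I have added so the normalisation still behaves for very distant inputs):

```python
import numpy as np

def normalised_activations(x, centres, sigma):
    """Normalised Gaussians: activations sum to 1, so the closest node
    still dominates even when the input is an outlier far from all nodes."""
    d2 = ((centres - x) ** 2).sum(axis=1)
    logits = -d2 / (2 * sigma ** 2)
    g = np.exp(logits - logits.max())  # shift for numerical stability
    return g / g.sum()                 # divide by the total activation
```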

The term hybrid indicates that the algorithm combines unsupervised learning (positioning the nodes) and supervised learning (training the output layer).

RBF and Regression: Using an RBF Network to interpolate / infer values of a function.

Instead of having multiple nodes contribute to the output for a specific input, we can simplify the problem. We can space out the nodes so that they do not overlap and are just placed next to each other (in order to get a continuous function the nodes must leave no empty space between them). Then for each input we only consider one RBF node, which returns the average value within its patch. These RBF functions are then called splines and can be much more complex than just linear. The most common is the cubic spline (x^3).
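A minimal sketch of interpolating a function from values at the node locations, with scipy’s `CubicSpline` standing in for a hand-rolled cubic spline:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x_nodes = np.linspace(0, 2 * np.pi, 8)  # node locations, side by side
y_nodes = np.sin(x_nodes)               # the function value at each node

spline = CubicSpline(x_nodes, y_nodes)  # piecewise cubic between the nodes
print(spline(1.0), np.sin(1.0))         # interpolated vs true value
```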

General notes for RBF:

RBF vs MLP:

Sources:

  1. Lecture 5, “Radial basis function networks and introduction to competitive learning”, slides from Artificial Neural Networks and Deep Architectures (DD2437), KTH, Spring 2019.

  2. Marsland, Stephen. Machine Learning: An Algorithmic Perspective. Chapman and Hall/CRC, 2011.

  3. A nice tutorial on RBFs: http://mccormickml.com/2013/08/15/radial-basis-function-network-rbfn-tutorial/