Primer: how machines learn representations in data

When there’s too much information, simplifying the data can reveal more of what matters. We can use unsupervised learning techniques to do this: we design a neural network with a bottleneck that forces it to learn a compressed representation of the original input. This architecture is called an autoencoder, and the idea is foundational to understanding how newer architectures are used in natural language processing.

If the input features were each independent of one another, this compression and subsequent reconstruction would be a very difficult task. But if some structure exists in the data (correlations between input features), that structure can be learned and then exploited when the input is forced through the network’s bottleneck.

It’s easier to understand this concept with an example.

Let’s say we are trying to predict whether someone has the flu from a checklist of three symptoms: cough, high temperature, aching joints. We assign the value “1” for “yes” and “0” for “no,” and consider a person sick when they have at least two of these symptoms.

But we also want to capture things about the patient that make them less likely to be sick: for example, that they had a flu shot, that they take Vitamin D pills, or that they recently started doing extreme crossfit (which would explain the aches and pains). We label these counter-symptoms in the same way: “1” for “yes,” “0” for “no.”

We consider a patient to be healthy when they have at least two of these counter-symptoms.

Now let’s put this idea into a neural network. There is one input layer and one output layer, each with the six features: cough, high temperature, aching joints, flu shot, Vitamin D, extreme crossfit. If the hidden layer has only two neurons, it can only feed forward something close to “sick” or “healthy.” Over successive training iterations, the two hidden units are forced to develop different sensitivities (activations) to the inputs.
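Here is a minimal sketch of that six-feature, two-unit bottleneck network. The feature names and layer sizes come from the example above; the PyTorch code itself, the activation choices and variable names are illustrative assumptions, not a prescribed implementation.

```python
import torch
import torch.nn as nn

# The six binary features from the example: three symptoms, three counter-symptoms.
FEATURES = ["cough", "high_temp", "aching_joints", "flu_shot", "vitamin_d", "crossfit"]

class BottleneckNet(nn.Module):
    def __init__(self, n_features: int = 6, n_hidden: int = 2):
        super().__init__()
        # Encoder: squeeze the six inputs through a two-unit bottleneck.
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Sigmoid())
        # Decoder: rebuild the six features from the two-unit code.
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_features), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# One patient: cough, high temperature and aching joints, no counter-symptoms.
patient = torch.tensor([[1., 1., 1., 0., 0., 0.]])
model = BottleneckNet()
code = model.encoder(patient)   # shape (1, 2): the compressed representation
output = model(patient)         # shape (1, 6): the attempted reconstruction
```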

The end result is a network that has learned a compressed representation of the data while still making accurate predictions about whether someone is sick or healthy from their symptoms and history.

In machine learning there are many tricks for getting a computer to discover a pattern or relationship that a human can’t see. Many techniques aim to break something complex into the right small pieces to find the meaning. But there are other tricks, where hidden rules are found by forcing the computer to compress information, discard parts of it, and then reconstruct the original form.

When we do this inside a neural network, the resulting structure is an autoencoder: an unsupervised neural network that compresses the input into a smaller representation and then reconstructs the original input from it. By forcing the machine to compress information and then rebuild it, the network has to find the correlations hidden in the simplified structure.
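As a sketch of how this reconstruction objective looks in code, reusing the BottleneckNet from the earlier example: the toy patient data, learning rate and number of steps are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

# Toy dataset: each row is one patient's six yes/no answers
# (cough, high_temp, aching_joints, flu_shot, vitamin_d, crossfit).
data = torch.tensor([
    [1., 1., 1., 0., 0., 0.],   # sick: three symptoms, no counter-symptoms
    [0., 0., 0., 1., 1., 0.],   # healthy: flu shot and Vitamin D
    [0., 0., 1., 0., 1., 1.],   # healthy: aches explained by crossfit
    [1., 1., 0., 1., 0., 0.],   # two symptoms, one counter-symptom
])

model = BottleneckNet()                       # defined in the previous sketch
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()                        # reconstruction error

for step in range(2000):
    optimizer.zero_grad()
    reconstructed = model(data)               # squeeze through two units, then rebuild
    loss = loss_fn(reconstructed, data)       # how far is the rebuilt input from the original?
    loss.backward()
    optimizer.step()

# After training, the two-unit code has to capture the sick/healthy structure,
# because that is the only way to rebuild all six answers from two numbers.
```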

The advantage of doing this isn’t just about processing power or memory; it’s fundamental to learning: the network figures out how to compress things on its own, and can even recover a clean image from a noisy, distorted one.
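That last point is essentially the idea behind a denoising autoencoder: corrupt the input, but score the network against the clean original. A small sketch under the same toy setup as above; the 20% corruption rate is an assumption.

```python
# Randomly flip roughly 20% of the answers to simulate noisy, distorted input.
noisy = data.clone()
mask = torch.rand_like(noisy) < 0.2
noisy[mask] = 1.0 - noisy[mask]

reconstructed = model(noisy)         # the network only sees the corrupted input...
loss = loss_fn(reconstructed, data)  # ...but is scored against the clean original
```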
