Hey there!
This week, I want to dive into something that might sound a bit intimidating at first: neural networks.
I know, I know. Just the phrase "neural networks" can bring to mind complex equations and head-scratching calculus. But trust me, it doesn't have to be that way! I want to share how I came to understand these fascinating systems, and hopefully, make it click for you too.
You see, I did my high school in England back in the 80s. And guess what wasn't on the curriculum? Calculus. While I might have been happy about it back then, it's definitely presented some interesting challenges when trying to get a grip on machine learning concepts today.
So, I had to find a way to understand how neural networks work under the hood without getting bogged down in derivatives and the chain rule. And that's exactly what I want to share with you now.
So, How Does a Neural Network Actually Work?
Think of a neural network like a team of interconnected nodes, or "neurons," organized in layers. At a minimum, you'll usually see three types of layers:
- Input Layer: This is where your raw data comes in. It's the starting point of the journey for your information.
- Hidden Layer(s): These are the layers in between the input and output. They're the workhorses, processing and transforming the data into a more useful format. You can have one or many hidden layers.
- Output Layer: This is where you get your final result or prediction.
Data flows through this network, starting at the input layer, going through the hidden layers, and finally arriving at the output layer. This forward movement of data is what we call Feedforward.
When we're training a neural network using something called supervised learning, we compare the network's output to the correct answers we already know (the "expected results"). The difference between what the network predicted and the correct answer is our "error".
This error signal then travels backwards through the network. This is where the magic happens – the connections between those neurons are adjusted to help the network make better predictions next time. This backward movement is called Backpropagation.
Sounds pretty simple when you break it down, right?
Let's Look Under the Hood: The Parts of a Neural Network
Okay, let's take a peek at the components. It might look a bit complex at first glance, but we'll break it down together.
In our example, we have three clearly defined layers. Each layer has its own neurons: 2 in the input layer, 3 in the hidden layer, and 2 in the output layer.
You'll notice that each neuron in one layer is connected to every neuron in the next layer. So, a neuron in the input layer will have connections to all the neurons in the hidden layer, and the neurons in the hidden layer will connect to all the neurons in the output layer.
Each of these connections has a weight associated with it. Think of the weight as the "strength" or importance of that connection.
Also, each neuron (except for the input layer) has a bias. The weights and biases are just numbers, and they can be positive or negative. When you first create a neural network, these values are completely random.
The values that come out of the output layer are our final result.
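To make that a bit more concrete, here's one way you might lay out such a 2-3-2 network in Go. The struct, the field names, and the range used for the random starting values are just my own illustration, not something the original text prescribes:

package main

import (
	"fmt"
	"math/rand"
)

// network mirrors the 2-3-2 example: each weights[i][j] entry is the
// strength of the connection from neuron i in one layer to neuron j in
// the next, and every non-input neuron gets its own bias.
type network struct {
	inputHidden  [2][3]float64 // weights from the 2 inputs to the 3 hidden neurons
	hiddenOutput [3][2]float64 // weights from the 3 hidden neurons to the 2 outputs
	hiddenBias   [3]float64    // one bias per hidden neuron
	outputBias   [2]float64    // one bias per output neuron
}

func main() {
	var n network
	// When the network is first created, every weight and bias starts out
	// as a random value (here, somewhere between -1 and 1).
	for i := range n.inputHidden {
		for j := range n.inputHidden[i] {
			n.inputHidden[i][j] = rand.Float64()*2 - 1
		}
	}
	for i := range n.hiddenOutput {
		for j := range n.hiddenOutput[i] {
			n.hiddenOutput[i][j] = rand.Float64()*2 - 1
		}
	}
	for i := range n.hiddenBias {
		n.hiddenBias[i] = rand.Float64()*2 - 1
	}
	for i := range n.outputBias {
		n.outputBias[i] = rand.Float64()*2 - 1
	}
	fmt.Printf("%+v\n", n) // prints the freshly initialized (random) weights and biases
}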
Still with me? Great! Let's break it down even further. Yes, there will be a little bit of math, but I promise to keep it gentle.
Feedforward: The Data's Journey
When we send data into the input layer, it gets passed along to the hidden layer. How? By calculating a "weighted sum" of the inputs and then adding a bias.
What does that mean? Imagine each connection between neurons has a "strength" – that's the weight. For each neuron in the hidden layer, we take each input value, multiply it by the weight of the connection leading to that hidden neuron, and then add up all those results. Finally, we add the neuron's bias, which is like a little extra push to help the neuron activate.
Let's look at the example from the original text:
Input values: i1 = 1, i2 = 2
Weights:
From i1 to the hidden layer: w1 = 0.1, w2 = −0.02, w3 = 0.03
From i2 to the hidden layer: w4 = −0.4, w5 = 0.2, w6 = 0.6
Biases on the hidden layer: b1 = 1, b2 = −1.5, b3 = −0.25
So, for the first hidden neuron (h1), the value is calculated as:
h1 = (i1 ∗ w1) + (i2 ∗ w4) + b1
h1 = (1 ∗ 0.1) + (2 ∗ −0.4) + 1
h1 = 0.1 − 0.8 + 1
h1 = 0.3
We do this same calculation for every neuron in the hidden layer.
So, our hidden layer values become: h1 = 0.3, h2 = −1.12, h3 = 0.98
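If it helps to see that as code, here's a small Go sketch of the same weighted-sum-plus-bias calculation, using the example numbers above (the variable names are just illustrative):

package main

import "fmt"

func main() {
	// The example values from above.
	inputs := []float64{1, 2} // i1, i2
	weights := [][]float64{
		{0.1, -0.02, 0.03}, // w1, w2, w3: from i1 to h1, h2, h3
		{-0.4, 0.2, 0.6},   // w4, w5, w6: from i2 to h1, h2, h3
	}
	biases := []float64{1, -1.5, -0.25} // b1, b2, b3

	hidden := make([]float64, 3)
	for j := range hidden {
		sum := biases[j] // start with the neuron's bias
		for i, in := range inputs {
			sum += in * weights[i][j] // add each input times its connection weight
		}
		hidden[j] = sum
	}
	fmt.Println(hidden) // approximately [0.3 -1.12 0.98]
}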
Activation Functions: Squashing the Results
These values from the hidden layer then go through an activation function. Think of this as a way to normalize the results. There are different types of activation functions like Sigmoid, ReLU, and Linear, but for now, just know that a function is applied.
A common one is the sigmoid function, which looks like this:
f(x) = 1 / (1 + e^(−x))
Where:
- f(x) is the output of the function.
- x is the input (that weighted sum we just calculated).
- e is Euler's number (about 2.71828).
In simple terms, the sigmoid function takes any number and squashes it into a value between 0 and 1. This can be helpful if you want to interpret the output as probabilities.
The original text provided a simple code example for this:
// sigmoid squashes any input into the range (0, 1).
// math.Exp comes from Go's standard "math" package.
func sigmoid(x float64) float64 {
	return 1 / (1 + math.Exp(-x))
}
We don't need to worry too much about the inner workings of the code for now, just that it gives us a value between 0 and 1.
After applying the sigmoid function, our hidden layer values become (rounded to three decimal places): h1 ≈ 0.574, h2 ≈ 0.246, h3 ≈ 0.727
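If you'd like to check those numbers yourself, here's a small, self-contained program that runs our hidden-layer values through the same sigmoid function:

package main

import (
	"fmt"
	"math"
)

// sigmoid, as defined above.
func sigmoid(x float64) float64 {
	return 1 / (1 + math.Exp(-x))
}

func main() {
	// The hidden-layer values we calculated earlier.
	for _, h := range []float64{0.3, -1.12, 0.98} {
		fmt.Printf("%.3f\n", sigmoid(h)) // prints 0.574, 0.246 and 0.727
	}
}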
Now, we repeat the entire process: taking these new values from the hidden layer and feeding them forward to the output layer. It's worth noting that sometimes a different activation function is used for the final layer compared to the hidden layers.
Calculating the Error: How Wrong Are We?
Once we have the values from the output layer, it's time to see how well the network did. We compare the network's output to the correct answers from our training data. The difference between what the network predicted and what it should have predicted is the error. This error tells us how poorly the network performed.
We can also use these individual errors to calculate an overall average error for the network. Since these differences can be positive or negative, simply adding them up might make it look like the error is small when it's actually significant.
A common way to get around this is using the Mean Squared Error (MSE). Here's the gist:
- Calculate the difference between each predicted output and its corresponding correct value.
- Square each of these differences (this makes them all positive).
- Add up all the squared differences.
- Divide the sum by the number of data points.
The formula looks like this:
MSE = (1/n) × ∑ (yi − ŷi)²
Where:
- MSE is the Mean Squared Error.
- n is the number of data points.
- yi is the actual (correct) value.
- ŷi is the value the network predicted.
- ∑ just means "sum up".
Let's use the example from the text:
Output values: o1 = 0.2, o2 = 0.9
Expected values: 1, 0
Individual errors:
For o1: 1 − 0.2 = 0.8
For o2: 0 − 0.9 = −0.9
If we just added these, we'd get 0.8+(−0.9)=−0.1, which doesn't reflect the actual error.
Using MSE:
MSE = ((1 − 0.2)² + (0 − 0.9)²) / 2
MSE = ((0.8)² + (−0.9)²) / 2
MSE = (0.64 + 0.81) / 2
MSE = 1.45 / 2 = 0.725
So, the network's error is 0.725. This gives us a clear picture of how far off the network's predictions were.
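Here's the same calculation as a short Go sketch; running it on the example values above gives the same 0.725 (the function name is just my own choice):

package main

import "fmt"

// mse returns the Mean Squared Error between the expected values and the
// values the network actually predicted.
func mse(expected, predicted []float64) float64 {
	var sum float64
	for i := range expected {
		diff := expected[i] - predicted[i]
		sum += diff * diff // squaring makes every difference positive
	}
	return sum / float64(len(expected))
}

func main() {
	// The example from above: expected 1 and 0, predicted 0.2 and 0.9.
	fmt.Printf("%.3f\n", mse([]float64{1, 0}, []float64{0.2, 0.9})) // prints 0.725
}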
Backpropagation: Learning from Mistakes
Now that we know how wrong the network was (the error), we use that information to adjust the weights and biases. The goal is to make these adjustments so that the next time data flows through, the error will be smaller.
The process of updating weights and biases does involve calculus in the real world, but as the original text points out, we can understand the concept without getting into the nitty-gritty of derivatives.
Here's a simplified way to think about it:
Adjusting Weights: For each weight connecting a hidden neuron to an output neuron, we calculate how much that weight needs to change. We do this by multiplying the error signal from the output neuron by the output of the hidden neuron. We also multiply this by a small number called the "learning rate," which controls how big of a step we take in adjusting the weight. Finally, we subtract this calculated change from the current weight.
Weight Change = Error Signal * Hidden Neuron Output * Learning Rate
New Weight = Old Weight - Weight Change
Updating Biases: For each bias in the output layer, we multiply the error signal of that output neuron by the learning rate and subtract it from the current bias.
Bias Change = Error Signal * Learning Rate
New Bias = Old Bias - Bias Change
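As a rough Go sketch of those two update rules: the learning rate and the example numbers here are illustrative assumptions, not values from the original text.

package main

import "fmt"

// updateWeight follows "New Weight = Old Weight - Weight Change" for a single
// weight between a hidden neuron and an output neuron.
func updateWeight(weight, errSignal, hiddenOut, learningRate float64) float64 {
	return weight - errSignal*hiddenOut*learningRate
}

// updateBias does the same for a single output-neuron bias.
func updateBias(bias, errSignal, learningRate float64) float64 {
	return bias - errSignal*learningRate
}

func main() {
	// Made-up numbers, purely to show the shape of the update.
	learningRate := 0.1 // an assumed value; small steps keep the adjustments gentle
	fmt.Println(updateWeight(0.5, -0.8, 0.574, learningRate)) // the weight nudges up a little (≈ 0.546)
	fmt.Println(updateBias(1.0, -0.8, learningRate))          // the bias nudges up a little (1.08)
}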
Backpropagating Error to Hidden Layers: To update the weights and biases in the hidden layers, we first need to figure out the "error signal" for each hidden neuron. We do this by taking a weighted sum of the error signals from the layer above (the output layer). The original text mentions multiplying this by the derivative of the activation function, which is a detail related to calculus, but the core idea is that we're distributing the error back through the network.
Once we have the error signal for the hidden neurons, we use that to update the weights and biases connecting to the hidden layer, just like we did for the output layer.
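For completeness, here's a hedged sketch of how that error signal for a single hidden neuron might be computed when the hidden layer uses the sigmoid. It follows the idea above (a weighted sum of the output layer's error signals, scaled by the activation function's slope); the function and variable names are just illustrative:

package main

import "fmt"

// hiddenErrorSignal estimates the error signal for one hidden neuron: a
// weighted sum of the error signals from the output layer, scaled by the
// sigmoid's slope at this neuron's output. For the sigmoid, that slope can
// be written as output * (1 - output), which conveniently avoids doing any
// calculus by hand.
func hiddenErrorSignal(hiddenOut float64, outErrSignals, outWeights []float64) float64 {
	var sum float64
	for k := range outErrSignals {
		sum += outErrSignals[k] * outWeights[k] // distribute each output error back along its connection
	}
	return sum * hiddenOut * (1 - hiddenOut)
}

func main() {
	// Made-up numbers, purely to show the shape of the calculation.
	fmt.Println(hiddenErrorSignal(0.574, []float64{-0.8, 0.9}, []float64{0.3, -0.2}))
}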
If your neural network has many hidden layers ("deep network"), you repeat this backpropagation process layer by layer, moving backward from the output all the way to the input.
And that's essentially it! As I mentioned, understanding this process doesn't necessarily require a deep understanding of calculus. A quick search, or the relevant Wikipedia articles, will give you the specific functions and details you might need.
This is one way to approach the calculations within a neural network. If you have different approaches or see areas for correction, please feel free to share in the comments – learning is a journey we're on together!