Aiming for Jarvis, Creating D.A.N.I.

Sunday, 20 April 2025

Demystifying Neural Networks: A Beginner's Friendly Guide

Hey there!

This week, I want to dive into something that might sound a bit intimidating at first: neural networks.

I know, I know. Just the phrase "neural networks" can bring to mind complex equations and head-scratching calculus. But trust me, it doesn't have to be that way! I want to share how I came to understand these fascinating systems, and hopefully, make it click for you too.

You see, I went to high school in England back in the '80s. And guess what wasn't on the curriculum? Calculus. While I might have been happy about that back then, it's definitely presented some interesting challenges when trying to get a grip on machine learning concepts today.

So, I had to find a way to understand how neural networks work under the hood without getting bogged down in derivatives and the chain rule. And that's exactly what I want to share with you now.  

So, How Does a Neural Network Actually Work?

Think of a neural network like a team of interconnected nodes, or "neurons," organized in layers. At a minimum, you'll usually see three types of layers:  

  • Input Layer: This is where your raw data comes in. It's the starting point of the journey for your information.  
  • Hidden Layer(s): These are the layers in between the input and output. They're the workhorses, processing and transforming the data into a more useful format. You can have one or many hidden layers.  
  • Output Layer: This is where you get your final result or prediction.  

Data flows through this network, starting at the input layer, going through the hidden layers, and finally arriving at the output layer. This forward movement of data is what we call Feedforward.  

When we're training a neural network using something called supervised learning, we compare the network's output to the correct answers we already know (the "expected results"). The difference between what the network predicted and the correct answer is our "error".  

This error signal then travels backwards through the network. This is where the magic happens – the connections between those neurons are adjusted to help the network make better predictions next time. This backward movement is called Backpropagation.  

Sounds pretty simple when you break it down, right?

Let's Look Under the Hood: The Parts of a Neural Network

Okay, let's take a peek at the components. It might look a bit complex at first glance, but we'll break it down together.  



In our example, the three layers are clearly defined. Each layer has its own neurons: 2 in the input layer, 3 in the hidden layer, and 2 in the output layer.

You'll notice that each neuron in one layer is connected to every neuron in the next layer. So, a neuron in the input layer will have connections to all the neurons in the hidden layer, and the neurons in the hidden layer will connect to all the neurons in the output layer.  

Each of these connections has a weight associated with it. Think of the weight as the "strength" or importance of that connection.  

Also, each neuron (except those in the input layer) has a bias. The weights and biases are just numbers, and they can be positive or negative. When you first create a neural network, these values are completely random.

The values that come out of the output layer are our final result.  
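To make that concrete, here's one possible (and deliberately simplified) way to represent such a network in Go. The struct layout and the [-1, 1) initialisation range are my own choices for illustration, not the only way to do it:

import "math/rand"

// network holds the parameters for a 2-3-2 network like the one above.
// weightsIH[i][j] is the weight from input i to hidden neuron j;
// weightsHO[j][k] is the weight from hidden neuron j to output neuron k.
type network struct {
  weightsIH [][]float64
  weightsHO [][]float64
  biasesH   []float64
  biasesO   []float64
}

// randomMatrix returns rows x cols random values in [-1, 1).
func randomMatrix(rows, cols int) [][]float64 {
  m := make([][]float64, rows)
  for i := range m {
    m[i] = make([]float64, cols)
    for j := range m[i] {
      m[i][j] = rand.Float64()*2 - 1
    }
  }
  return m
}

// randomVector returns n random values in [-1, 1).
func randomVector(n int) []float64 {
  v := make([]float64, n)
  for i := range v {
    v[i] = rand.Float64()*2 - 1
  }
  return v
}

// newNetwork creates a 2-3-2 network with random weights and biases.
func newNetwork() *network {
  return &network{
    weightsIH: randomMatrix(2, 3),
    weightsHO: randomMatrix(3, 2),
    biasesH:   randomVector(3),
    biasesO:   randomVector(2),
  }
}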

Still with me? Great! Let's break it down even further. Yes, there will be a little bit of math, but I promise to keep it gentle.  

Feedforward: The Data's Journey

When we send data into the input layer, it gets passed along to the hidden layer. How? By calculating a "weighted sum" of the inputs and then adding a bias.  

What does that mean? Imagine each connection between neurons has a "strength" – that's the weight. For each neuron in the hidden layer, we take each input value, multiply it by the weight of the connection leading to that hidden neuron, and then add up all those results. Finally, we add the neuron's bias, which is like a little extra push to help the neuron activate.  

Let's work through an example:

Input values: i1 = 1, i2 = 2

Weights:
From i1 to the hidden layer: w1 = 0.1, w2 = −0.02, w3 = 0.03
From i2 to the hidden layer: w4 = −0.4, w5 = 0.2, w6 = 0.6

Biases on the hidden layer: b1 = 1, b2 = −1.5, b3 = −0.25

So, for the first hidden neuron (h1), the value is calculated as:

h1 = (i1 × w1) + (i2 × w4) + b1
h1 = (1 × 0.1) + (2 × −0.4) + 1
h1 = 0.1 − 0.8 + 1
h1 = 0.3

We do this same calculation for every neuron in the hidden layer.  

So, our hidden layer values become: h1 = 0.3, h2 = −1.12, h3 = 0.98
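For the code-minded, here's a minimal Go sketch of that same weighted-sum calculation. The slice layout is my own, just for illustration:

inputs := []float64{1, 2} // i1, i2

// weights[i][j] is the weight from input neuron i to hidden neuron j.
weights := [][]float64{
  {0.1, -0.02, 0.03}, // from i1: w1, w2, w3
  {-0.4, 0.2, 0.6},   // from i2: w4, w5, w6
}
biases := []float64{1, -1.5, -0.25} // b1, b2, b3

hidden := make([]float64, 3)
for j := range hidden {
  sum := biases[j]
  for i, input := range inputs {
    sum += input * weights[i][j]
  }
  hidden[j] = sum
}
// hidden is now approximately [0.3, -1.12, 0.98]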

Activation Functions: Squashing the Results

These values from the hidden layer then go through an activation function. Think of this as a way to normalize the results. There are different types of activation functions like Sigmoid, ReLU, and Linear, but for now, just know that a function is applied.  

A common one is the sigmoid function, which looks like this:  

f(x) = 1 / (1 + e^(-x))

Where:

  • f(x) is the output of the function.  
  • x is the input (that weighted sum we just calculated).  
  • e is Euler's number (about 2.71828).  

In simple terms, the sigmoid function takes any number and squashes it into a value between 0 and 1. This can be helpful if you want to interpret the output as probabilities.  

Here's a simple Go code example for this:


import "math"

// sigmoid squashes any real-valued input into the range (0, 1).
func sigmoid(x float64) float64 {
  return 1 / (1 + math.Exp(-x))
}


We don't need to worry too much about the inner workings of the code for now, just that it gives us a value between 0 and 1.  

After applying the sigmoid function, our hidden layer values become: h1 ≈ 0.574, h2 ≈ 0.246, h3 ≈ 0.727
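Continuing the sketch from earlier, applying the activation is just a loop over those values, reusing the sigmoid function above:

for j, v := range hidden {
  hidden[j] = sigmoid(v)
}
// hidden is now approximately [0.574, 0.246, 0.727]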

Now, we repeat the entire process: taking these new values from the hidden layer and feeding them forward to the output layer. It's worth noting that sometimes a different activation function is used for the final layer compared to the hidden layers.  

Calculating the Error: How Wrong Are We?

Once we have the values from the output layer, it's time to see how well the network did. We compare the network's output to the correct answers from our training data. The difference between what the network predicted and what it should have predicted is the error. This error tells us how poorly the network performed.  

We can also use these individual errors to calculate an overall average error for the network. Since these differences can be positive or negative, simply adding them up might make it look like the error is small when it's actually significant.  

A common way to get around this is using the Mean Squared Error (MSE). Here's the gist:  

  1. Calculate the difference between each predicted output and its corresponding correct value.  
  2. Square each of these differences (this makes them all positive).  
  3. Add up all the squared differences. 
  4. Divide the sum by the number of data points.  

The formula looks like this:

MSE = (1/n) × Σ (yᵢ − ŷᵢ)²   (summing over i = 1 to n)

Where:

  • MSE is the Mean Squared Error.
  • n is the number of data points.
  • yᵢ is the actual (correct) value.
  • ŷᵢ is the value the network predicted.
  • Σ just means "sum up".

Let's work through an example:

Output values: o1 = 0.2, o2 = 0.9
Expected values: 1, 0

Individual errors:
For o1: 1 − 0.2 = 0.8
For o2: 0 − 0.9 = −0.9

If we just added these, we'd get 0.8+(−0.9)=−0.1, which doesn't reflect the actual error.  

Using MSE:

MSE = ((1 − 0.2)² + (0 − 0.9)²) / 2
MSE = (0.8² + (−0.9)²) / 2
MSE = (0.64 + 0.81) / 2
MSE = 1.45 / 2 = 0.725

So, the network's error is 0.725. This gives us a clear picture of how far off the network's predictions were.
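If you'd like to see that as code, here's a small Go sketch of MSE (the function name and signature are my own):

// mse returns the Mean Squared Error between predicted and expected
// values, assuming the two slices have the same length.
func mse(predicted, expected []float64) float64 {
  sum := 0.0
  for i := range predicted {
    diff := expected[i] - predicted[i]
    sum += diff * diff
  }
  return sum / float64(len(predicted))
}

// mse([]float64{0.2, 0.9}, []float64{1, 0}) returns 0.725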

Backpropagation: Learning from Mistakes

Now that we know how wrong the network was (the error), we use that information to adjust the weights and biases. The goal is to make these adjustments so that the next time data flows through, the error will be smaller.  

The process of updating weights and biases does involve calculus in the real world, but we can understand the concept without getting into the nitty-gritty of derivatives.

Here's a simplified way to think about it:  

Adjusting Weights: For each weight connecting a hidden neuron to an output neuron, we calculate how much that weight needs to change. We do this by multiplying the error signal from the output neuron by the output of the hidden neuron. We also multiply this by a small number called the "learning rate," which controls how big of a step we take in adjusting the weight. Finally, we subtract this calculated change from the current weight.  

Weight Change = Error Signal × Hidden Neuron Output × Learning Rate
New Weight = Old Weight − Weight Change

Updating Biases: For each bias in the output layer, we multiply the error signal of that output neuron by the learning rate and subtract it from the current bias.  

Bias Change = Error Signal × Learning Rate
New Bias = Old Bias − Bias Change
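In Go, those two update rules might look something like this sketch (the names are mine, purely for illustration):

// updateWeight applies the weight rule above for a single weight.
func updateWeight(oldWeight, errorSignal, hiddenOutput, learningRate float64) float64 {
  weightChange := errorSignal * hiddenOutput * learningRate
  return oldWeight - weightChange
}

// updateBias applies the bias rule above for a single bias.
func updateBias(oldBias, errorSignal, learningRate float64) float64 {
  biasChange := errorSignal * learningRate
  return oldBias - biasChange
}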

Backpropagating Error to Hidden Layers: To update the weights and biases in the hidden layers, we first need to figure out the "error signal" for each hidden neuron. We do this by taking a weighted sum of the error signals from the layer above (the output layer). Strictly speaking, that sum is also multiplied by the derivative of the activation function – that's where the calculus comes in – but the core idea is that we're distributing the error back through the network.

Once we have the error signal for the hidden neurons, we use that to update the weights and biases connecting to the hidden layer, just like we did for the output layer.  

If your neural network has many hidden layers ("deep network"), you repeat this backpropagation process layer by layer, moving backward from the output all the way to the input.  

And that's essentially it! As I mentioned, understanding this process doesn't necessarily require a deep understanding of calculus. Resources like Wikipedia can be incredibly helpful for finding the specific functions and details you might need.

This is one way to approach the calculations within a neural network. If you have different approaches or see areas for correction, please feel free to share in the comments – learning is a journey we're on together!  


Friday, 11 April 2025

Diving Deep into Efficient Messaging Systems: My Journey with Polestar

I thought it was time to share some insights into a key part of a project I’ve been working on. It's not the whole codebase – I wouldn't want to bore you to tears! But I do want to talk about some of the core concepts I've implemented.   


One of the critical requirements of this project was building a messaging system capable of handling a potentially massive throughput of messages and ensuring they reach their intended destinations efficiently.   


Now, I could have gone with off-the-shelf solutions like ROS (Robot Operating System). However, I'm a bit of a control freak and enjoy crafting things from the ground up.   


That's how Polestar was born.   

Polestar


Polestar is a custom library designed to handle messages composed of maps (or dictionaries) containing primitive data types. Think strings, integers, floats, and booleans. These messages are published to Polestar with a specific topic, and any application subscribed to that topic receives a copy.   
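Polestar's actual implementation isn't shown in this post, but to make the idea concrete, here's a minimal sketch of topic-based fan-out in Go. All of the names here (Msg, Broker, Subscribe, Publish) are mine, for illustration only:

// Msg is a map of primitive values, as described above.
type Msg map[string]interface{}

// Broker fans out published messages to every subscriber of a topic.
// A real implementation would also need locking for concurrent use.
type Broker struct {
  subscribers map[string][]chan Msg
}

func NewBroker() *Broker {
  return &Broker{subscribers: make(map[string][]chan Msg)}
}

// Subscribe returns a buffered channel that will receive a copy of
// every message published to the topic.
func (b *Broker) Subscribe(topic string) <-chan Msg {
  c := make(chan Msg, 1024) // buffered, so slow readers don't stall the broker
  b.subscribers[topic] = append(b.subscribers[topic], c)
  return c
}

// Publish delivers a copy of m to each subscriber of the topic.
func (b *Broker) Publish(topic string, m Msg) {
  for _, sub := range b.subscribers[topic] {
    cp := make(Msg, len(m)) // each subscriber gets its own copy
    for k, v := range m {
      cp[k] = v
    }
    select {
    case sub <- cp: // delivered
    default: // subscriber's queue is full: drop rather than block
    }
  }
}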


My initial attempt at building this system was decent enough, achieving a throughput of about 800 messages per second. But I started thinking about how I could push the boundaries, enhance the throughput, and make the system even more robust.   


And guess what? I did it! I managed to crank up the throughput to an impressive 16,000 messages per second. That's more than sufficient for any scenario I can currently envision.   


To maintain efficiency, if the message queue is full when a new message arrives, the message is dropped to prevent blocking the processes.  Considering the queue's substantial capacity of 1,000,000 messages, this scenario should be quite rare.   


The Queue Conundrum


But here's where it gets interesting.  Recently, I started pondering: what if, instead of dropping the newest message when the queue is full, we dropped the oldest message?  How difficult would that be to implement?    


Go's channels, in their default state, don't offer this specific behavior. However, as is often the case in programming, there are multiple ways to achieve it.   


One approach involves creating a struct that encapsulates a queue (as a slice) and uses a single channel. But this felt like overkill for such a small feature. Plus, I'd lose the inherent speed advantages of Go's channels.   


So, I devised what I believe is a more elegant solution. It leverages the fundamental nature of channels and preserves the ability to iterate over them in the standard way.   


Go's flexibility allows you to create a new type based on an existing type, even a built-in one. In this case, I created a new type called ch based on a channel of strings:


type ch chan string


This opens the door to using Go's method functionality to add a custom behavior to our new type.  I created a Send method with the following logic:   


import "errors"

// Send attempts to send a message to the channel. If the channel is full,
// it drops the oldest message and tries again. It returns a boolean
// indicating whether a message was dropped (true) and an error if the
// operation failed. The error is non-nil only if the channel remains full
// after attempting to drop the oldest message.
func (c ch) Send(msg string) (bool, error) {
  select {
  case c <- msg:
    return false, nil
  default:
    // Channel is full: drop the oldest message and try again.
    <-c // Discard the oldest message.
    select {
    case c <- msg:
      // Message sent after dropping the oldest.
      return true, nil
    default:
      // This should rarely, if ever, happen (another goroutine could
      // refill the channel between the drop and the retry).
      return true, errors.New("channel still full after dropping oldest")
    }
  }
}

This Send method replaces the typical channel send operation:


chVar <- "hello"


with:


chVar.Send("hello")


The beauty of this is that if you've created a buffered channel, the oldest item in the queue is dropped when the queue is full. This can be incredibly useful in scenarios like robotics, where outdated messages might lose their relevance, and prioritizing the latest information is crucial.   
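For example, here's a hypothetical snippet, assuming the ch type and Send method above:

queue := make(ch, 3) // a small buffer, just for demonstration

queue.Send("a") // returns (false, nil)
queue.Send("b")
queue.Send("c")

// The buffer is now full, so the next Send evicts "a" to make room:
queue.Send("d") // returns (true, nil)

close(queue)
for msg := range queue {
  // msg is "b", then "c", then "d"
  _ = msg
}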


I haven't integrated this into Polestar just yet. I'm still weighing the pros and cons of dropping the newest versus the oldest message.  Ideally, of course, no messages would be dropped at all.   


To give you a glimpse of Polestar's speed, here's a short video of one of the test runs:




My original plan involved using a hardware hub for this project. However, I don't believe I could have achieved this level of performance with a microcontroller (MCU), especially considering the queue size.  Polestar's heavy use of concurrency would also pose a challenge for microcontrollers.   


The trade-off is that all communication now relies on TCP instead of serial. Serial communication might have offered faster data transmission with less overhead, but the routing complexities would have been a significant hurdle.   


I hope this deep dive into my process provides some food for thought, especially for fellow developers. And for those who aren't knee-deep in code, I hope it offers a little peek into how my mind works.   


I welcome any comments or questions you might have. Please feel free to leave them in the comments section below!    

Thursday, 3 April 2025

Vibe Coding: The Future of Programming or Just a Fun Experiment?


Heard the latest buzzword in the tech world? It's "Vibe Coding". When I first encountered the term, my mind instantly pictured a programmer just winging it, letting the code flow wherever the digital current took them, maybe like a novelist surprised by their own characters. I'll admit, I've had moments like that – a vague goal in mind and just… coding.   

But, as it turns out, that initial guess was off the mark. So, what is vibe coding? According to the collective wisdom of Wikipedia:   

"Vibe coding (also vibecoding) is an AI-dependent programming technique where a person describes a problem in a few sentences as a prompt to a large language model (LLM) tuned for coding. The LLM generates software, shifting the programmer's role from manual coding to guiding, testing, and refining the AI-generated source code. Vibe coding is claimed by its advocates to allow even amateur programmers to produce software without the extensive training and skills required for software engineering."

Essentially, you tell an AI what you want, and poof, it generates the code. The human becomes less of a manual coder and more of a guide, tester, and refiner. Sounds pretty cool, right? Maybe even revolutionary?   

The Allure and the Alarm Bells

I can definitely see the appeal. It sounds fun, potentially lowering the barrier to entry for software creation and offering a fascinating avenue for exploring AI capabilities. Imagine describing an app idea and having a functional starting point within minutes!   

However, based on my experience and reading, I'm not convinced it's a truly viable solution just yet. Why the hesitation?   

It Often Doesn't "Just Work": Getting AI-generated code that runs correctly the first time seems to be the exception, not the rule. It often takes several tries, tweaking prompts to get something functional.   

Functionality vs. Intent: Even if the code runs, does it actually do what you intended? That's another hurdle where luck plays a big role.   

The Amendment Nightmare: The real kicker for me is trying to modify or fix AI-generated code. If you stick with vibe coding, you could end up in an endless loop of refining prompts for a single feature. Try to dive in manually? You might find code that, while functional, is baffling, overly rigid because it stuck too literally to your prompt, or just plain inefficient.

So, Where Do We Stand?

Vibe coding is undeniably intriguing. As a tool for rapid prototyping, learning, or exploring AI's coding prowess, it has potential. But relying on it for serious development seems fraught with challenges, particularly when it comes to refinement and maintenance.   

Perhaps it's less about replacing traditional coding and more about augmenting it – a powerful assistant, but one whose work needs careful scrutiny and, often, significant manual intervention.

What are your thoughts? Have you tried vibe coding? Is it the future, a fleeting trend, or something in between? Let me know in the comments!