Wow, it’s been a while. Apologies for the radio silence, but the pesky "real world" caught up with me, and I had to spend some time doing that whole "working for a living" thing.
Anyway, enough about the mundane. Let's get back to what is actually important: DANI.
As you might remember, my ultimate, beyond-my-wildest-dreams goal with this project is to cross that threshold and meet the definition of when a robot is actually alive, or at least close to it. But recently, while pondering DANI’s LSTM (the fancy Long Short-Term Memory neural network that acts as his brain), I realized I had made a fundamental—and slightly embarrassing—mistake.
It’s hard to achieve sentience when your robot has the memory retention of a goldfish.
The Problem: Scheduled Blackouts
As it stands right now, DANI "thinks" every 100 milliseconds, giving him 10 thought cycles a second. Every 10 seconds (100 cycles), backpropagation kicks in to train the network. To do this concurrently without stopping DANI in his tracks, I clone the LSTM at that exact moment, run the heavy backpropagation math on the clone, and then overwrite the active LSTM with the newly trained clone.
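To make the flaw concrete, here's the current clone-train-overwrite scheme as a toy Python sketch. The class and function names are stand-ins I made up for illustration, not DANI's actual code, and "training" is faked with a single weight bump:

```python
import copy
import threading

class ToyLSTM:
    """Toy stand-in for DANI's network: weights plus evolving state."""
    def __init__(self):
        self.weights = [0.0]
        self.hidden_state = 0.0  # short-term memory, updated every cycle

    def think(self, x):
        self.hidden_state += x  # state evolves with each thought cycle

def backprop(clone):
    clone.weights[0] += 1.0  # pretend training improved the weights

active = ToyLSTM()
for _ in range(100):              # 100 thought cycles (~10 s for DANI)
    active.think(0.1)

snapshot = copy.deepcopy(active)  # clone at the 100-cycle mark
trainer = threading.Thread(target=backprop, args=(snapshot,))
trainer.start()
for _ in range(30):               # ~3 s of thinking while training runs
    active.think(0.1)
trainer.join()

lost_state = active.hidden_state - snapshot.hidden_state
active = snapshot  # overwrite: new weights, but 3-second-old state
```

The last line is exactly the blackout: `lost_state` is everything DANI experienced while the clone was off being trained, and the overwrite throws it away.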
This backpropagation takes about 2 to 3 seconds. My initial thought was: Brilliant! The training happens in the background without interrupting his flow.
But there is a glaring flaw.
Because the process takes a snapshot, spends 3 seconds learning from it, and then violently overwrites the active brain... we lose those 2 to 3 seconds of short-term memory that DANI experienced while the training was happening. Every 10 seconds, DANI essentially blacks out and forgets the last few seconds of his existence. This is seriously hampering his learning capabilities.
How do we stop DANI from becoming a chronic amnesiac?
The Fix: A Neurological Hot-Swap
My solution is to ditch the cloning process entirely. Instead, each neuron will now have two sets of weights: one active, one inactive.
During backpropagation, the algorithm reads from the active weights and writes its results into the inactive set. This lets us update the LSTM's underlying math without wiping out the actively evolving memory states (the cell states and hidden states) that DANI is currently using to understand the world. We just add a flag to each layer to indicate whether it should be reading from Weight Set 1 or Weight Set 2.
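A minimal sketch of what a dual-weight layer could look like. The class and method names here are my own invention for illustration, not DANI's real code:

```python
# Each layer keeps two weight sets and a flag saying which one the
# forward pass reads. Backprop writes only to the inactive set.
class DualWeightLayer:
    def __init__(self, weights):
        self.weight_sets = [list(weights), list(weights)]
        self.active = 0  # flag: which set the forward pass reads

    @property
    def active_weights(self):
        return self.weight_sets[self.active]

    @property
    def inactive_weights(self):
        return self.weight_sets[1 - self.active]

    def write_trained(self, new_weights):
        # Training results land in the inactive set; the cell/hidden
        # states DANI is using right now are never touched.
        self.weight_sets[1 - self.active] = list(new_weights)

    def flip(self):
        self.active = 1 - self.active  # hot-swap: one flag change

layer = DualWeightLayer([0.5, 0.5])
layer.write_trained([0.6, 0.4])  # backprop writes in the background
old_view = list(layer.active_weights)  # thinking still sees [0.5, 0.5]
layer.flip()                     # the swap itself is instantaneous
```

The key property: no memory states live in this structure at all, so nothing gets cloned or overwritten. The swap is a single flag change per layer.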
But wait, there’s more!
Reshaping the Brain
At present, DANI's model has about 300 neurons per layer across 5 layers (I don't have the code in front of me, so I'm relying on my own somewhat flawed, non-LSTM memory here).
If we increase the number of layers but reduce the neurons per layer, we can implement a rolling update. This means DANI can immediately benefit from the training layer-by-layer, even while the rest of the brain is still calculating.
What this entails is increasing the layer count to 7 (any higher and we start flirting with the dreaded vanishing gradient problem) while reducing the per-layer neuron count to 128 (because who doesn't love a nice power of 2?).
This gives DANI a much more focused, "deep" thought process, allowing him to break down problems more efficiently. It also allows us to gracefully ‘flip the switch’ on each layer as we cycle through.
Here is how the rolling update will work:
As each feed-forward pass occurs (DANI thinking), a check is done to see if the next layer is ready to have its switch flipped to the newly trained weights. Because backpropagation is strictly sequential and works backwards, we start checking from the last layer and move towards the first.
If a layer is ready, we flip the weights to the newly trained set and mark it as done. On the next thought cycle, we check the next layer, and so on, until we reach the front of the brain. Then, we start the whole process over again.
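The per-cycle check described above might look something like this in Python. Again, this is a hypothetical sketch with made-up names, with backprop simulated by simply marking layers ready back-to-front:

```python
# Rolling update: backprop finishes layers from the back of the
# network forward, and each thought cycle flips at most one ready
# layer before the feed-forward pass runs.
class Layer:
    def __init__(self):
        self.ready = False    # set by backprop when new weights land
        self.flipped = False  # swapped to the newly trained set

NUM_LAYERS = 7
layers = [Layer() for _ in range(NUM_LAYERS)]
next_to_flip = NUM_LAYERS - 1  # start checking at the last layer

def thought_cycle():
    global next_to_flip
    if next_to_flip >= 0 and layers[next_to_flip].ready:
        layers[next_to_flip].flipped = True  # hot-swap this layer
        layers[next_to_flip].ready = False
        next_to_flip -= 1                    # move toward the front
    if next_to_flip < 0:
        next_to_flip = NUM_LAYERS - 1        # restart the sweep
    # ... the actual feed-forward pass would run here ...

# Simulate backprop finishing layers back-to-front:
for i in reversed(range(NUM_LAYERS)):
    layers[i].ready = True
    thought_cycle()
```

One flip per cycle keeps the per-thought overhead tiny, and checking in the same back-to-front order that backprop finishes means a layer is usually ready by the time the sweep reaches it.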
What do we gain from this brain surgery?
Quite a bit, actually:
- Constant Learning: The LSTM is in a state of continuous, uninterrupted learning.
- Stable Learning Rate: No massive, sudden shifts in logic.
- Smoother Processing: No sudden CPU spikes from cloning and overwriting massive arrays.
- Deeper Thinking: The structural change to 7 layers gives DANI a more nuanced, layered way of processing information.
- Memory Retention: We actually retain the states of the memory gates within the LSTM. No more 3-second blackouts!
There are certainly other ways to create a continuous neural network, but I am aiming for the absolute simplest solution here. Remember, all of this is running on a Raspberry Pi!
This dual-weight method does increase the memory required to hold the LSTM, but because we are reducing the overall neuron count from ~1500 (5x300) to 896 (7x128), it's actually going to be lighter on the Pi overall. DANI had an oversized network anyway, so trimming the fat while adding depth is a win-win.
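As a rough sanity check on the "lighter overall" claim, here's a back-of-the-envelope parameter count. This assumes a standard stacked LSTM of uniform width, where each of the four gates has input weights, recurrent weights, and a bias; DANI's actual layout may differ:

```python
def lstm_layer_params(input_size, hidden_size):
    # 4 gates, each with input weights, recurrent weights, and a bias.
    return 4 * hidden_size * (input_size + hidden_size + 1)

def stack_params(layers, width):
    # Uniform stack: each layer's input is the previous layer's output.
    return layers * lstm_layer_params(width, width)

old = stack_params(5, 300)      # current network, single weight set
new = 2 * stack_params(7, 128)  # new network, dual weight sets
```

Even with every weight stored twice, the 7x128 stack comes out at roughly half the parameters of the single-set 5x300 one, so the Pi should indeed come out ahead.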
What do you guys think of this approach? Let me know in the comments if you see any potholes I'm about to step in!
