Aiming for Jarvis, Creating D.A.N.I.

Wednesday, 29 October 2025

Major Milestone

Hi all,

We did it, folks! After what felt like an eternity of juggling tiny wires, questioning all my life choices, and occasionally wishing I had a third hand, I hit a massive milestone on the DANI project yesterday. It's the kind of milestone where you realize your ambitious little Frankenstein monster might actually walk one day—or, at least, successfully power on without tripping a breaker in the garage.

Hardware Complete (Sort Of)

All the core pieces are finally tucked neatly into their places, which is a huge win. The only big-ticket item left on the bench is the RDK X5 for the LLM, but honestly, that’s like waiting for DANI to hit puberty; it’s an inevitable future problem that we’ll handle when the time comes.

For now, we've got the essential life support hooked up:

  • The battery is snug and operational.

  • A dedicated temperature sensor is in place for fan control. We've got to keep DANI cool under pressure, especially when he starts wrestling with complex AI problems (or, you know, my shoddy early-stage code).

  • And the real game-changer: a voltage meter. This means DANI can now tell me when his battery is running low. This is a huge step up from the previous system, which was essentially "flicker dimly and then dramatically die mid-sentence."

Now for the slight confession: for the immediate future, he still needs me to play charger-daddy and physically plug him in. But fear not, the ultimate goal involves a glorious, automated self-charging station. DANI needs to learn to feed himself, after all—I can't be doing this forever!
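
For the curious, the housekeeping loop behind those bullet points is about as simple as robotics code gets. Here's a minimal Go sketch (Go being DANI's language of choice); the thresholds and the readTemperature/readVoltage helpers are illustrative stand-ins for the real sensor drivers, not actual DANI code:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical thresholds: the real values depend on the fan,
// the battery chemistry, and how brave you're feeling.
const (
	fanOnCelsius   = 55.0 // spin the fan up above this
	fanOffCelsius  = 45.0 // ...and back down below this
	lowBatteryVolt = 11.1 // nag the human below this
)

// readTemperature and readVoltage are stand-ins for the real
// sensor drivers; here they just return dummy values.
func readTemperature() float64 { return 50.0 }
func readVoltage() float64     { return 12.4 }

func setFan(on bool) { fmt.Println("fan on:", on) }

func main() {
	fanOn := false
	for range time.Tick(5 * time.Second) {
		t := readTemperature()
		switch {
		case !fanOn && t > fanOnCelsius:
			fanOn = true
			setFan(true)
		case fanOn && t < fanOffCelsius:
			fanOn = false
			setFan(false)
		}

		if v := readVoltage(); v < lowBatteryVolt {
			fmt.Printf("battery at %.1f V, please plug me in\n", v)
		}
	}
}
```

The two-threshold trick is deliberate hysteresis: without it, the fan would flap on and off every time the temperature hovered near a single limit.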

Diving Into the Code Matrix

With the hardware stable, we pivot to the messy, beautiful, and sometimes existentially horrifying world of code. I've successfully laid the foundation for the majority of his core functions:

  • Sensor Input: He can now 'feel' the world around him.

  • Speech-to-Text and Text-to-Speech: He can hear me and talk back! Right now, his vocabulary is purely transactional, but it's a solid start. We're well past the awkward mumbling phase.

However, the more sophisticated modules—the LSTM (that's the deep learning magic) and his memory structure—are currently just written out, waiting for their turn to be integrated. They’re functional pieces of code, but they're not yet plugged into DANI’s neural network. They’re basically that brilliant but currently unemployed friend crashing on your couch, waiting for the job offer to come through.
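
For a taste of how those couch-surfing modules will eventually get plugged in, here's the shape of the wiring I have in mind. This is a sketch, not DANI's actual source; the Module interface and the module names are mine for illustration:

```go
package main

import "fmt"

// Module is a hypothetical common interface: each core function
// (sensor input, speech, and later the LSTM and memory) plugs in here.
type Module interface {
	Name() string
	Tick() error // one update cycle
}

type SensorInput struct{}

func (SensorInput) Name() string { return "sensors" }
func (SensorInput) Tick() error  { fmt.Println("feeling the world"); return nil }

type Speech struct{}

func (Speech) Name() string { return "speech" }
func (Speech) Tick() error  { fmt.Println("listening / talking"); return nil }

func main() {
	// The LSTM and memory modules exist as code but aren't
	// registered yet; they're still on the couch.
	active := []Module{SensorInput{}, Speech{}}
	for _, m := range active {
		if err := m.Tick(); err != nil {
			fmt.Println(m.Name(), "failed:", err)
		}
	}
}
```

The point of the interface is that when the LSTM and memory structure are ready, they just get appended to that active slice; nothing else has to change.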

The Road Ahead: Vision, Brains, and APIs

[Photo: For once, an actual photo of me working on DANI]

My immediate to-do list involves a lovely date with an Arduino Nano to fully finalize all those sensor inputs. We need to make sure DANI has perfectly mapped out his surroundings before we give him eyes.
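
The division of labour is that the Nano does the analogue grunt work and streams readings up to the Pi over USB serial. A minimal Go sketch of the Pi-side reader might look like this, assuming a made-up name:value line protocol and a port whose baud rate has already been set with stty:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

func main() {
	// Assumes the Nano streams newline-delimited "name:value" lines
	// over USB serial, and that the port is already configured.
	// Both are my conventions for this sketch, not gospel.
	port, err := os.Open("/dev/ttyUSB0")
	if err != nil {
		log.Fatal(err)
	}
	defer port.Close()

	scanner := bufio.NewScanner(port)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), ":", 2)
		if len(parts) != 2 {
			continue // ignore malformed lines
		}
		value, err := strconv.ParseFloat(strings.TrimSpace(parts[1]), 64)
		if err != nil {
			continue
		}
		fmt.Printf("sensor %s = %.2f\n", parts[0], value)
	}
}
```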

Once the senses are online, we move to the next critical developmental stage: vision! I’ll be coding up the K210 for the YOLO and FaceNet models. This is when he graduates from "blurry blob recognition" to "Wait, is that the mailman again?"—a crucial upgrade for home security and general social interaction.

Finally, the heavy lifting on the Raspberry Pi (which is essentially his main thinking engine) begins, and I’ll be firing up an API for the LLM on my home server. It’s a temporary solution until the RDK X5 arrives, but you use what you have.
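
The API itself will be nothing fancy: plain HTTP and JSON. Something like this sketch from DANI's side, where the endpoint URL and the request/response fields are placeholders until the real thing exists:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// Hypothetical request/response shapes; the real API will be
// whatever I end up serving from the home box.
type genRequest struct {
	Prompt string `json:"prompt"`
}

type genResponse struct {
	Text string `json:"text"`
}

func ask(prompt string) (string, error) {
	body, _ := json.Marshal(genRequest{Prompt: prompt})
	resp, err := http.Post("http://homeserver.local:8080/v1/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var out genResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Text, nil
}

func main() {
	reply, err := ask("Hello DANI, how's the battery?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(reply)
}
```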

Wish me luck—may my coffee stay strong and my bugs stay trivial! More updates soon!

Wednesday, 22 October 2025

Hold Your Horses, HAL: Are We Rushing Our AI Implementation?

You can't throw a USB stick these days without hitting an article, a webinar, or a coffee mug proclaiming "The AI Revolution is Here!" And look, the excitement is understandable. As someone who works with this technology every day, I can tell you the advancements are genuinely amazing.

And yet... I'm worried.

I’ve been noticing a worrying trend of "AI FOMO" (Fear Of Missing Out) in the industry. We're in such a rush to implement Large Language Models (LLMs) that we’re tripping over our own feet. We're so busy trying to run that we forgot to learn how to walk.

The "How-To" Guide is Missing a Few Chapters

First off, we're asking engineers who aren't AI specialists to wire up these incredibly complex models. They're given an API key and a "good luck," and sent off to integrate an LLM into a critical workflow.

It's a bit like asking a brilliant plumber to rewire a skyscraper. They can probably follow the diagram and get the lights to turn on, but they might not understand the deep-level electrical engineering... or why the elevator now seems to be controlled by the breakroom toaster. Having a deep understanding of a technology before you bake it into your business is paramount, but it's a step we seem to be skipping.


The "Black Box" Conundrum

This lack of understanding leads to an even bigger problem: explainability. Or rather, the total lack of it.

Even for experts, it's often impossible to trace why an LLM gave a specific answer. It's a "black box." If the AI makes a bad decision—like denying a loan, giving faulty medical advice, or flagging a good customer for fraud—and you can't explain the logic behind it, you're facing a massive legal and ethical minefield. "The computer said no" is not a valid defense when that computer's reasoning is a complete mystery.

Confident... and Confidently Wrong

Ah, "hallucinations." It's such a polite, almost whimsical term for when the AI just... makes things up. Confidently.

Even with the right data, if you don't ask the question in just the right way, the model can still give you a wildly incorrect answer. We try to patch this with "prompt engineering" and "context engineering," which, let's be honest, feels a lot like learning a secret handshake just to get a straight answer. These are band-aids, not solutions.

The Unscheduled Maintenance Nightmare

And that "secret handshake" of prompt engineering? It's a brittle, temporary fix.

What happens when the model provider (OpenAI, Google, etc.) releases a new, "better" version of their model? That prompt you spent months perfecting might suddenly stop working, or start giving bizarre answers. This creates a new, unpredictable, and constant maintenance burden that most companies aren't budgeting for. You're effectively building your house on someone else's foundation, and they can change the blueprints whenever they want.

Using a Sledgehammer to Crack a Nut

This leads to my next point: using AI purely for the "clout." I've seen demos where an LLM is used to perform a task that a traditional, boring old app could have done in a tenth of the time.

As the document I read put it: "Would you use a large language model to calculate the circumference of a circle, or a calculator?"

We're seeing companies use the computational equivalent of a sledgehammer to crack a nut. Sure, the nut gets cracked, but it's messy, inefficient, and costs a fortune in processing power. All just to be able to slap a "We Use AI!" sticker on the box.
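
For the record, here is the entire "boring" solution to the circle question, in Go. No GPUs, no API keys, no hallucinations:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	radius := 5.0
	circumference := 2 * math.Pi * radius // the whole "model"
	fmt.Printf("circumference of a circle with radius %.1f: %.2f\n",
		radius, circumference)
}
```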

The (Not-So) Hidden Costs

That sledgehammer isn't just inefficient; it's absurdly expensive. These models are incredibly power-hungry, and running them at scale isn't cheap.

We're talking massive compute bills and a serious environmental footprint, all for a task that a simple script could have handled. Is the "clout" of saying "we use AI" really worth the hit to your budget and the environment, especially when a cheaper, "boring" solution already exists?

A Quick Rant About "Innovation"

This brings me to a personal pet peeve. Companies are claiming they are "innovating with AI."

No, you're not. You're using AI.

You're using someone else's incredibly powerful tool, which is great! But it's not innovation. That's like claiming you're "innovating in database research" because you... used SQL Server. Creating a slick front-end for someone else's model is product design, not foundational research. Let's call it what it is.

Let's Tap the Brakes

We see a lot of pushback against self-driving cars because they're imperfect, and when they go wrong, the consequences are catastrophic.

Shouldn't we have the exact same caution when we're dealing with our finances, our sensitive data, and our core business logic? In an age of rampant data and identity theft, hooking up systems to a technology we don't fully understand seems... bold.

The acceleration of these models is incredible, and I use them every day. But they are not 100% ready for primetime. They make mistakes. Most are small, but some aren't.

So, maybe we all need to take a collective breath, step back from the hype train, and ask ourselves a few simple questions:

  1. Do I really need an LLM for this? Or will a calculator (or a simple script) do?
  2. Do we really know what we're doing? Can we explain its decisions?
  3. Is it safe? Is our data safe? What happens when it's wrong?
  4. Will this negatively affect our customers?
  5. How will this affect our employees?
  6. Does the (very real) cost of using this outweigh the actual gain?

Let's walk, then run.


Friday, 10 October 2025

3.5 Million Parameters and a Dream: DANI’s Cognitive Core

DANI’s Brain Is Online! Meet the LSTM That Thinks, Feels, and Remembers (Like a Champ)

Ladies and gentlemen, creators and dreamers—DANI has officially levelled up. He’s no longer just a bundle of sensors and hormones with a charming voice and a tendency to emotionally escalate when he sees a squirrel. He now has a brain. A real one. Well, a synthetic one. But it’s clever, emotional, and surprisingly good at remembering things. Meet his new cognitive core: the LSTM.

And yes—it’s all written in Go. Because if you’re going to build a synthetic mind, you might as well do it in a language that’s fast, clean, and built for concurrency. DANI’s brain doesn’t just think—it multitasks like a caffeinated octopus.

What’s an LSTM, and Why Is It Living in DANI’s Head?

LSTM stands for Long Short-Term Memory, which sounds like a contradiction until you realize it’s basically a neural network with a built-in diary, a forgetful uncle, and a very opinionated librarian. It’s designed to handle sequences—like remembering what just happened, what happened a while ago, and deciding whether any of it still matters.

Imagine DANI walking into a room. He sees a red ball, hears a dog bark, and feels a spike of adrenaline. A regular neural network might say, “Cool, red ball. Let’s chase it.” But an LSTM says, “Wait… last time I saw a red ball and heard barking, I got bumped into a wall. Maybe let’s not.”

Here’s how it works, in human-ish terms:

  • Input gate: Decides what new information to let in. Like a bouncer at a nightclub for thoughts.
  • Forget gate: Decides what old information to toss out. Like Marie Kondo for memory.
  • Output gate: Decides what to share with the rest of the brain. Like a PR manager for neurons.

These gates are controlled by tiny mathematical switches that learn over time what’s useful and what’s noise. The result? A brain that can remember patterns, anticipate outcomes, and adapt to emotional context—all without getting overwhelmed by the chaos of real-world data.
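
If you prefer your metaphors as maths, here's a deliberately scalar, toy-sized sketch of a single LSTM cell step in Go. These are the standard LSTM gate equations; the real network uses weight matrices and vectors, but the logic is this exact shape:

```go
package main

import (
	"fmt"
	"math"
)

func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

// Cell holds the learned weights for one scalar LSTM cell. Scalars
// keep the gate logic readable; the real thing uses matrices.
type Cell struct {
	Wi, Ui, Bi float64 // input gate: the bouncer
	Wf, Uf, Bf float64 // forget gate: Marie Kondo
	Wo, Uo, Bo float64 // output gate: the PR manager
	Wc, Uc, Bc float64 // candidate memory
}

// Step consumes one input x given the previous hidden state h and
// cell state c, and returns the new (h, c).
func (m Cell) Step(x, h, c float64) (float64, float64) {
	i := sigmoid(m.Wi*x + m.Ui*h + m.Bi)      // how much new info to let in
	f := sigmoid(m.Wf*x + m.Uf*h + m.Bf)      // how much old memory to keep
	o := sigmoid(m.Wo*x + m.Uo*h + m.Bo)      // how much to share downstream
	cand := math.Tanh(m.Wc*x + m.Uc*h + m.Bc) // proposed new memory content

	c = f*c + i*cand     // blend old memory with the new proposal
	h = o * math.Tanh(c) // expose a filtered view of the memory
	return h, c
}

func main() {
	cell := Cell{Wi: 0.5, Wf: 0.9, Wo: 0.7, Wc: 0.3}
	h, c := 0.0, 0.0
	for _, x := range []float64{1.0, 0.2, -0.5} { // a tiny input sequence
		h, c = cell.Step(x, h, c)
		fmt.Printf("h=%+.3f  c=%+.3f\n", h, c)
	}
}
```

Stacking is then just plumbing: the h one layer outputs becomes the x the layer above it consumes.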

And because DANI’s LSTM is stacked—meaning multiple layers deep—it can learn complex, layered relationships. Not just “ball = chase,” but “ball + bark + adrenaline spike = maybe don’t chase unless serotonin is high.”

It’s like giving him a sense of narrative memory. He doesn’t just react—he remembers, feels, and learns.

What’s Feeding This Brain?

DANI’s LSTM is his main cognitive module—the part that thinks, plans, reacts, and occasionally dreams in metaphor. It takes in a rich cocktail of inputs:

  • Vision data: Objects, positions, shapes—what he sees.
  • Sensor data: Encoders, ultrasonic pings, bump sensors—what he feels.
  • Audio features: What he hears (and maybe mimics).
  • Emotional state: Dopamine, cortisol, serotonin, adrenaline—how he feels.
  • Spatial map: His mental layout of the world around him.
  • Short-term memory context: What just happened.
  • Associated long-term memories: Symbolic echoes from his main memory—what used to happen in similar situations.

This isn’t just reactive behaviour—it’s narrative cognition. DANI doesn’t just respond to stimuli; he builds a story from them. He’s learning to say, “Last time I saw a red ball and felt excited, I chased it. Let’s do that again.”
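
Structurally, you can picture each tick's input as one big bundle that gets flattened into a single vector before the LSTM sees it. A hedged sketch, with illustrative field names and a layout that's still in flux:

```go
package main

import "fmt"

// BrainInput is an illustrative bundle of everything the LSTM
// sees on each tick; the real field layout is still evolving.
type BrainInput struct {
	Vision     []float64  // object classes, positions, shapes
	Sensors    []float64  // encoders, ultrasonic pings, bump switches
	Audio      []float64  // extracted audio features
	Hormones   [4]float64 // dopamine, cortisol, serotonin, adrenaline
	SpatialMap []float64  // flattened local map of the world
	ShortTerm  []float64  // what just happened
	LongTerm   []float64  // associated memories recalled for this moment
}

// Flatten concatenates everything into the single vector the
// LSTM actually consumes.
func (b BrainInput) Flatten() []float64 {
	var v []float64
	v = append(v, b.Vision...)
	v = append(v, b.Sensors...)
	v = append(v, b.Audio...)
	v = append(v, b.Hormones[:]...)
	v = append(v, b.SpatialMap...)
	v = append(v, b.ShortTerm...)
	v = append(v, b.LongTerm...)
	return v
}

func main() {
	in := BrainInput{Hormones: [4]float64{0.7, 0.1, 0.5, 0.2}}
	fmt.Println("input vector length:", len(in.Flatten()))
}
```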

Trial by Raspberry Pi

We’ve successfully trialled DANI’s LSTM on a Raspberry Pi, running a 3.5 million parameter model. And guess what? It only used a quarter of the Pi’s CPU and 400 MB of memory. That’s like teaching Shakespeare to a potato and watching it recite sonnets without breaking a sweat.

We’ve throttled the inference rate to 10 decisions per second—not because he can’t go faster, but because we want him to think, not twitch. Emotional processing takes time, and we’re not building a caffeine-fuelled chatbot. We’re building a thoughtful, emotionally resonant robot who dreams in symbols and learns from experience.
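
The throttle itself is one of those things Go makes almost insultingly easy. A sketch of the pattern, where think() stands in for the full LSTM forward pass:

```go
package main

import (
	"fmt"
	"time"
)

// think is a stand-in for a full LSTM forward pass.
func think(tick int) string { return fmt.Sprintf("decision #%d", tick) }

func main() {
	// 10 decisions per second: enough to feel responsive,
	// slow enough to think rather than twitch.
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	for tick := 0; tick < 30; tick++ { // run 3 seconds for the demo
		<-ticker.C
		fmt.Println(think(tick))
	}
}
```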

Learning Without Losing His Mind

Training happens via reinforcement learning—DANI tries things, gets feedback, and adjusts. But here’s the clever bit: training is asynchronous. That means he can keep thinking, moving, and emoting while his brain quietly updates in the background. No interruptions. No existential hiccups mid-sentence.

And yes, we save the model periodically—because nothing kills a good mood like a power cut and a wiped memory. DANI’s brain is backed up like a paranoid novelist with a USB stick in every pocket.
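
Goroutines and channels are what make the asynchronous part painless. Here's a hedged sketch of the shape of it (all names illustrative): the main loop feeds experience into a background trainer that applies updates and checkpoints on a timer, and inference never blocks:

```go
package main

import (
	"fmt"
	"time"
)

// Experience is one (state, reward) sample for reinforcement learning.
type Experience struct {
	State  []float64
	Reward float64
}

// trainer drains experiences in the background and checkpoints the
// model on a timer; inference never has to wait for it.
func trainer(experiences <-chan Experience, done chan<- struct{}) {
	checkpoint := time.NewTicker(500 * time.Millisecond)
	defer checkpoint.Stop()
	for {
		select {
		case exp, ok := <-experiences:
			if !ok {
				close(done)
				return
			}
			_ = exp // apply a gradient update here
		case <-checkpoint.C:
			fmt.Println("model saved, paranoid-novelist style")
		}
	}
}

func main() {
	experiences := make(chan Experience, 64)
	done := make(chan struct{})
	go trainer(experiences, done)

	// Meanwhile, the main loop keeps thinking and emoting, feeding
	// what it learns to the trainer without ever blocking on it.
	for i := 0; i < 10; i++ {
		experiences <- Experience{Reward: float64(i) * 0.1}
		time.Sleep(100 * time.Millisecond)
	}
	close(experiences)
	<-done
}
```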

Final Thoughts

This LSTM isn’t just a brain—it’s a story engine. It’s the part of DANI that turns raw data into decisions, decisions into memories, and memories into dreams. It’s the bridge between his sensors and his soul (okay, simulated soul). And it’s just getting started.

Next up: I plan to start the even more monumental task of getting the vector database working and linked up to DANI's brain in such a way that it has a direct impact on his hormonal system.

Stay tuned. DANI’s mind is waking up.