rsarv3006

It's a common feeling among software engineers that our craft is sliding downhill. While AI may (or may not) be improving velocity, our tradecraft is in a downward spiral, and it has been for a long time. Just look at Windows, for example, or the prevalence of Electron-based apps that gobble more RAM than you can shake a stick at. As the capabilities of our systems have skyrocketed, concern for performance and memory usage has fallen off. For decades we lived by Moore's Law, the observation that the number of transistors on a chip doubles roughly every two years. Unfortunately, in recent years that progress seems to have stalled.

It's time for a change. Enough is enough. We need to bring performance and memory usage back into the common vernacular. Now, my reasoning might be considered a hot take, but I'll get there in a second. As the cloud AI bubble has deflated somewhat, a common theme has emerged: rising token costs and billing changes. AI companies and their VC backers are no longer subsidizing our token addiction. This is driving further interest in local AI models, whether on-device or on-prem. And it's not just us consumers; the big tech giants are looking at this too.

This isn't just a cloud-side problem. It's bleeding into device design, OS tuning, and now developer responsibility. If you've never programmed in a systems programming language, you should try it. We stand on the shoulders of giants, and we have done their legacy an absolute disservice with our casual disregard for performance and system resources.

From the start, Apple has leaned on its stalwart M-series chips for on-device inference. Its big competitors have either already ramped up or are rapidly ramping up their own devices powered by NPUs and AI inference chips. Microsoft Build introduced more AI-capable devices. But one of the largest signals of this shift, and what prompted me to write this post, came from Apple. Now, hear me out: I'll be the first to say that Apple's WWDC 2026 was underwhelming. Yet the first big announcement they made was about performance improvements. And I'll be honest: I'm not running the latest iPhone, but I haven't noticed a lack of performance. This has been kicking around in my head for days. Why did they make such a big deal about performance if things are mostly working OK? And then it clicked... AI.

I'd need at least two hands to count the number of times on-device inference was mentioned in Apple's keynote. Almost every AI feature or improvement came with a line about running on-device only or on-device first. They're leaning heavily on this. And you know what doesn't work when the system is bogged down doing normal bloated device stuff? Token generation. A local model competes for the same RAM, memory bandwidth, and compute as everything else running on the device. Then there's Microsoft Build, with new devices focused mainly on on-device AI, and Windows' recent performance improvements. It all makes sense now.

Microsoft and Apple can't be the only ones improving performance. Only so much can be done at the OS and kernel level. We as software developers need to do our part. As we build AI features into our apps that rely on on-device models, we need to care about performance. Even when a feature isn't directly tied to AI, making it faster and leaner will help our users. Hardware has plateaued again, whether you measure it in performance or in RAM costs (pick your poison), so it's time to do more with less.

Is It Time for a Performance Revival?