GPT-5: The Be-All and End-All?

On Friday, OpenAI released GPT-5. The release triggered a wave of demonstration videos in which YouTubers showcase GPT-5’s programming skills in pair programming and vibe coding. Many of these examples revolve around creating simple, well-known games – things like flight simulators, Snake, or Angry Birds variants. Such projects exist in hundreds of versions online and are likely also in the AI’s training data. In these cases, 95–99% of the work is handled by existing libraries and frameworks.

So instead, I immediately put it to work on a small home-automation project. I wanted an example where it wasn’t about graphics or rendering, but about cleanly implementing logic with some complexity – something that doesn’t exist in identical form all over the internet. An obvious candidate was a small but clear-cut scenario: controlling the pump of an above-ground pool.

Before my last vacation, I had built a controller for this in Home Assistant with GPT-4o and the reasoning model o4-mini. The goal was to run the pump for at least one hour and at most two hours a day, primarily on surplus solar power. If the minimum runtime had not been reached by 10 p.m., the remainder should run on grid power to ensure water quality.

The implementation took just a few prompts: a virtual helper counted the pump’s cumulative runtime per day, two automations handled switching on and off, and the counter was reset to zero every night. This setup worked flawlessly during the vacation.
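For readers unfamiliar with Home Assistant, the counting part looked roughly like this. It is a minimal sketch with illustrative names – switch.pool_pump and input_number.pool_pump_runtime_today are placeholders, not my actual configuration:

```yaml
# Helper storing today's pump runtime in minutes (illustrative names)
input_number:
  pool_pump_runtime_today:
    name: Pool pump runtime today
    min: 0
    max: 240
    step: 1
    unit_of_measurement: "min"

automation:
  # Add one minute to the counter for every minute the pump is on
  - alias: "Count pump runtime"
    trigger:
      - platform: time_pattern
        minutes: "/1"
    condition:
      - condition: state
        entity_id: switch.pool_pump
        state: "on"
    action:
      - service: input_number.increment
        target:
          entity_id: input_number.pool_pump_runtime_today

  # Nightly reset of the counter
  - alias: "Reset pump runtime"
    trigger:
      - platform: time
        at: "00:00:00"
    action:
      - service: input_number.set_value
        target:
          entity_id: input_number.pool_pump_runtime_today
        data:
          value: 0
```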

After our return, the new requirement was to be able to turn the pump on and off manually as well, regardless of the daily logic – for example, right after adding chlorine. The existing automation, however, would switch the pump off again every minute once the maximum runtime was reached, even if it had been started manually. Needless to say, this was not acceptable to the rest of the family; it had a low Family Acceptance Factor (FAF).

I asked GPT-5 to extend the logic so that it could distinguish between a manual and an automatic start. Unprompted, GPT-5’s first solution was to add a second virtual button for the manual override. Two buttons for the same socket are a guaranteed FAF killer. Only after I gave more precise instructions did GPT-5 implement a flag that is set when the pump is started automatically and cleared when the user switches it off.
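Sketched in Home Assistant terms, the flag approach looks something like this (again with illustrative entity names; the 500 W threshold and the two-hour maximum come from the requirements above):

```yaml
# Flag marking runs that were started by the automation, not by hand
input_boolean:
  pool_pump_auto_started:
    name: Pump started automatically

automation:
  # Automatic start: switch the pump on and raise the flag
  - alias: "Pump on at solar surplus"
    trigger:
      - platform: numeric_state
        entity_id: sensor.solar_surplus
        above: 500
    condition:
      - condition: numeric_state
        entity_id: input_number.pool_pump_runtime_today
        below: 120   # daily maximum of two hours
    action:
      - service: switch.turn_on
        target:
          entity_id: switch.pool_pump
      - service: input_boolean.turn_on
        target:
          entity_id: input_boolean.pool_pump_auto_started

  # The shutdown logic may only touch runs it started itself
  - alias: "Pump off at daily maximum"
    trigger:
      - platform: numeric_state
        entity_id: input_number.pool_pump_runtime_today
        above: 119
    condition:
      - condition: state
        entity_id: input_boolean.pool_pump_auto_started
        state: "on"
    action:
      - service: switch.turn_off
        target:
          entity_id: switch.pool_pump
      - service: input_boolean.turn_off
        target:
          entity_id: input_boolean.pool_pump_auto_started
```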

However, this change broke the original functionality: when solar power fluctuated, the pump would now shut off prematurely even during automatic starts, as soon as the flag was reset. GPT-5 didn’t detect the contradiction on its own.

Later that day, I noticed the pump switching off despite a 400 W solar surplus. GPT-5 had understood my requirement to turn the pump on only at a surplus of at least 500 W, but it used the same threshold for switching it off – in other words, there was no hysteresis. Since the running pump’s own consumption is already included in the measured total, a more sensible approach would have been to switch off only at a much lower surplus (e.g., 0–100 W). Despite my fairly clear wording, I had to spot and point out this issue myself before GPT-5 corrected it.
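A hysteresis in Home Assistant amounts to two numeric_state triggers with different thresholds. A sketch of the corrected behavior, with the same placeholder names as above (the for: debounce is my own embellishment, not something the article’s automation necessarily had):

```yaml
automation:
  # Switch on only at a comfortable surplus
  - alias: "Pump on at high surplus"
    trigger:
      - platform: numeric_state
        entity_id: sensor.solar_surplus
        above: 500
        for: "00:05:00"   # debounce brief spikes
    action:
      - service: switch.turn_on
        target:
          entity_id: switch.pool_pump

  # Switch off only when the surplus has really collapsed; since the
  # running pump's own draw is already subtracted from the measured
  # surplus, this threshold must sit far below the on-threshold
  - alias: "Pump off at low surplus"
    trigger:
      - platform: numeric_state
        entity_id: sensor.solar_surplus
        below: 100
        for: "00:05:00"
    condition:
      - condition: state
        entity_id: input_boolean.pool_pump_auto_started
        state: "on"       # never kill a manual run
    action:
      - service: switch.turn_off
        target:
          entity_id: switch.pool_pump
```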

When I instructed GPT-5 to add the hysteresis, it did so. But when I then asked whether there were any other edge cases, GPT-5 thought for several minutes and finally admitted that, because of the changes, it had accidentally removed the 10 p.m. fallback logic. Now, if the pump had run less than one hour by that time, it would only start if there was a surplus of at least 500 W – grid fallback was gone. GPT-5 even told me I had “shot myself in the foot” with this, whereas I believe it’s the AI’s job to spot such conflicts and either find a solution that meets all requirements or explicitly flag the problem.
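For completeness, the lost fallback is a single time-triggered automation; something like this has to survive every refactoring (placeholder names as before):

```yaml
automation:
  # 10 p.m. check: if the daily minimum of one hour was not reached,
  # finish the remainder on grid power regardless of solar surplus
  - alias: "Grid fallback at 22:00"
    trigger:
      - platform: time
        at: "22:00:00"
    condition:
      - condition: numeric_state
        entity_id: input_number.pool_pump_runtime_today
        below: 60
    action:
      - service: switch.turn_on
        target:
          entity_id: switch.pool_pump
```

Note that the surplus-based shutdown must not fire during this fallback run – exactly the conflict GPT-5 failed to flag.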

When it came to fixing the fallback, GPT-5 kept the original separation between the automations and added yet another flag for the 10 p.m. period, which one automation would set for the other in order to prevent shutdowns during that window. To me, this was overly complicated and hard to maintain – the kind of needlessly roundabout construction the German idiom “durch die Brust ins Auge” (“through the chest into the eye”) describes. Even after multiple suggestions to merge the two sets of logic into a single automation, GPT-5 initially replied that this would be “too complicated.” Only after very explicit instructions did it do so. The result was a single automation just six lines longer, but far simpler and much easier to maintain.
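The merged version roughly follows Home Assistant’s trigger-id pattern: all triggers live in one automation, and a choose block dispatches on which one fired. This is my reconstruction under the same placeholder names, not GPT-5’s actual output:

```yaml
automation:
  - alias: "Pool pump controller"
    trigger:
      - platform: numeric_state
        entity_id: sensor.solar_surplus
        above: 500
        id: surplus_high
      - platform: numeric_state
        entity_id: sensor.solar_surplus
        below: 100
        id: surplus_low
      - platform: time
        at: "22:00:00"
        id: fallback
    action:
      - choose:
          # Solar start, respecting the two-hour daily maximum
          - conditions:
              - condition: trigger
                id: surplus_high
              - condition: numeric_state
                entity_id: input_number.pool_pump_runtime_today
                below: 120
            sequence:
              - service: switch.turn_on
                target:
                  entity_id: switch.pool_pump
              - service: input_boolean.turn_on
                target:
                  entity_id: input_boolean.pool_pump_auto_started
          # Surplus gone: stop automatic runs only, and never
          # during the post-22:00 fallback window
          - conditions:
              - condition: trigger
                id: surplus_low
              - condition: state
                entity_id: input_boolean.pool_pump_auto_started
                state: "on"
              - condition: time
                before: "22:00:00"
            sequence:
              - service: switch.turn_off
                target:
                  entity_id: switch.pool_pump
          # 22:00: top up to the one-hour minimum on grid power
          - conditions:
              - condition: trigger
                id: fallback
              - condition: numeric_state
                entity_id: input_number.pool_pump_runtime_today
                below: 60
            sequence:
              - service: switch.turn_on
                target:
                  entity_id: switch.pool_pump
```

One automation, one place to read the whole policy – which is precisely why it is easier to maintain.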

Conclusion

The “be-all and end-all” has not yet been found. The tools are becoming more powerful, and there are interesting developments – for example in tools like Cursor, or in agent mode, where the AI writes its own tests, runs the system, feeds the errors back in, and iterates in a loop until everything works. The underlying idea is classic divide and conquer: break a problem into many small subproblems and solve them step by step.

The human time required drops significantly, but the mental demands rise just as sharply. In the future, what will be needed is less a pure coder and more of a “Swiss Army knife” (what German speakers call an “eierlegende Wollmilchsau” – an egg-laying wool-milk-pig) combining the skills of a Requirements Engineer, Software Architect, Enterprise Architect, Solution Architect, and Test Manager – someone who can orchestrate an entire zoo of AI agents to produce working software.

And it’s worth remembering: this was not a complex project, just a simple automation with a few edge cases – yet GPT-5 still managed to introduce several logic gaps that had to be found and fixed manually.

Exciting 🤓
