I already posted my thoughts on AI and why I don’t think it’s going away any time soon. Unfortunately, it seems some people who don’t like LLMs are using AI-induced outages and deletions as an opportunity to reaffirm their biases, and, in doing so, may be missing part of the picture.
Much has been written about the risks of AI adoption, and many articles have called out the dangers it poses to businesses. However, I think people are too quick to draw the wrong conclusions from these incidents.
AI-induced outages and data loss have already happened to:
- PocketOS [archived version]
- Replit [archived version]
- A user on the Cursor forums [archived version]
- Even AWS [archived version]
In most mature organisations with modern engineering practices (for example, Google, Uber, and some neobanks), engineers have very limited access to production. In some cases they have no access at all; in others, they have a replica of the production environment to test against. If they manage to break this replica, no customer data is lost and customers are not impacted by downtime.
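As a minimal sketch of that kind of guardrail (the environment names, variables, and break-glass token here are hypothetical, not any particular vendor's API), a connection helper can simply refuse to hand out production credentials unless an explicit, audited approval is present:

```python
import os

class ProductionAccessError(RuntimeError):
    """Raised when code tries to reach production without approval."""

def get_database_url(environment: str, approval_token: str | None = None) -> str:
    """Return a connection string, gating production behind a break-glass token.

    Engineers (and agents) normally get the staging replica; production
    requires an explicit approval token issued through an audited process.
    """
    if environment != "production":
        # The replica: safe to break, no customer data, no customer downtime.
        return os.environ["STAGING_DATABASE_URL"]
    if approval_token is None:
        raise ProductionAccessError(
            "Direct production access is not allowed; use the staging replica."
        )
    # A real system would verify the token against a change-management service.
    return os.environ["PRODUCTION_DATABASE_URL"]
```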
Unfortunately, this discipline is not universal, not even in prestigious Big Tech organisations (like AWS itself, as you can see above).
Insider threats and you
AI agents should be treated like insider-risk actors: powerful, useful, and potentially dangerous if given excessive permissions. Whether the failure comes from malice, carelessness, compromised credentials, or an over-eager agent, the engineering problem is the same: the system must tolerate bad actions from trusted actors.
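One concrete way to apply this, sketched below with hypothetical names, is to never hand the agent raw credentials at all: it goes through a tool wrapper that only executes allow-listed, read-only statements, no matter what it asks for:

```python
import sqlite3

# Allow-list, not a deny-list: anything not explicitly permitted is rejected.
READ_ONLY_PREFIXES = ("SELECT", "EXPLAIN")

def run_agent_query(connection: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute a query on behalf of an AI agent, permitting only read-only SQL.

    A hallucinated DROP TABLE is rejected here instead of executed.
    """
    statement = sql.lstrip().upper()
    if not statement.startswith(READ_ONLY_PREFIXES):
        raise PermissionError(f"Statement rejected by insider-risk policy: {sql!r}")
    return connection.execute(sql).fetchall()
```

In a real deployment the stronger control is a database role that lacks write permissions entirely; string matching is only a second layer, which is exactly the point of defence in depth.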
SRE and security
For a Site Reliability Engineer, security is part of the job. SREs handle guardrails, replication, backups, Service-Level Agreements, business continuity, and more. Two of the most important concepts in Site Reliability Engineering are the “nines” in an SLI/SLO/SLA [archived version], and being “blameless” [archived version].
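To make the “nines” concrete, here is a small calculation of how much downtime each availability target leaves in the yearly error budget:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def downtime_budget(availability: float) -> float:
    """Seconds of downtime per year permitted by an availability target."""
    return (1.0 - availability) * SECONDS_PER_YEAR

for target in (0.99, 0.999, 0.9999, 0.99999):
    minutes = downtime_budget(target) / 60
    print(f"{target:.3%} availability -> {minutes:8.1f} minutes of downtime/year")
```

At “five nines” the budget is barely five minutes a year, which is one more reason nobody, human or AI, gets casual access to production.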
Outages
It is impossible to be 100% available, because that would mean there can never be any outage, and that is not achievable for complex, interconnected systems. Outages can strike at any time: earthquakes, nuclear power plant meltdowns, total nationwide blackouts, or even aerial attacks targeting your datacenter (as happened to AWS sites in the UAE), and these events are often impossible to avoid because they are outside your control. It is, however, possible to plan around these threats by making sure your architecture can recover from different kinds of disasters.
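Part of that recovery planning is trusting your backups before you need them; a backup you have never verified or restored is not really a backup. A minimal sketch of an integrity check you might run on a schedule (the expected checksum would come from your backup pipeline; the names here are illustrative):

```python
import hashlib
from pathlib import Path

def verify_backup(backup_path: Path, expected_sha256: str) -> bool:
    """Check a backup's integrity before trusting it in a restore drill.

    Verifying the checksum catches silent corruption or truncation;
    a periodic restore into a scratch environment catches everything else.
    """
    digest = hashlib.sha256(backup_path.read_bytes()).hexdigest()
    return digest == expected_sha256
```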
The specific cause of an outage is not what matters; what matters is that the outage occurred at all. AI is just one more vector.
Blameless
“Blameless” culture is not just about avoiding pointing fingers after an outage. It is about engineering systems that account for user error. Nobody should be able to click a button or run a command and bring down your business, whether they are an external attacker, an internal employee, or an AI agent acting with valid credentials.
This is achieved through defence in depth, backups, and deliberate friction in deployments and development, as sketched below. No developer should have easy access to a production database, especially if you handle sensitive customer data (GDPR, PCI), or if you are a large publicly-listed organisation with financial obligations and reporting requirements (SOX).
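Here is one hedged example of what that friction can look like; the ticket and approver checks are hypothetical stand-ins for whatever change-management system you already run:

```python
def run_destructive_migration(sql: str, ticket_id: str, approvals: set[str]) -> None:
    """Refuse to run a destructive change without a ticket and two approvers.

    This is deliberate friction: nobody, human or AI agent, can run a
    DROP or DELETE against production from a single keyboard.
    """
    if not ticket_id:
        raise PermissionError("Destructive changes require a change ticket.")
    if len(approvals) < 2:
        raise PermissionError("Destructive changes require two distinct approvers.")
    print(f"[{ticket_id}] approved by {sorted(approvals)}; executing migration...")
    # execute_in_transaction(sql)  # real execution would happen here
```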
If a developer does bring down production, the response should not be to scold them. Recover the systems, then investigate the failure. Why did a single user manage to bring down the system? What specific steps allowed this to happen? How can we prevent this class of outage?
Anything a person can do, an AI agent can do just as well, including breaking production. In an organisation with strict Site Reliability Engineering principles, that should not be possible for either. If it does happen, treat it as a learning opportunity rather than dismissing AI as ineffective and dangerous.
Conclusion
If an agent can delete your production database, then you need to reconsider your security posture. The answer is to design production systems where no single actor has enough unchecked power to destroy the business.
It is your responsibility to design secure, reliable systems that can withstand user error, AI hallucinations, and motivated adversaries. The mitigations are largely the same.
So, to answer the question:
Why did AI destroy my production database?
Because you gave it credentials, reachability, and insufficient blast-radius controls.