Smart companies, dumb AI mistakes. What three high-profile rollbacks revealed. · IntelScroll Signal

Air Canada deployed a customer-service chatbot. The chatbot told Jake Moffatt he could apply for a bereavement-fare refund retroactively. Air Canada's actual policy did not allow retroactive bereavement claims. Moffatt booked, applied, was denied, and went to the BC Civil Resolution Tribunal. Air Canada argued the chatbot was a separate legal entity from the airline. The tribunal disagreed. The ruling is now widely cited for one principle: a company is responsible for what its deployed AI tells customers, regardless of internal disclaimers and regardless of how the output was generated.

Klarna spent 2024 celebrating that AI was doing the work of 700 customer-service humans. The press cycle was favorable. In May 2025, Klarna walked it back. The framing was 'we are hiring humans again for quality reasons.' The underlying story had three parts: customer satisfaction had not held up; complex tickets the AI handled poorly were creating reputational damage that exceeded the per-ticket cost savings; and the gap between the external claim and the actual capability had become a story in itself. The unit economics may have been positive; the reputational economics were not.

Cursor sold an unlimited subscription tier for its AI-powered editor. When the product was chat-style autocomplete (one LLM call per interaction), the math worked. When Cursor added agent mode (20-100 LLM calls per request), the top 5 percent of users by usage consumed 50 percent of the agentic workload, and the unit economics broke. In mid-2025, Cursor moved to usage-based pricing. The transition was contentious. The previous model was structurally unprofitable at any subscription price the company could reasonably charge.
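The arithmetic behind that break can be sketched in a few lines. The price, per-call cost, and interaction count below are hypothetical placeholders, not Cursor's actual figures; only the call multipliers (1, 20, 100) and the 5 percent / 50 percent concentration come from the account above.

```python
from fractions import Fraction


def monthly_margin_cents(price_cents, interactions, calls_per_interaction, cost_per_call_cents):
    """Flat subscription revenue minus inference cost for one user-month, in cents."""
    return price_cents - interactions * calls_per_interaction * cost_per_call_cents


# Hypothetical inputs, chosen only to make the arithmetic visible.
PRICE_CENTS = 2000        # flat monthly price: $20
COST_PER_CALL_CENTS = 1   # blended cost of one LLM call: $0.01
INTERACTIONS = 500        # interactions per user per month

# Same user, same price; only the number of LLM calls per interaction changes.
chat_margin = monthly_margin_cents(PRICE_CENTS, INTERACTIONS, 1, COST_PER_CALL_CENTS)
agent_low = monthly_margin_cents(PRICE_CENTS, INTERACTIONS, 20, COST_PER_CALL_CENTS)
agent_high = monthly_margin_cents(PRICE_CENTS, INTERACTIONS, 100, COST_PER_CALL_CENTS)

# Concentration: if the top 5% of users drive 50% of the workload,
# how much more does a heavy user consume than one of the other 95%?
heavy_per_user = Fraction(50, 5)    # workload share divided by user share, top group
rest_per_user = Fraction(50, 95)    # same ratio for everyone else
heavy_vs_rest = heavy_per_user / rest_per_user

print(chat_margin, agent_low, agent_high, heavy_vs_rest)
```

Under these assumed numbers, the chat workload leaves $15 per user-month, agent mode loses between $80 and $480, and a heavy user consumes 19 times what a user in the remaining 95 percent does. A flat price cannot absorb a cost that scales with calls.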

The model in each case was capable. The failure was elsewhere: in scope, in claims, in pricing, in escape hatches, in eval, in governance.

Across all three, and across the other documented rollbacks (McDonald's + IBM drive-thru, DPD chatbot, Sports Illustrated AI bylines, iTutorGroup EEOC settlement), the model is almost never the root cause. The failure sits in the deployment around it. At Air Canada: no policy-conflict detection and no human-review queue for outputs touching contractual terms. At Klarna: external claims that exceeded internal capability. At Cursor: a pricing model designed for a chat workload and applied to an agentic one. At DPD: no prompt-injection defense and no kill switch when the chatbot started swearing at a customer, who posted the exchange on X. In each, the model behaved roughly as a capable model should; the surrounding deployment did not.
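Two of the missing pieces named above, a human-review queue for outputs touching contractual terms and a kill switch, can be sketched as a gate between the model and the customer. This is a minimal illustration and not any of these companies' systems: the `ReplyGate` class, the keyword pattern, and the routing labels are all hypothetical, and a real deployment would likely need a stronger detector than a keyword match.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword pattern marking a draft as touching contractual terms.
CONTRACTUAL_PATTERN = re.compile(
    r"\b(refund|fare|compensation|warranty|cancel\w*)\b", re.IGNORECASE
)


@dataclass
class ReplyGate:
    enabled: bool = True                              # kill switch: False stops all AI replies
    review_queue: list = field(default_factory=list)  # drafts awaiting human review

    def route(self, draft: str) -> str:
        """Return 'handoff', 'review', or 'send' for a model-drafted reply."""
        if not self.enabled:
            return "handoff"                  # AI replies disabled: hand the customer to a human
        if CONTRACTUAL_PATTERN.search(draft):
            self.review_queue.append(draft)   # hold the draft for human approval
            return "review"
        return "send"


gate = ReplyGate()
first = gate.route("You can apply for a bereavement refund within 90 days.")
second = gate.route("Our office opens at 9am.")
gate.enabled = False                          # operator flips the kill switch
third = gate.route("Our office opens at 9am.")
print(first, second, third)
```

The design point is that both controls live outside the model: the gate works the same whether the draft is right or wrong, which is what lets it ship on day one.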

The pattern that survives and the pattern that rolls back are distinguishable. Bounded scope vs unbounded. Calibrated external claims vs aspirational ones. Pricing matched to workload shape vs flat-rate pricing applied to an agentic workload. Escape hatches shipped on day one vs retrofitted after the incident. The companies in the rollback list violated at least two of these. Most violated three.
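The four contrasts above can be turned into a pre-launch checklist. The sketch below is one possible encoding: the `DeploymentReview` class and its field names are hypothetical, and the default threshold of two simply mirrors the observation that the rollbacks violated at least two of the four.

```python
from dataclasses import dataclass, fields


@dataclass
class DeploymentReview:
    bounded_scope: bool              # scope is bounded, not open-ended
    calibrated_claims: bool          # external claims match internal capability
    pricing_matches_workload: bool   # pricing fits the workload shape
    escape_hatches_day_one: bool     # handoff and kill switch ship at launch

    def violations(self):
        """Names of the checks this deployment fails, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def at_risk(self, threshold=2):
        """True when the number of failed checks reaches the threshold."""
        return len(self.violations()) >= threshold


# A hypothetical deployment that fails two of the four checks.
review = DeploymentReview(
    bounded_scope=True,
    calibrated_claims=False,
    pricing_matches_workload=False,
    escape_hatches_day_one=True,
)
print(review.violations(), review.at_risk())
```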