Toby, I think you're close. My guess is that the application is now load balanced across multiple machines and isn't aware of it. Clustering is easy until you throw in code that was never designed to run in a cluster.
In one client's app, the key breakage came from its use of PHP sessions. That was mitigated by a minor change to the load balancers; it didn't eliminate the problem, but it did make it much less visible.
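To make the session problem concrete, here is a toy Python simulation, not PHP and not a real load balancer config. It assumes each backend keeps sessions in its own local store (roughly like PHP's default file-based session handler) and that the "minor change" was session affinity ("sticky sessions"). That second point is an assumption; the post doesn't say what was changed. All class and function names are hypothetical.

```python
import hashlib


class Backend:
    """Hypothetical app server whose sessions live only on this machine."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> data, invisible to other backends

    def handle(self, session_id, set_user=None):
        if set_user is not None:
            self.sessions[session_id] = {"user": set_user}
        return self.sessions.get(session_id)  # None means "session lost"


class RoundRobinBalancer:
    """Sends each request to the next backend, ignoring the session."""

    def __init__(self, backends):
        self.backends = backends
        self.next = 0

    def route(self, session_id):
        backend = self.backends[self.next % len(self.backends)]
        self.next += 1
        return backend


class StickyBalancer:
    """Assumed affinity rule: hash the session id to pick a fixed backend."""

    def __init__(self, backends):
        self.backends = backends

    def route(self, session_id):
        digest = hashlib.sha256(session_id.encode()).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.backends)
        return self.backends[index]


def simulate(balancer, requests=6):
    """Log in once, then count follow-up requests that still see the session."""
    session_id = "abc123"
    balancer.route(session_id).handle(session_id, set_user="toby")
    return sum(
        1 for _ in range(requests)
        if balancer.route(session_id).handle(session_id) is not None
    )


def simulate_failover():
    """Affinity only hides the problem: lose the owning node, lose the session."""
    backends = [Backend(n) for n in ("web1", "web2", "web3")]
    balancer = StickyBalancer(backends)
    owner = balancer.route("abc123")
    owner.handle("abc123", set_user="toby")
    # The node holding the session leaves the pool (crash, deploy, scale-down).
    balancer.backends = [b for b in backends if b is not owner]
    return balancer.route("abc123").handle("abc123")


if __name__ == "__main__":
    print(simulate(RoundRobinBalancer([Backend(n) for n in ("web1", "web2", "web3")])))  # 2
    print(simulate(StickyBalancer([Backend(n) for n in ("web1", "web2", "web3")])))  # 6
    print(simulate_failover())  # None
```

With plain round robin across three nodes, only 2 of the 6 follow-up requests land on the node that holds the session. Pinning the session to one node makes every request succeed until that node drops out, which matches "much less visible" rather than "fixed". A real fix would move sessions to storage that all nodes share.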