Thursday, August 11, 2011

When the cloud goes down

Yesterday, starting at about 12:30 pm, the main data center for the spam, email and web hosting service provider that we recommend to all of you experienced a catastrophic failure. The initial problem was a power failure. This normally isn’t a big deal for a data center, since they are connected to multiple independent sources of power. But then it got worse. The ATS (automatic transfer switch) failed too. Then the generators failed. It was kind of like the blackout that hit the whole eastern half of the USA and Canada a few years ago, where one failure caused the next until everything went black.

After a couple of hours, they were able to start bringing up emergency power for some systems, and the spam service failed over to the Los Angeles data center. Eventually everything failed over to the emergency servers in LA. Then, once full power was restored in Dallas, everything began to move back from LA to Dallas.

Although a failure is a terrible thing, I was very pleased that they kept us informed via Twitter every step of the way. We got play-by-play updates throughout the day and night. It was actually the best we have ever been kept informed by a third-party vendor in a crisis. They even held a debriefing conference last night and again this morning. This is critical when choosing a cloud partner, because without the status information we would not have been able to decide whether to pull mail directly into local servers. As it happened, this wasn’t necessary because we knew that the failover systems were kicking in.

What you will probably see this morning are duplicate emails. As systems are moved from one place to another, they automatically roll back to a point in time where there is no doubt that every piece of mail got delivered. So even though your mail got delivered by the emergency system yesterday, it might also get delivered today by the main systems. It’s a safeguard. The servers should finish working through the backlog and the delivery checks this morning, and then the duplicates will stop coming.
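
For the technically curious, here is a rough Python sketch of why a rollback like this produces duplicates, and how they could be filtered out by message ID. This is only an illustration under our own assumptions, not the provider's actual software: the `Mail` record and the `replay_from_checkpoint` and `drop_duplicates` helpers are made-up names for this example.

```python
from typing import List, NamedTuple


class Mail(NamedTuple):
    """Hypothetical stand-in for an email: a unique ID plus a subject."""
    message_id: str
    subject: str


def replay_from_checkpoint(queue: List[Mail], checkpoint: int) -> List[Mail]:
    """Re-send everything from the last point where delivery was certain.

    Anything after the checkpoint is sent again, even if the emergency
    system already delivered it.
    """
    return queue[checkpoint:]


def drop_duplicates(messages: List[Mail]) -> List[Mail]:
    """Keep only the first copy of each message, judged by message ID."""
    seen = set()
    unique = []
    for mail in messages:
        if mail.message_id in seen:
            continue  # already delivered once, skip the repeat
        seen.add(mail.message_id)
        unique.append(mail)
    return unique


if __name__ == "__main__":
    queue = [
        Mail("<1@example.com>", "Monday report"),
        Mail("<2@example.com>", "Lunch?"),
        Mail("<3@example.com>", "Invoice"),
        Mail("<4@example.com>", "Meeting notes"),
    ]
    # Assumption: the emergency system delivered the last two messages,
    # and the main system rolls back to just before them.
    delivered_by_emergency = queue[2:]
    inbox = delivered_by_emergency + replay_from_checkpoint(queue, 2)
    print(len(inbox), "messages arrived,",
          len(drop_duplicates(inbox)), "are unique")
```

The trade-off is deliberate: sending a message twice is harmless and easy to clean up, while losing one is not.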

We appreciate everyone’s patience through this issue.
