From the Amazon “Post Mortem”
EBS took longer to restore because so many storage volumes had lost power that the spare storage capacity wasn’t sufficient to handle all the re-mirroring requests. While EBS is attempting to re-mirror a volume, writes to it are blocked for any workload, new or existing, so to many otherwise functioning workloads the storage volumes appeared “stuck,” the post mortem reported.
To get unstuck, AWS needed to add more spare capacity, which meant rousting truck drivers and loaders out of bed to bring in more servers from offsite storage. “We brought in additional labor to get more onsite capacity online and trucked in servers. … There were delays as this was nighttime in Dublin and the logistics of trucking required mobilizing transportation some distance from the datacenter. Once the additional capacity was available, we were able to recover the remaining volumes.”
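The capacity shortfall the post mortem describes can be pictured with a toy model. The sketch below is purely illustrative and is not how EBS is implemented: the function name `remirror`, the volume names, and the assumption that each volume needs exactly one unit of spare capacity are all hypothetical.

```python
def remirror(volumes_needing_mirror, spare_slots):
    """Toy model (hypothetical): each volume needs one spare slot to re-mirror.

    Returns (recovered, stuck). Volumes left without a spare slot stay
    "stuck" with their writes blocked until more capacity arrives.
    """
    slots = max(0, spare_slots)  # treat negative capacity as zero
    recovered = volumes_needing_mirror[:slots]
    stuck = volumes_needing_mirror[slots:]
    return recovered, stuck


# Five volumes lost their mirrors, but only two spare slots exist.
volumes = ["vol-a", "vol-b", "vol-c", "vol-d", "vol-e"]
recovered, stuck = remirror(volumes, spare_slots=2)
print("recovered:", recovered)
print("still stuck:", stuck)

# More servers are trucked in, adding three slots: the rest recover.
recovered_later, still_stuck = remirror(stuck, spare_slots=3)
print("recovered after adding capacity:", recovered_later)
print("still stuck:", still_stuck)
```

In this simplified picture, the only way to clear the stuck volumes is to raise the spare capacity, which mirrors why the recovery depended on getting additional servers onsite.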