Facebook, Disney Plus, Alexa and more affected by AWS outage – here’s all we know!
UPDATE: 16:35 PST / 19:35 ET. The official AWS dashboard has published the following statement: “With the network device issues resolved, we are now working towards recovery of any impaired services. We will provide additional updates for impaired services within the appropriate entry in the Service Health Dashboard.”
Large parts of the internet have suffered long-lasting issues after multiple outages took down significant portions of the Amazon Web Services (AWS) network.
Data from real-time outage monitoring service DownDetector showed the incident beginning at roughly 10:30 ET/15:30 GMT, with thousands of users registering problems across Europe, Asia and the US throughout the day.
Along with Amazon.com, other major websites including Facebook and Disney Plus appeared to be suffering issues, alongside Amazon services such as Alexa, Prime Video, Ring, and Chime.
How was the downtime detected? A number of online services proactively track whether popular websites are up or down. They are a variant of website monitoring services, and are particularly useful for people who use website builders and for web hosting novices.
Amazon’s official status dashboard was updated throughout the day with messages confirming the outage, which centered on the AWS US-EAST-1 region, hosted in Virginia. Some users in other regions did not see any outages.
Among the services impacted were EC2, Connect, DynamoDB, Glue, Athena, Timestream, Chime and other AWS services in US-EAST-1, with increased API error rates seen across the board.
The outages centered on a number of core AWS services, with increased API error rates reported for Amazon DynamoDB and Amazon Elastic Compute Cloud (EC2), as well as for Amazon Connect, which handles contact center calls.
AWS Management Console and AWS Support Center also saw “increased error messages” across all territories.
AWS Management Console acts as a central hub for customers to access their suite of AWS services, allowing them to manage the full gamut of cloud computing and cloud storage.
Amazon Web Services (AWS) has provided an explanation as to what caused the outage that downed parts of its own services, as well as the third-party websites and online platforms that utilize AWS. In a post on the AWS website, the company explains that an automated process caused the outage, which began around 10:30AM ET in the Northern Virginia (US-EAST-1) region.
“An automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network,” Amazon’s report says. “This resulted in a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, resulting in delays for communication between these networks.”
According to the report, this issue even impacted Amazon’s ability to see what exactly was going wrong with the system. It prevented the company’s operations team from using the real-time monitoring system and internal controls that they typically rely on, which explains why the outage took so long to fix. Amazon notes that service didn’t start improving until 4:34PM ET, and the issue was fully resolved at 5:22PM ET.
Since Amazon’s Support Contact Center also runs on the AWS network, customers weren’t able to create support cases for seven hours during the outage. Amazon’s Service Health Dashboard, which the platform uses to provide status updates, was also impacted, resulting in Amazon’s delayed acknowledgment of the issue. The company says that it’s working on a way to improve its response to outages, and plans on releasing a revamped version of the Service Health Dashboard that should help customers receive timely updates if an outage occurs.
In addition to knocking out popular services, like Venmo, Tinder, Disney Plus, and even Roomba, the December 7th outage also put some Amazon deliveries on hold. Amazon experienced its last major outage around this time last year, causing a number of sites and apps to go down for hours.