Go Daddy outage highlights need for network segmentation, visibility

Saturday, September 22, 2012

What kind of lessons can enterprises draw from the recent Go Daddy outage?

While outside influences -- like a distributed denial-of-service (DDoS) attack -- can trigger a network outage, many outages are the result of avoidable network events brought on by system updates or configuration errors.

Domain registrar and web hosting provider GoDaddy.com recently suffered intermittent network outages that took down customers' email and websites for six hours. The Go Daddy outage was initially thought to be the work of a DDoS attack but was later traced to corrupted router data tables stemming from a series of internal network events. In a message posted on Go Daddy's website, CEO Scott Wagner said the provider wasn't hacked and no customer data was compromised.

Go Daddy has since implemented measures to prevent this type of network outage from occurring again, Wagner wrote.

Just because network outages are a common occurrence doesn't mean providers and enterprises should go down without a fight and accept an unreliable network. Better visibility, failover plans and network segmentation can help limit the impact of a network outage for an enterprise.

Go Daddy outage points to need for network monitoring and visibility

Six hours of service interruption is a long time for a popular provider like Go Daddy. Better network visibility could enable earlier identification and faster resolution of such problems, both for a provider like Go Daddy and for an enterprise's internal network, noted Tim Nichols, vice president of global marketing at New Zealand-based Endace, a network traffic recording and visibility provider.

Because so many businesses and customers rely on Go Daddy for connectivity, "[the provider] must invest real money in network visibility and network history tools in order to minimize the time it takes engineers to respond, establish root cause, and repair serious service-affecting problems," he said.

That visibility must extend to the changes network administrators make to the network. Because the Go Daddy outage most likely stemmed from a corrupted router incorrectly updating other routers' tables, better configuration and patch management might have prevented it, said John Pironti, president of consultancy IP Architects LLC.

But given Go Daddy's reputation, it is unlikely that a single issue, such as a human error that produced a corrupted update, triggered the outage, and no single fix would necessarily have prevented it.

"This was most likely the result of a complicated set of updates or [a] number of controls failing," Pironti noted.

Early detection of networking issues is important, especially following any network updates. Enterprises and providers should closely monitor traffic flows and data to spot corruption as soon as possible, he said.
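As a rough illustration of that kind of post-update monitoring, the Python sketch below compares traffic samples collected after a change against a baseline collected before it and flags samples that stray more than k standard deviations from the baseline mean. The function name, the metric (for example, packets per interval on a link) and the default threshold of 3 are assumptions for illustration only; neither Pironti nor Go Daddy described a specific detection method, and production monitoring tools use far richer models.

```python
from __future__ import annotations

from statistics import mean, stdev
from typing import Sequence


def flag_anomalies(
    baseline: Sequence[float], post_update: Sequence[float], k: float = 3.0
) -> list[int]:
    """Return indices of post-update samples that deviate from the baseline.

    Hypothetical helper: a sample is flagged when it lies more than k sample
    standard deviations away from the mean of the pre-update baseline.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: treat any change at all as a deviation.
        return [i for i, x in enumerate(post_update) if x != mu]
    return [i for i, x in enumerate(post_update) if abs(x - mu) > k * sigma]


# Example: traffic on a link collapses after a (hypothetical) router update.
before = [1000, 1020, 990, 1010, 1005]
after = [1008, 995, 120, 40]
print(flag_anomalies(before, after))
```

Here the baseline mean is 1005 with a standard deviation of about 11, so the last two samples are flagged (indices 2 and 3). A simple threshold like this only catches gross shifts; its value lies in running automatically right after a change, when engineers most need an early signal.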

Network segmentation, design considerations slash risk of network outages

Careful network design techniques can go a long way in preventing a failure similar to the Go Daddy outage. Network segmentation can lessen the impact of a failed network device, Pironti said.

"Outages are a reality, but both enterprises and providers should segment their infrastructure so that no single set of equipment or solutions can impact an entire environment if availability and uptime is the primary business need," Pironti said, noting that greater resiliency should be built into network infrastructure.

One method of network segmentation is running multiple parallel systems, a network design technique that lets organizations apply necessary updates to one system at a time and so limit the risk of impact to users.
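The one-at-a-time update approach can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `staged_rollout`, `apply_update` and `is_healthy` are hypothetical names standing in for whatever change-management and health-check tooling an organization actually uses. The point it shows is that a bad update stops at the first system that fails its check, leaving the remaining parallel systems untouched and available to carry traffic.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def staged_rollout(
    systems: Iterable[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> tuple[list[str], Optional[str], list[str]]:
    """Update parallel systems one at a time, halting at the first failure.

    Hypothetical helper. Returns (updated, failed, untouched):
    - updated:   systems that took the update and passed the health check
    - failed:    the system that failed its check, or None if all passed
    - untouched: systems never updated because the rollout was halted
    """
    pending = list(systems)
    updated: list[str] = []
    for i, system in enumerate(pending):
        apply_update(system)
        if not is_healthy(system):
            # Stop here so the remaining systems keep their known-good state.
            return updated, system, pending[i + 1:]
        updated.append(system)
    return updated, None, []
```

For example, rolling an update across systems "a" through "d" where "c" fails its post-update check returns `(["a", "b"], "c", ["d"])`: system "d" never receives the faulty change. A real rollout would typically also roll back the failed system, which this sketch leaves out.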

"Having failover between the systems in case one should start behaving strangely is important because it allows traffic to be quickly moved to another system that has not been impacted," said Craig Mathias, principal at Farpoint Group, an advisory firm based in Ashland, Mass.
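The failover decision Mathias describes can be reduced to a small selection rule, sketched here in Python. The function name `choose_active`, the priority-ordered list of systems and the `is_healthy` check are assumptions for illustration, not a description of any vendor's product: traffic stays on the current system while it passes its health check, and otherwise moves to the first healthy system in priority order.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


def choose_active(
    systems: Sequence[str],
    is_healthy: Callable[[str], bool],
    current: Optional[str] = None,
) -> str:
    """Pick the system that should carry traffic.

    Hypothetical helper: keep the current system while it is healthy;
    otherwise fail over to the first healthy system in priority order.
    """
    if current is not None and is_healthy(current):
        return current
    for system in systems:
        if is_healthy(system):
            return system
    raise RuntimeError("no healthy system available for failover")


# Example: the primary starts "behaving strangely" and fails its check.
health = {"primary": False, "secondary": True, "tertiary": True}
print(choose_active(["primary", "secondary", "tertiary"], health.get, current="primary"))
```

In this example traffic moves to "secondary". Staying on a healthy current system, rather than snapping back to the primary as soon as it recovers, avoids flapping between systems; whether and when to fail back is a separate design choice.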

A highly distributed network architecture is much more likely to survive and bounce back quickly from any type of network failure compared to a monolithic system, Mathias noted.

Running parallel systems is an option for an enterprise, but it may not be an appropriate solution for a provider like Go Daddy, Pironti said.

Enterprises can benefit from segmenting their infrastructure across two physical facilities, which is how most network segmentation is done. But because Go Daddy operates many facilities, it's not clear that parallel systems would have prevented its outage, he said.

Final Go Daddy outage lesson: Don't single-source providers

While providers can adopt technologies and procedures to minimize failure impact to their customers, enterprises do not need to sit idly by while the service provider works out networking issues.

"[Enterprises] must understand that they are dealing with complex, imperfect systems that will fail and they do need a contingency plan so they don't become a victim," Mathias said.

If availability is a critical factor, enterprises should spread their services across multiple providers so that a secondary environment is available if one provider suffers an outage, noted IP Architects' Pironti.

"The user has to take some responsibility," he added, noting that no provider is too big to fail.

Failing over to another provider is one way that enterprises can ensure that business will stay up and running, even though they may not have 100% functionality, noted Farpoint's Mathias.

An enterprise considering a multi-provider environment can also benefit from using vendors -- like Akamai Technologies -- that offer load balancing and redundancy between providers, Mathias added.

"Enterprises need scalable expansion that allows them to grow, without any single point of failure," he said.