Tuesday, August 2, 2011

Has Cisco CRS-3 caused network outages at both Comcast and AT&T?

By Brad Reese

The Cisco CRS-3 has caused network outages at both Comcast and AT&T while Cisco continues to deny any problems with the CRS-3, blaming carrier configs for the issue.

Meanwhile, Comcast and AT&T have formally petitioned both Cisco and Broadcom on the issue...
So what exactly is the issue?

Well, it's called bit flipping.

The bit flip is an interesting phenomenon because the Cisco CRS-3 is rumored to use third-party Broadcom silicon (view Cisco Systems' Fear of a Broadcom Planet) that did not use ECC-protected memory subsystems or low-alpha lead. Normal lead occasionally kicks out an alpha particle, and with transistor densities being what they are today, that can cause a bit to "stick" in a memory subsystem. The memory is then corrupted and the problem snowballs: at a minimum the ASIC gets reset, and possibly the entire system.
This happened most famously with the Cisco 4500 and 6500 and the Toshiba SRAMs back in the 2002-2004 time frame, leading to hardware recalls costing tens of millions of dollars.
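
To make concrete what ECC protection buys you, here is a minimal sketch of a Hamming(7,4) code in Python. It is a teaching illustration only and says nothing about how the CRS-3 or any Broadcom part is actually built. The function names are my own, and real ECC memory typically uses a wider single-error-correct, double-error-detect code over 64-bit words. The idea is the same, though: extra parity bits let the hardware locate and undo a single flipped bit instead of silently using corrupt data.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Codeword positions 1..7 are stored in integer bits 0..6.
    Positions 1, 2 and 4 hold parity; positions 3, 5, 6, 7 hold data.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(word):
    """Return (nibble, syndrome) and correct a single flipped bit.

    The syndrome is the XOR of the positions of all set bits. It is 0 for
    a clean codeword and equals the position of the bad bit after exactly
    one flip.
    """
    bits = [(word >> i) & 1 for i in range(7)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the faulty bit back
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


# Simulate an alpha-particle hit: flip one stored bit, then read it back.
original = 0b1011
stored = hamming74_encode(original)
corrupted = stored ^ (1 << 4)  # position 5 is flipped
recovered, syndrome = hamming74_decode(corrupted)
print(recovered == original, syndrome)
```

Without the three parity bits, the same flip would simply hand back a wrong value with no indication that anything had gone wrong, which is the failure mode described above.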

Bottom line: There's no real "field fix" in most cases, and a line card swap-out is called for.

I find this interesting because Cisco CEO John Chambers stated during Cisco's Q1'FY11 earnings conference call:

"Just as an update, the customer acceptance from the pilot perspective on the CRS-3 is off to a great start. In Q1, we shipped our first CRS-3 system. We received $51 million in orders from 30 customers. However, we expect there will probably be several more quarter before we see rapid increase in these accounts from dual high-end routers as they test out all the new systems of this magnitude before they begin volume commitments."

Finally, in my opinion, the Cisco CRS-3 bit flipping issue described above should be brought to the attention of Cisco's customers.
