The FCC’s Public Safety and Homeland Security Bureau issued a public notice today urging service providers to adopt best practices to avoid network outages. “Based on submissions to the Commission’s Network Outage Reporting System (NORS) and publicly available data, the Bureau has observed a number of major service outages caused by minor changes in network management systems,” the bureau said in the public notice, which was released in PS docket 17-68. “These so-called ‘sunny day’ outages do not result from a natural weather-related disaster or other unforeseeable catastrophe, and can result in ‘silent failures,’ which are outages that occur without providing explicit notification or alarm to the service provider. In 2014, the Bureau first highlighted the occurrence of major ‘sunny day’ outages affecting users in multiple states. These major outages continue to occur, some affecting users nationwide. Outages that impact 911 service are of particular concern, given the importance of ensuring continuity of 911 service.
“After an analysis of the facts and circumstances, Bureau staff have determined that service providers likely could have prevented most of these outages if they had implemented certain industry best practices,” the public notice added. In particular, it cited seven best practices recommended by the Communications Security, Reliability and Interoperability Council (CSRIC) II that it said “could help prevent sunny day outages and silent failures …”
They involve awareness training, required experience and training, access privileges, network change verification, network reconfiguration 911 assessment, diversity audits, and network monitoring. “In addition to considering CSRIC-recommended best practices, the Bureau also recommends that service providers consider implementing the following lessons learned derived from the Bureau’s fact-based analysis of several recent outages. The Bureau finds that taking these steps could help to prevent future outages or mitigate the impact of outages that do occur,” according to the public notice.
They involve access control, validation and authentication, software-based alarming, enhanced outage detection, and automatic re-routing. —Paul Kirby