The FCC’s Public Safety and Homeland Security Bureau released a public notice today to detail “lessons learned from major network outages” and remind providers “to review industry best practices to ensure network reliability.”
“Based on its recent analysis of several major network outages that affected subscribers, including those calling 911 for emergency assistance, Bureau staff determined that the outages could likely have been prevented or mitigated if the provider had followed certain network reliability best practices,” the public notice said. “Therefore, the Bureau encourages communications service providers to implement the following industry best practices, as previously recommended by the Commission’s Communications Security, Reliability and Interoperability Council:
1. Minimize Impact of Maintenance Windows. Network operators and service providers should be aware of the dynamic nature of peak traffic periods and should consider scheduling potentially service-affecting procedures (e.g., maintenance, high-risk procedures, growth activities) to minimize the impact on end-user services.
2. Monitor 911 Network Components. Network operators, service providers, and public safety entities should actively monitor and manage the 911 network components using network management controls, where available, to quickly restore 911 service and provide priority repair during network failure events. When multiple interconnecting providers and vendors are involved, they will need to cooperate to provide end-to-end analysis of complex call-handling problems.
3. Ensure Real-World Testing Conditions. Service providers and network operators should consider validating upgrades, new procedures and commands in a lab or other test environment that simulates the target network and load prior to the first application in the field.”
The bureau also suggested that several other “practices could prevent or mitigate similar outages in the future:
1. Registration Traffic. Include registration traffic in the highest priority category of network traffic. Attach critical alarms to failures in the registration process.
2. Data Packet Monitoring. Monitor traffic to detect when data packets do not progress across a network element.
3. Redundancy Failover. Fail over to redundant equipment when the number of error messages within a pre-determined period of time exceeds a certain threshold, rather than continuing to try to use the equipment that is generating the error messages.
4. Redundancy During Maintenance. When performing maintenance activity on multiple pieces of equipment that have the same function for redundancy, perform maintenance on only one piece of equipment at a time. Once successful maintenance has been verified, maintenance activity can begin on the next piece of equipment.”
- Paul Kirby, paul.kirby@wolterskluwer.com
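The redundancy failover practice quoted above, switching to redundant equipment once error messages within a set period exceed a threshold, can be illustrated with a short sketch. It is not drawn from the public notice: the class name `FailoverMonitor`, the parameters `max_errors` and `window_seconds`, and the "primary"/"standby" labels are all hypothetical, and a real network element would apply its own alarm and switchover logic.

```python
from collections import deque


class FailoverMonitor:
    """Hypothetical sketch: count recent errors and fail over past a threshold."""

    def __init__(self, max_errors: int, window_seconds: float) -> None:
        self.max_errors = max_errors          # assumed threshold, not from the notice
        self.window_seconds = window_seconds  # assumed length of the time window
        self._error_times = deque()           # timestamps of recent errors
        self.active = "primary"

    def record_error(self, now: float) -> str:
        """Record one error at time `now` (seconds) and return the active unit."""
        self._error_times.append(now)
        # Discard errors that have aged out of the sliding window.
        while self._error_times and now - self._error_times[0] > self.window_seconds:
            self._error_times.popleft()
        # Fail over only when the count exceeds the threshold, instead of
        # continuing to use the equipment that is generating the errors.
        if self.active == "primary" and len(self._error_times) > self.max_errors:
            self.active = "standby"
        return self.active


monitor = FailoverMonitor(max_errors=3, window_seconds=10.0)
results = [monitor.record_error(t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)
```

In this sketch the fourth error inside the 10-second window pushes the count past the threshold of three, so the monitor reports the standby unit from that point on. Errors spaced further apart than the window never accumulate enough to trigger a switch.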
Courtesy TRDaily