Facebook and all its entities had a moment of angst this week. Their network went down after an update to their internal network configurations caused Facebook to stop broadcasting to the wider internet how to find its systems. In fact, its systems sent out updates telling the rest of the internet to forget how to reach them. This happened through BGP, or Border Gateway Protocol, the protocol that binds together the networks that make up the internet. It was all the result of a faulty configuration.
This happened to us in a very small way about a year ago while testing our DR process. We were further disrupted when, on the same day and in fact at the same moment our DR test was running, GoDaddy had an issue, and we couldn’t change the DNS record to update the IP address because their systems were down for a short while. Outsiders and clients assumed it was all a DNS attack. It wasn’t an attack; it was a bad DNS record. The address was simply gone, as if you had moved your mailbox and removed the number from your house. In our case, we noticed we had missed a step, then we noticed an outage at the vendor and panicked for a moment or two. Eventually we solved the problems and completed the DR. Afterward we wrote protocols to check the vendor’s alerts and to validate all of the IPs before “moving” our environment for DR.
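That kind of pre-cutover validation is easy to automate. Below is a minimal sketch of such a check in Python; the hostnames and addresses are illustrative placeholders, not our real environment, and the check assumes you maintain a simple map of each hostname to the IP it should resolve to before the DR “move.”

```python
import socket

# Hypothetical expected state: each hostname and the IP it should
# resolve to BEFORE the DR cutover. Values here are placeholders.
EXPECTED = {
    "app.example.com": "203.0.113.10",
    "db.example.com": "203.0.113.20",
}

def validate_records(expected, resolve=socket.gethostbyname):
    """Return a list of (host, expected_ip, actual) mismatches.

    `resolve` is injectable so the check can be exercised without a
    live network; by default it performs a real DNS lookup.
    """
    problems = []
    for host, want in expected.items():
        try:
            got = resolve(host)
        except OSError as exc:  # NXDOMAIN, timeout, resolver down, etc.
            got = f"lookup failed: {exc}"
        if got != want:
            problems.append((host, want, got))
    return problems

if __name__ == "__main__":
    issues = validate_records(EXPECTED)
    if issues:
        for host, want, got in issues:
            print(f"STOP: {host} expected {want}, got {got}")
    else:
        print("All DNS records match -- safe to proceed with DR.")
```

Run before and after the cutover: any mismatch or failed lookup halts the procedure, which would also catch the vendor outage we hit, since a resolver that is down surfaces as a failed lookup rather than a silent skip.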
The short moral of the story: create steps for every procedure, make certain more than one person is trained, review the documentation, and test the documentation. Test it by handing it to someone who has never seen it before and has never performed the procedure. Then schedule testing at least twice a year, updating the documentation and adding more people to be trained in the needed skills. I am certain a few people are knocking themselves in the head, but the mistake is not only the network engineer’s fault. The fault also lies with the operations directors and IT managers, whose job is to ensure everything is documented, the steps are always followed, and the staff is constantly trained.