The Iowa Caucus Application Malfunction
Bad Things Happen to Good People and Companies
If you have participated in a software deployment, then you know how nail-biting the production rollout can be. You have tested for everything you could think of. You have people on standby to do a final review of the rollout and confirm that the application works as expected. You think you have done a great job, but then the user calls and emails start flowing in, and you panic. Bad things happen to great people and companies, sometimes even when we think we have done our best work.
Your first line of defense against the nail-biting rollout is to follow agile and other project rollout standards. Then do a pre-launch in a test market that best represents your reality. The problem is that in this day of IoT, Azure development, and big promises from hardware and software vendors, you are expected to work miracles in less time, with smaller budgets and less testing.
I hold old-world beliefs about this strategy. I like testing, the more the better: testing that is controlled and well defined. Then roll out to production and test again before your high-transaction day takes place. Then make sure your team has created what-if scenarios and backup strategies for every process down the chain.
Do we have enough hardware capacity, CPUs, and transaction pipelines, and can we add more within moments if needed? Can we add more phone lines (not only because of extra traffic, but in case people find your numbers and want to spoof you) and more customer service and help desk people? Do we have technicians for each link in the chain to fix anything that cannot fail over? Do we have proper failover procedures? Do we have backups for everything, from hardware, code, data, databases, and servers to network support and application team support, in case something happens? Do you really want to roll out a patch just before a large event? If you have a very public launch, all patches should be tested completely before they are rolled out. Lately, the security patches themselves have been causing havoc. Unfortunately, the climate in the world is aggressive, and people want to help defeat you rather than cheer your success.
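The capacity questions above can be turned into a simple pre-launch check. What follows is only a minimal sketch under my own assumptions: the `Resource` fields, the `readiness_gaps` function, the 2x headroom factor, and every number in the sample plan are hypothetical and are not taken from any real deployment. It flags any resource that lacks headroom for the big day and cannot be scaled up in minutes.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    capacity: float       # what is provisioned today
    expected_peak: float  # forecast load on the big day
    can_scale: bool       # can more be added within minutes?


def readiness_gaps(resources, headroom=2.0):
    """Return names of resources that lack headroom and cannot scale quickly."""
    gaps = []
    for r in resources:
        needed = r.expected_peak * headroom
        # A shortfall is only a launch blocker if we cannot add more in time.
        if r.capacity < needed and not r.can_scale:
            gaps.append(r.name)
    return gaps


if __name__ == "__main__":
    # Illustrative numbers only.
    plan = [
        Resource("phone lines", capacity=50, expected_peak=40, can_scale=False),
        Resource("app servers", capacity=8, expected_peak=6, can_scale=True),
        Resource("help desk staff", capacity=30, expected_peak=10, can_scale=False),
    ]
    print(readiness_gaps(plan))
```

With these made-up numbers, only the phone lines are flagged: the app servers are short of headroom but can be scaled, and the help desk already has more than twice its expected peak. The point is not the arithmetic but that every link in the chain gets an explicit answer before launch day.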
From what I have read about this situation, it seems that there may not have been enough phone lines and no way to add more, that transactions were blocked, that there were not enough staff members or auditing tools to track the blocking, and that nobody had planned for any failure.
In the end, this happens to everyone in the IT world, and it usually happens because we left out a step or didn’t have realistic timelines to test real-life production scenarios before the BIG DAY/EVENT.
I hope they will be able to resolve the problems and help create a voting process that the public can trust. I wish them good luck and all the best in getting this done for their next launch. But don’t point fingers when you live in a glass house too. Use this experience to enhance your own development strategies and to remind yourself how important it is to follow each step.