Failure Modes and Effects Analysis (FMEA) - Step One - Throwing Down Failures
We can make it better if we measure or analyze it. Let's analyze a small program in order to determine how it might fail and what we can do about it. We will break down a software program into smaller modules and look at how each phase or component might fail. We will also look for silent failures or a lack of success metrics where something didn't occur at a time when there should have been some activity.

Sample System Under Analysis

Our example system is a data lake sink that:

- Reads streaming data
- Validates the data
- Bundles the data into micro-batch sets
- Writes the data to a data lake

Each lake write has a corresponding metrics push that updates our metrics store statistics and other features.

Video Walkthrough

In this video, we throw down as many failures as we can think of. We can worry about detection and remediation in a later phase.

Worksheet Template

We will record the identified failure modes using a worksheet lik...