Baked-in Runtime Failure Identification and Remediation

Web and other API-based applications are composed of services, data stores, compute dependencies, and partner business systems. We need a way to continually monitor these components, both in isolation and in situ, so that automation can repair or heal certain types of problems. Different control planes have different requirements for the components they manage. The instrumentation each one relies on may therefore be subtly different, shallower, or more detailed, depending on the actions that control plane needs to take.
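The following minimal Python sketch shows how one component instance can expose instrumentation at two depths. It assumes two hypothetical consumers. A load balancer needs only a shallow "is the process answering" signal. A deployment controller wants a deeper view that includes dependencies. The component name, dependency names, and probes are placeholders and do not come from any particular platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A probe returns True when the thing it checks looks healthy.
Probe = Callable[[], bool]


def run_probe(probe: Probe) -> bool:
    # Treat an exception raised by a probe the same as a failed check.
    try:
        return bool(probe())
    except Exception:
        return False


@dataclass
class ComponentInstrumentation:
    """Hypothetical instrumentation for one instance of a component service."""
    name: str
    dependencies: Dict[str, Probe] = field(default_factory=dict)

    def shallow(self) -> dict:
        # Shallow view: if this code runs, the process is alive and responding.
        # A load balancer usually needs nothing more to decide on routing.
        return {"component": self.name, "alive": True}

    def deep(self) -> dict:
        # Deep view: also exercise each dependency, so a controller can tell
        # whether a problem is local to this instance or downstream of it.
        results = {dep: run_probe(probe) for dep, probe in self.dependencies.items()}
        return {
            "component": self.name,
            "alive": True,
            "dependencies": results,
            "healthy": all(results.values()),
        }


if __name__ == "__main__":
    orders = ComponentInstrumentation(
        name="orders-api",
        dependencies={
            "orders-db": lambda: True,          # stand-in for a real database ping
            "payments-partner": lambda: False,  # stand-in for a failing partner call
        },
    )
    print(orders.shallow())
    print(orders.deep())
```

When run as a script, the shallow view reports the instance as alive. The deep view reports `healthy: False` because the stand-in payments-partner probe fails. A load balancer could keep routing to this instance while a controller treats the failure as a dependency problem rather than restarting the instance.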

Health checks are one technique for determining the current health of a component. A health check is either a test endpoint built into the service or narrowly targeted external code. Either way, it exercises some capability of one specific instance of a component service.
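Here is a minimal sketch of an in-service health check endpoint, using only Python's standard library `http.server`. The `/healthz` path, the port, and the `check_storage` probe are hypothetical. Real services usually expose an equivalent route through their own web framework. The key idea is that the endpoint exercises a real capability instead of just confirming that the process is running.

```python
import json
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_storage() -> bool:
    # Hypothetical capability check: exercise a real code path (write a
    # scratch file and read it back) rather than just reporting "process is up".
    try:
        with tempfile.NamedTemporaryFile(mode="w+") as f:
            f.write("ok")
            f.flush()
            f.seek(0)
            return f.read() == "ok"
    except OSError:
        return False


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/healthz":
            self.send_error(404)
            return
        healthy = check_storage()
        body = json.dumps({"storage": healthy}).encode()
        # 200 tells the caller this instance can do its job; 503 tells it not to.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        # Silence per-request logging so frequent probe traffic does not flood the logs.
        pass


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()
```

Returning 503 on failure lets simple consumers act on the status code alone. Consumers that need more detail can read the JSON body.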

Each control plane or remediation touchpoint needs to be examined to understand its actual area of concern. That concern is then refined into a set of health checks. Finally, the problems those health checks can identify are matched with the actions that can be taken, ranging from automated remediation to human notification.
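The matching step can be sketched as a lookup table from detected problem to action. Any problem with no automated fix goes to a human. The same happens when automation keeps firing without the problem clearing. The problem names, actions, and attempt limit below are hypothetical and are not tied to any specific control plane.

```python
from collections import Counter
from enum import Enum
from typing import Dict


class Action(Enum):
    RESTART_INSTANCE = "restart_instance"
    REMOVE_FROM_POOL = "remove_from_pool"
    FAIL_OVER_DATASTORE = "fail_over_datastore"
    NOTIFY_HUMAN = "notify_human"


# Hypothetical mapping from problems a health check can identify to the
# automated remediation a control plane is allowed to take for each one.
REMEDIATIONS: Dict[str, Action] = {
    "process_unresponsive": Action.RESTART_INSTANCE,
    "dependency_timeout": Action.REMOVE_FROM_POOL,
    "primary_db_unreachable": Action.FAIL_OVER_DATASTORE,
}


class Remediator:
    def __init__(self, max_automated_attempts: int = 3) -> None:
        self.max_automated_attempts = max_automated_attempts
        self.attempts: Counter = Counter()  # (component, problem) -> attempts so far

    def decide(self, component: str, problem: str) -> Action:
        action = REMEDIATIONS.get(problem)
        if action is None:
            # No known automated fix: a person has to look at it.
            return Action.NOTIFY_HUMAN
        key = (component, problem)
        self.attempts[key] += 1
        if self.attempts[key] > self.max_automated_attempts:
            # Automation has run repeatedly without fixing the problem; escalate.
            return Action.NOTIFY_HUMAN
        return action

    def resolved(self, component: str, problem: str) -> None:
        # Reset the attempt count once the health check passes again.
        self.attempts.pop((component, problem), None)
```

With `max_automated_attempts=2`, the first two `process_unresponsive` reports for a component return `RESTART_INSTANCE` and the third returns `NOTIFY_HUMAN`. An unmapped problem such as `disk_full` goes straight to a human. Capping automated attempts keeps a remediation loop from silently masking a problem that automation cannot fix.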


Video

Presentation Materials

Speaker's notes to be written

Revision History

Created 2023 04