Baked-in Runtime Failure Identification and Remediation
Web and other API-based applications are composed of services, data stores, compute dependencies, and partner business systems. We need a way to continually monitor these components, both in isolation and in situ, so that automation can repair or heal certain types of problems. Different control planes have different requirements for the components they manage, so the instrumentation each one relies on may be subtly different, shallower, or more detailed, depending on the action that control plane needs to take.
Health checks are one technique for determining the current health of a component. They are in-service test endpoints, or very specific external code, that exercise some capability of a specific instance of a component service.
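As an illustration, here is a minimal sketch of an in-service health check using only the Python standard library. The endpoint paths (/health/shallow, /health/deep), the probe names, and the serve helper are assumptions made for this example and are not prescribed by the pattern. The shallow check only confirms that the instance can answer, while the deep check exercises each dependency.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


# Hypothetical probes: each one exercises a single capability of this instance
# and returns True when that capability works.
def probe_datastore() -> bool:
    return True  # placeholder for something like a trivial read query


def probe_partner_api() -> bool:
    return True  # placeholder for a lightweight call to a partner system


PROBES: Dict[str, Callable[[], bool]] = {
    "datastore": probe_datastore,
    "partner_api": probe_partner_api,
}


def run_health_check(probes: Dict[str, Callable[[], bool]], deep: bool) -> Tuple[int, dict]:
    """Return (HTTP status, report). A shallow check skips the probes entirely."""
    results: Dict[str, bool] = {}
    if deep:
        for name, probe in probes.items():
            try:
                results[name] = bool(probe())
            except Exception:
                # A probe that raises is treated as a failed check.
                results[name] = False
    healthy = all(results.values())
    return (200 if healthy else 503), {"healthy": healthy, "checks": results}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/health/shallow":
            status, report = run_health_check(PROBES, deep=False)
        elif self.path == "/health/deep":
            status, report = run_health_check(PROBES, deep=True)
        else:
            self.send_error(404)
            return
        body = json.dumps(report).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the health endpoint; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling serve() starts the endpoint. A control plane that only needs to know whether the instance is reachable could poll the shallow path, while one that decides on deeper remediation could poll the deep path.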
Each control plane or remediation touchpoint needs to be examined to understand its actual area of concern. That concern is then refined into a set of health checks. Finally, the problems those health checks can identify are matched with the available actions, which range from automated remediation to human notification.
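The matching step can be sketched as a small rule table. The control plane names, check names, and actions below are hypothetical and chosen only to show the shape of the mapping; a real system would derive them from the examination described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Action(Enum):
    REMOVE_FROM_ROTATION = "remove_from_rotation"  # automated
    RESTART_INSTANCE = "restart_instance"          # automated
    NOTIFY_HUMAN = "notify_human"                  # escalation to a person


@dataclass(frozen=True)
class Rule:
    control_plane: str  # who acts on the failure (hypothetical names)
    check: str          # which health check this control plane cares about
    action: Action      # what it does when that check fails


# Hypothetical rule table: each control plane watches only its area of concern.
RULES: List[Rule] = [
    Rule("load_balancer", "shallow", Action.REMOVE_FROM_ROTATION),
    Rule("orchestrator", "process_alive", Action.RESTART_INSTANCE),
    Rule("operations", "partner_api", Action.NOTIFY_HUMAN),
]


def plan_actions(results: Dict[str, bool]) -> List[Tuple[str, Action]]:
    """Return (control_plane, action) pairs for every check reported as failed.

    Checks missing from `results` are treated as unknown and trigger nothing.
    """
    return [
        (rule.control_plane, rule.action)
        for rule in RULES
        if results.get(rule.check) is False
    ]
```

For example, plan_actions({"shallow": True, "partner_api": False}) yields a single entry asking the hypothetical operations touchpoint to notify a human, because a failing partner system is not something a restart would fix.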
Video
Presentation Materials
Speaker's notes to be written
Revision History
Created 2023 04