Cloud and Software Architecture, Soft skills, IOT and embedded
Surfacing things we know nothing about - What we know vs What we need to know
Every nontrivial problem space consists of a multitude of issues, constraints, data points, and information-driven decisions. We try to understand or manage these conditions and variables based on our risk assessments, which are biased by how well we understand them. This effort is a continual iteration where we strive to collect and classify issues and variables as we run across them. Some of these problems are very complex or outside our sphere of experience, which forces us to pull in other disciplines or people with other experiences who may be able to identify additional requirements and constraints, unknowns, and uncertainties.
This article is a draft and is subject to revision.
I first heard the terms known knowns, known unknowns, and unknown unknowns when a technocrat gave a speech over a decade ago. I had no idea what they were talking about and later learned that it is a decision-making and political modeling tool: https://en.wikipedia.org/wiki/There_are_unknown_unknowns
Risk Analysis Table

|         | knowns | unknowns |
| ------- | ------ | -------- |
| known   | We know about and understand. | We know about it but don't know the answers. |
| unknown | We don't know that we know something. Our organization knows it but hasn't surfaced it to support the process. | We don't know that a constraint exists, even though it can have a major impact. |
What do we know, and how do we know what we need to know?
A completely new problem space is filled with things that we cannot know because we have no context for that knowledge. Most problems have some relationship or similarity to a previous problem. That gives us one set of constraints that we understand and know how to solve, the known knowns, and another set of problems that we can identify but have no answer for, the known unknowns. In many cases, the lack of analogs may blind us, or previous experience may mislead us, so that we don't know what we are missing. These are the unknown unknowns. There is a fourth quadrant of problems or issues that we know about but choose to ignore or don't recognize as in scope. These are sometimes called the unknown knowns.
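The four quadrants can be read as two yes/no questions about each item. Below is a minimal Python sketch of that reading. The names `Quadrant`, `classify`, `surfaced`, and `answered` are hypothetical labels chosen for illustration, and treating the first word as "has the team surfaced this item?" and the second as "does an answer exist somewhere?" is my interpretation of the table, not a formal definition.

```python
from enum import Enum


class Quadrant(Enum):
    KNOWN_KNOWN = "known known"
    KNOWN_UNKNOWN = "known unknown"
    UNKNOWN_KNOWN = "unknown known"
    UNKNOWN_UNKNOWN = "unknown unknown"


def classify(surfaced: bool, answered: bool) -> Quadrant:
    """Map an item to a quadrant.

    surfaced: the team is aware of the item and tracks it (assumed meaning).
    answered: an answer exists somewhere, even if nobody has exposed it.
    """
    if surfaced:
        return Quadrant.KNOWN_KNOWN if answered else Quadrant.KNOWN_UNKNOWN
    return Quadrant.UNKNOWN_KNOWN if answered else Quadrant.UNKNOWN_UNKNOWN


# An identified question with no answer yet is a known unknown.
print(classify(surfaced=True, answered=False).value)
# Knowledge that exists in the organization but was never surfaced.
print(classify(surfaced=False, answered=True).value)
```

By definition, nobody can actually call `classify` for an item that has not been surfaced. The function is only useful for labeling items after the fact, once a conversation or review has exposed them.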
Most of my problem/program experience is with massive technical systems. One of the reasons large technical solutions can fail is the number of things that must be known to complete the system. You often hear "we don't know what we don't know." The people involved make certain kinds of assumptions without rigor in identifying and retaining issues and constraints as they come up. We can't know everything up front, but we do need some way of moving decisions and problems from the unknown column to the known column.
Known Knowns
Needs, information, or requirements that are required and that we already understand.
It doesn't mean that we understand them completely. It means that we have a list of decisions that we know how to make.
I attempted to show this in the diagram below with the range of colors of the checkboxes. Instead, it means we know the size of the problem space and have a good idea of what is required to remove risk coming from this area.
This is the area that can drive us to complacency or overconfidence when we overestimate the size of this portion of our problem space.
Known Unknowns
Needs, information, or requirements that are required but not well understood fall into the known unknowns box.
They may have analogs in previous problems that we can leverage or they may be items identified as part of ideation. They can be a big to-do list and are thought of as the major scope risk.
There is also a range of how well we understand an unknown. Do we fully understand the problem and have no idea of a solution? Do we have a half-defined problem and thus no answer because the problem isn't well-defined?
This is an area that we can focus on. The goal is to drive items from here into the known knowns. We also need to stay on the lookout for previously unidentified constraints and pull them into here before they can be sent to known knowns.
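One way to make that movement visible is a simple register that records each item when it is identified and marks it as answered later. The sketch below is only an illustration under assumptions: the `Item` and `Register` classes, their method names, and the sample entries are hypothetical, not a tool described in this article.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    title: str
    answer: Optional[str] = None  # None while the item is still a known unknown

    @property
    def quadrant(self) -> str:
        return "known known" if self.answer is not None else "known unknown"


@dataclass
class Register:
    items: dict = field(default_factory=dict)  # title -> Item

    def surface(self, title: str) -> Item:
        """Record a newly identified constraint as a known unknown."""
        return self.items.setdefault(title, Item(title))

    def resolve(self, title: str, answer: str) -> None:
        """Attach an answer, which moves the item to the known knowns."""
        self.items[title].answer = answer

    def summary(self) -> Counter:
        """Count the tracked items in each quadrant."""
        return Counter(item.quadrant for item in self.items.values())


# Hypothetical entries for illustration only.
register = Register()
register.surface("data migration approach")
register.surface("peak request volume")
register.resolve("peak request volume", "measured in a load test")
print(register.summary())
```

The summary gives a rough view of how much of the tracked problem space is still open. It says nothing about the unknown unknowns, which by definition never make it into the register until someone surfaces them.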
Unknown Unknowns
New problem spaces or problem spaces where we have lost expertise can be minefields of things we didn't even know we had to ask about.
This area represents massive unknown risks and expenses. It is hard to know how much effort must be expended identifying unknown unknowns. It is hard to feel comfortable declaring a mission accomplished when you don't actually know all the facets of the mission or the places it can come apart.
Teams can sometimes try to identify unknown unknowns by fanning out and interacting with a broad set of other projects, teams, or experts. These can be directed conversations or just Knowledge Transfer (KT) sessions. KT sessions always seem to generate a new list of questions about things that hadn't been identified as in scope.
Unknown Knowns
This quadrant was the least obvious to me until I started reading more. There seem to be two camps on this quadrant that are subtly different.
We often have a pool of knowledge that we don't know that we know. There can be information or constraints that are known in our organization but have not been exposed. Sometimes we don't know that they fit within this context, and sometimes we don't want to admit the complexity.
There are other times when we have intrinsic knowledge that drives our decisions, but we have not articulated the reason for the action. We don't know why we took the action. Sometimes we have constraints or problems that we choose to de-prioritize, forcing them into lower visibility.
I've seen this with a system migration where we knew that data migration would be hard, but we didn't want to acknowledge it because it would force us to acknowledge that the problem was even harder.
Things You Think You Know that You Do Not Know
Rumsfeld described a whole other class of constraint or piece of information where our knowledge is wrong. We end up making erroneous decisions based on this information. We need to be aware of this and be open to re-planning or re-design when previously understood constraints are re-evaluated with different results.