Showing posts from November, 2019

Have the team tell you who is important - with an incentive

We wanted our 25-person team to tell us who delivered value to them. The one-time exercise was totally open to being gamed and manipulated, as all systems are. We attempted to limit the risk by keeping the stakes low.

The gift card experiment

Proposal: We had a pile of $10 gift cards earmarked for incentives. We gave everyone two cards. They were to keep one as a reward and give the other away as a thank-you to someone else for their help.

Process: Every person was given two gift cards. Each person kept one, so no one walked away empty-handed. Each person had 7 days to give the other one away as a thank-you for that person's help during the year. Supervisors and team leads were excluded. I did an informal survey to find out who people gave their thank-you cards to. No records were kept.

Results

A couple of people kept their giveaway cards. This was disappointing but not a surprise. About 1/3 of the earmarked cards went to one of our team'

Quit worrying and love VMs and Containers

Did you ever wake up, look at your development box and wonder "when did that happen"? I've started using Docker for deployed services like databases, message brokers, etc. At the same time, I've been trying to use Kali Linux for hackathons and general security work. Windows gaming, my pathetic mobile efforts and Windows Docker development are done on Windows 10. Windows must run under a hypervisor or as a dual boot. That is how you end up with three hypervisors, three operating systems and two Docker environments on the same machine. The following diagram shows the underlying complexity of all this.

Hypervisors in action

HyperKit on OS X: Docker Desktop for Mac runs Docker containers inside a HyperKit virtual machine that leverages the macOS Hypervisor.framework. Docker named volumes live inside this virtual machine.

VMware Fusion on OS X: VMware Fusion can host Windows and Linux virtual machines. Fusion supports nested hypervisors whic

Data Lake - getting data into the zone

Data lakes exist to store and expose data in its native format without size or format constraints. Cloud data storage makes it possible to store large amounts of data relatively cheaply and with little risk of data loss. Corporate lakes often store the same data multiple times, in transformed or enriched formats that make it easier to use. My last two employers each had over 20 petabytes of data in their lakes. A well-managed lake organizes data based on usage, data quality, data trust levels, governance policies, data sensitivity and information lifecycles. Lake architects can spread their data across horizontal zones for purpose and/or vertical organization zones. The actual zones for purpose vary by industry or company.

Zone Based Data Organization

This diagram demonstrates a zone structure that might fit a financial services company. It assumes that the company generates its own data and receives data from external organizations. Data exists in unstructured, semi-structu
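One way to picture horizontal purpose zones and vertical organization zones is as a storage-path convention. The sketch below is a minimal illustration under assumptions: the zone names (`raw`, `cleansed`, `curated`, `sensitive`), the `s3://example-lake` root and the `DatasetLocation` class are all hypothetical and are not the zone layout from the diagram; real zones vary by industry and company.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Zone(Enum):
    # Hypothetical horizontal "purpose" zones; real zone names vary by company.
    RAW = "raw"
    CLEANSED = "cleansed"
    CURATED = "curated"
    SENSITIVE = "sensitive"


@dataclass(frozen=True)
class DatasetLocation:
    zone: Zone          # horizontal zone: what the data is for / how trusted it is
    org_unit: str       # vertical zone: which part of the organization owns it
    source: str         # e.g. "internal" or an external provider name
    dataset: str
    ingest_date: date

    def path(self, root: str = "s3://example-lake") -> str:
        """Build a prefix: <root>/<zone>/<org_unit>/<source>/<dataset>/yyyy/mm/dd/"""
        d = self.ingest_date
        return (
            f"{root}/{self.zone.value}/{self.org_unit}/{self.source}/{self.dataset}/"
            f"{d.year:04d}/{d.month:02d}/{d.day:02d}/"
        )
```

For example, `DatasetLocation(Zone.RAW, "retail", "external-vendor", "trades", date(2019, 11, 5)).path()` returns `s3://example-lake/raw/retail/external-vendor/trades/2019/11/05/`. Putting the purpose zone first in the path means access and lifecycle rules can be scoped to a whole zone with a single prefix.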

Data Lakes are not just for squares

Columnar-only lakes are just another warehouse

Data lakes are intended to be a source of truth for their varied data. Some organizations restrict their lake to columnar data, violating one of the main precepts behind data lakes. They limit the data lake to large data set transformations or automated analytics. This limiting definition leaves those companies without anywhere to store a significant subset of their total data pool.

Data Lakes are not restricted

Data lakes hold data in its original format to retain data fidelity. All data sets retain their original structure, data types and raw data format. Some enterprise data lakes make the data more usable by storing the same data in multiple formats: the original format and a more queryable, accessible format. This approach exactly preserves the original data while making it more accessible. Examples of multiple-copy same-data storage include: CSV and other data that is also stored in directly
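The multiple-copy idea can be sketched for a CSV source: land the file byte-for-byte so the original is preserved exactly, then write a second, more queryable copy next to it. Columnar formats such as Parquet are a common choice for that second copy; this sketch writes JSON Lines instead so it needs only the Python standard library. The `original/` and `queryable/` directory names and the `land_with_queryable_copy` function are hypothetical.

```python
import csv
import json
import shutil
from pathlib import Path
from typing import Tuple


def land_with_queryable_copy(src: Path, lake_root: Path) -> Tuple[Path, Path]:
    """Store a CSV file unchanged, plus a JSON Lines copy of the same rows.

    The directory layout used here is a hypothetical example.
    """
    original_dir = lake_root / "original"
    queryable_dir = lake_root / "queryable"
    original_dir.mkdir(parents=True, exist_ok=True)
    queryable_dir.mkdir(parents=True, exist_ok=True)

    # 1. Preserve the source exactly as received (full data fidelity).
    original = original_dir / src.name
    shutil.copyfile(src, original)

    # 2. Write a more accessible copy: one JSON object per CSV row,
    #    keyed by the CSV header. Values stay strings, as in the source.
    queryable = queryable_dir / (src.stem + ".jsonl")
    with src.open(newline="", encoding="utf-8") as fin, \
            queryable.open("w", encoding="utf-8") as fout:
        for row in csv.DictReader(fin):
            fout.write(json.dumps(row) + "\n")

    return original, queryable
```

Keeping the values as strings in the queryable copy avoids guessing types; a real pipeline would typically apply an explicit schema at this step while the untouched original remains the source of truth.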

Machine Intelligence Feature Flow

What is a Feature?

A feature is data that has been prepared to be used as input to a machine learning model. The feature can be a data set, a scalar value or an aggregation. It is created by transforming, categorizing or aggregating original source data. Features can be created and used in almost any type of application, and can be calculated a priori or as part of model execution.

What is an Enterprise Feature?

ML/AI model usage in regulated industries often includes proof of the lineage of the data used to train the model and to feed the model in production. The models themselves must often be registered as they are trained and retained for audit purposes. The retained features and retained models can be used later for bias or fraud investigations as part of the normal regulated-industry audit process. An enterprise feature is a feature that meets the regulatory, legal and compliance requirements of regulated industries. Data, and transformation registrations and an
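A minimal sketch of an aggregated feature that is registered together with its lineage. The feature itself (average transaction amount per customer), the `LineageRecord` fields, the `"1.0"` version string and the in-memory `registry` list are hypothetical stand-ins; a real enterprise setup would persist these records and register the trained models as well.

```python
import hashlib
import json
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class LineageRecord:
    feature_name: str
    transform: str           # name of the transformation that produced the feature
    transform_version: str   # hypothetical versioning scheme
    source_sha256: str       # fingerprint of the exact input rows
    created_at: str          # UTC timestamp, ISO 8601


def fingerprint(rows: List[Dict]) -> str:
    """Hash a canonical JSON encoding of the input so the same data gives the same hash."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def avg_amount_per_customer(rows: List[Dict]) -> Dict[str, float]:
    """Aggregate raw transactions into one scalar feature value per customer."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for r in rows:
        totals[r["customer"]] += r["amount"]
        counts[r["customer"]] += 1
    return {c: totals[c] / counts[c] for c in totals}


def build_feature(rows: List[Dict], registry: List[LineageRecord]) -> Dict[str, float]:
    """Compute the feature and append a lineage record describing how it was made."""
    values = avg_amount_per_customer(rows)
    registry.append(LineageRecord(
        feature_name="avg_txn_amount",
        transform=avg_amount_per_customer.__name__,
        transform_version="1.0",
        source_sha256=fingerprint(rows),
        created_at=datetime.now(timezone.utc).isoformat(),
    ))
    return values
```

Because the record stores a hash of the exact input and the name and version of the transformation, an auditor can later check whether a given feature value could have come from a given data set.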