Posts

Cloud Strategy: All In or Total Portability

Organizations have two primary strategies when they move to the cloud. The first prioritizes time to market: use the cloud provider's services as they were built, without customization. The second prioritizes portability and capability: avoid vendor lock-in by buying the best cloud-agnostic products or building your own to meet custom needs. Every technical or platform decision needs to include your cloud strategy as one of its primary drivers. Decisions that deviate from the standard should be considered technical debt to be revisited later. The cloud strategy is like any other PDCA cycle: select a strategy, document the drivers for the decision, make the approach clear to the company, and revisit the decision on a regular basis to align with business needs. The benefits of both approaches…

Someone wants your software - Is it platform ready?

You created some piece of software that can be repurposed by you or by others. Step back and think about how the system was built. Do the design and data protection rules mean you have to run multiple single-tenant instances? Is it built in a way that you can securely add tenants into a multi-tenant system? Identity management, data security, load isolation, data isolation, log and metric isolation, reporting controls, data exposure, and APIs are just some of the things you need to review before signing up new consumers. Multi-tenant or single-tenant? The big push is to Software as a Service: you stand up your platform in the cloud for use by other teams or organizations. There are two main models for supporting multiple customers. Multi-tenant: the customers all run within a shared environment. The environment is coded to firewall off the different consumer groups to make it appear as if they are the only ones in the system. Multi-tenancy often must be supported…
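As a minimal sketch of the single-tenant model, assuming one Azure resource group per customer and a hypothetical app-stack.json template (all names are illustrative):

```bash
# Single-tenant: provision one isolated stack per customer.
for tenant in acme globex; do
  az group create --name "app-${tenant}-rg" --location eastus
  az deployment group create \
    --resource-group "app-${tenant}-rg" \
    --template-file app-stack.json \
    --parameters tenantName="${tenant}"
done
```

A multi-tenant system instead shares one deployment and relies on per-tenant identity, data, load, and log isolation inside the application itself.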

Isolating Historical Data and Breaking Changes

Teams often run into situations where they have a data set that broke its compatibility at some point in time. This often happens when you have historical data that came from a previous system. We want the ability to combine that data in a way that consumers have to understand as little of that difference as possible. The differences between historical and active data are essentially a major-version, breaking change to the data. The two major versions of the data can be isolated in their own raw storage areas and then merged in one of our consumer-driven zones. We can continue to support minor-version producer schema changes as they occur in one of the raw streams. Those changes would then be handled in the transformation tier into the conformed zone. We register and link the three data sets in our Data Governance Catalog. This lets us capture the data models while enforcing data change and compatibility rules. Disciplined organizations will also register the two transformations…
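A minimal sketch of the isolation, assuming an Azure Data Lake Storage Gen2 account named lakeacct (all names are illustrative):

```bash
# Each major version gets its own raw storage area; consumers only see
# the merged shape that the transformation tier writes to the conformed zone.
az storage fs directory create --account-name lakeacct \
  --file-system raw --name "orders/v1-historical"
az storage fs directory create --account-name lakeacct \
  --file-system raw --name "orders/v2-active"
az storage fs directory create --account-name lakeacct \
  --file-system conformed --name "orders"
```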

Capturing SDLC Swim Lane Identities and Roles

Identity and permission inventories are the first step toward understanding your identity and permission exposure. We want to create a common understanding of the identities and roles used by our systems. Actors that reach out to other capabilities operate with an identity. Capabilities that are asked to do something on behalf of actors are configured to allow or disallow work requests based on the role that the actor's identity has in the receiving system. Individual components may operate as both actors and capabilities at different parts of their processing. The principle of least privilege says that tasks execute with the minimum permissions needed to do the requested work. The simplest way to do this is to isolate each actor by giving it its own identity. Each system contacted by the actors maintains an identity/role map that describes the identity's permissions in the receiving system. The table at the right shows the identities…
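A minimal sketch of one-identity-per-actor with a least-privilege role grant, assuming Azure AD service principals and Azure RBAC (the name, role, and scope are illustrative):

```bash
# Give the actor its own identity instead of a shared one.
appId=$(az ad sp create-for-rbac --name "order-service" --query appId -o tsv)

# Map that identity to the minimum role it needs in the receiving system;
# here, read-only blob access within a single resource group.
az role assignment create \
  --assignee "$appId" \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/data-rg"
```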

Capturing SDLC Swim Lane Configurations

Software systems have gotten more complicated with the introduction of microservices, SaaS/PaaS services, serverless compute, and other patterns. Those components must be managed, created, and integrated across the multiple environments that make up the Software Development Life Cycle. This talk describes a method for documenting how the software is configured and how it can be reached across the various environments. This is useful for creating a common understanding of the environments and for communication between teams. YouTube Video See how we can create an environments matrix based on the simple web application shown below. The table in the video is flipped on its axis from the table below: it shows the properties across the top and the environments down the page. The tables above and below show the environments across the top and the configuration information down the page. The former works better when you have a lot of environments; the table on this page works better if you have fewer…
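As an illustration with invented values, the orientation described on this page (environments across the top, configuration down the page) might look like:

```
Property          Dev               QA                Prod
Web URL           dev.example.com   qa.example.com    www.example.com
Database host     db-dev.internal   db-qa.internal    db-prod.internal
Run-as identity   dev-web-sp        qa-web-sp         prod-web-sp
```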

Protecting data at rest in SaaS and PaaS: Encryption Basics

PaaS and SaaS persistence services store your data in their systems, often in their accounts or subscriptions. The service provider protects the system and its associated storage. We need to determine our appetite for risk when deciding what additional work must be done to secure the externally hosted data. Risks: data at rest must be protected with a multi-layered approach. Identify the attacks that you wish to prevent in order to determine how much protection you want. The list below is just a sample.
- Disk re-use is hardware or technology-related. It can be mitigated without any application or user experience changes.
- Vendor-related access issues exist because a 3rd party is hosting the data. This includes vendor staff access and the ability to remove your data or render it unusable in their hosting…
- Control plane refers to vendor-provided dashboards or admin screens. Many have preview functions that let you validate the data. The built-in user permissions may not be fine-grained…
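One way to take the vendor-related and control-plane risks off the table is to encrypt on the client before the data ever reaches the hosted service. A minimal sketch using openssl and the Azure CLI (the account, container, and key handling are illustrative; real systems need proper key management):

```bash
# Generate a 256-bit key and encrypt locally; the vendor only ever stores
# ciphertext, so vendor staff and control-plane previews see nothing useful.
openssl rand -out data.key 32
openssl enc -aes-256-cbc -salt -pbkdf2 \
  -in report.csv -out report.csv.enc -pass file:data.key
az storage blob upload --account-name vendoracct \
  --container-name drop --name report.csv.enc --file report.csv.enc
```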

Specifying Azure Resource Manager parameters on the command line instead of in JSON files

Most of the Azure Resource Manager template examples demonstrate using two JSON files. One file is the template definition that accepts a set of named parameters. The other file contains the parameter values. The two files are accepted by the template engine and combined to create the actual definition. The Azure CLI supports providing parameters in JSON files or as name/value pairs on the command line. Parameters are always specified using --parameters. The command-line option supports two different syntaxes and can be invoked multiple times to provide property values from multiple JSON files and as command-line arguments. The middle example represents invoking the CLI with the two filenames. The right-hand example represents invoking the CLI with a template and a list of command-line parameter values. Refer to https://github.com/freemansoft/vnet-p2s-vpn-bastion-azure/blob/main/4-create-storage.sh
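A minimal sketch of the two syntaxes, with illustrative resource group, file, and parameter names:

```bash
# Parameter values supplied from a JSON file:
az deployment group create --resource-group my-rg \
  --template-file azuredeploy.json \
  --parameters @azuredeploy.parameters.json

# Name/value pairs on the command line; --parameters can be repeated,
# so file-based values and inline overrides can be mixed:
az deployment group create --resource-group my-rg \
  --template-file azuredeploy.json \
  --parameters @azuredeploy.parameters.json \
  --parameters storageAccountName=mydemostore sku=Standard_LRS
```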

Create Innovation Zones Outside your Controlled Environments

Organizations implement controls to reduce risk, improve reliability, protect data, and meet compliance objectives. Formalizing processes and implementing controls often reduces the agility of an organization and makes it harder to innovate or experiment. There is a constant tension between protecting the enterprise and innovation. Note that we are talking about innovation and not malice, where people cut corners to make dates or to make their lives easier. We can reduce the level of control or create special places where people can experiment and innovate. We need to do it in a way that work done there doesn't bleed into the controlled systems and data. I worked at a place where we wanted to try a cloud-service-based database. We had no schema and just a proof-of-concept idea of what we wanted. It took 6 weeks of paperwork and several iterations of possible schemas to get onboarded and get access to the database for the PoC. We knew the approved schema was wrong because we intend…

When will AWS, Azure or GCP support hard limit or prepaid accounts?

Cloud platforms like Azure, AWS, and GCP provide new-user free tier accounts that help people get started in the cloud. They have no low-cost or capped offerings after that short period; they all expect you to move to uncapped, fully metered, pay-as-you-go services. This makes it hard for data scientists, developers, and architects to stay current and innovate without running the risk of financial catastrophe. I don't want to go bankrupt while experimenting. We're talking about relatively small non-production use cases; the dollar limits on these could be capped to relatively small amounts. I recently accidentally provisioned a dedicated Azure Event Hubs cluster that burned through my fixed $150 credit in 1 day. The account ran up a $400 bill before Azure caught up and shut down the subscription. The hard cap meant I was dead to Azure for a month, but not dead to my spouse for spending our mortgage payment. After the time-limited free tier…
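There is no true hard cap today; the closest built-in guardrail is a budget alert, which notifies or triggers automation but does not stop spending by itself. A sketch using the Azure CLI consumption commands (the name, amount, and dates are illustrative; verify the flags against your CLI version):

```bash
# A $50/month budget on the current subscription. This raises alerts;
# it does not hard-stop resources, which is exactly the gap described above.
az consumption budget create \
  --budget-name experiments-cap \
  --amount 50 \
  --category cost \
  --time-grain monthly \
  --start-date 2024-01-01 \
  --end-date 2024-12-31
```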

Loading both Lake and Warehouse - Single Transform Path

Data organization, build-vs-buy, transform audit, and technology choices all depend on your organization's policies, business, and compliance requirements. We are going to look at some business requirements that might put us on a different path from the parallel-load, warehouse-first, and lake-first patterns previously discussed. Video Discussion This pattern assumes that all the primary raw and conformed/curated transformations happen in one data repository with one set of tools. The raw and conformed/curated zones are then replicated into the other repository. Your org would choose whether the lake or the warehouse is home for transformations for those zones.
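A minimal sketch of the replication step, assuming the lake is home for the transformations and azcopy moves the finished zones to the warehouse's staging storage (all URLs are placeholders):

```bash
# The lake already holds the transformed zones; replicate them to the
# warehouse staging area instead of re-running the transformations there.
azcopy copy "https://lakeacct.dfs.core.windows.net/raw/*" \
  "https://whstage.blob.core.windows.net/raw-stage" --recursive
azcopy copy "https://lakeacct.dfs.core.windows.net/conformed/*" \
  "https://whstage.blob.core.windows.net/conformed-stage" --recursive
```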