Posts

Showing posts from 2021

Specifying Azure Resource Manager parameters on the command line instead of in JSON files

Most Azure Resource Manager template examples use two JSON files. One file is the template definition, which accepts a set of named parameters. The other file contains the parameter values. The template engine accepts both files and combines them to create the actual definition. The Azure CLI supports providing parameters in JSON files or as name/value pairs on the command line. Parameters are always specified using --parameters. The command-line option supports two different syntaxes and can be specified multiple times to provide parameter values from multiple JSON files and as command-line arguments. The middle example shows invoking the CLI with the two filenames. The right-hand example shows invoking the CLI with a template and a list of command-line parameter values. Refer to https://github.com/freemansoft/vnet-p2s-vpn-bastion-azure/blob/main/4-create-storage.sh
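As a rough sketch of the two syntaxes, the Python snippet below only assembles the argument lists for `az deployment group create`; it does not run them. The resource group, file names, and parameter names are hypothetical placeholders and are not taken from the linked script.

```python
import shlex


def build_deploy_command(resource_group, template_file,
                         parameter_files=(), parameter_values=None):
    """Assemble an `az deployment group create` argument list."""
    command = [
        "az", "deployment", "group", "create",
        "--resource-group", resource_group,
        "--template-file", template_file,
    ]
    # One --parameters option per JSON parameter file, using the @file syntax.
    for path in parameter_files:
        command += ["--parameters", f"@{path}"]
    # A single --parameters option can carry several name=value pairs.
    if parameter_values:
        command.append("--parameters")
        command += [f"{name}={value}" for name, value in parameter_values.items()]
    return command


# Hypothetical names, for illustration only.
two_file_command = build_deploy_command(
    "example-rg", "storage-template.json",
    parameter_files=["storage-parameters.json"],
)
inline_command = build_deploy_command(
    "example-rg", "storage-template.json",
    parameter_values={"storageAccountName": "examplestorage001",
                      "sku": "Standard_LRS"},
)

print(shlex.join(two_file_command))
print(shlex.join(inline_command))
```

Both forms can be mixed in one invocation by passing `parameter_files` and `parameter_values` together, which mirrors repeating --parameters on the command line.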

Create Innovation Zones Outside your Controlled Environments

Organizations implement controls to reduce risk, improve reliability, protect data, and meet compliance objectives. Formalizing processes and implementing controls often reduces the agility of an organization and makes it harder to innovate or experiment. There is a constant tension between protecting the enterprise and innovation. Note that we are talking about innovation and not malice, where people cut corners to make dates or make their lives easier. We can reduce the level of control or create special places where people can experiment and innovate. We need to do it in a way that keeps work done there from bleeding into the controlled systems and data. I worked at a place where we wanted to try a cloud-service-based database. We had no schema and just a Proof of Concept idea of what we wanted. It took 6 weeks of paperwork and several iterations of possible schemas to get onboarded and get access to the database for the PoC. We knew the approved schema was wrong because we intend

When will AWS, Azure or GCP support hard limit or prepaid accounts?

Cloud platforms like Azure, AWS, and GCP provide new-user free tier accounts that help people get started in the cloud. They have no low-cost or capped offerings after that short period. They all expect you to move to uncapped, fully metered pay-as-you-go services after that. This makes it hard for data scientists, developers, and architects to stay current and innovate without running the risk of financial catastrophe. I don't want to go bankrupt while experimenting. We're talking about relatively small non-production use cases. The dollar limits on these could be capped at relatively small amounts. I recently accidentally provisioned a dedicated Azure Event Hubs cluster that burned through my fixed $150 credit in 1 day. The account ran up a $400 bill before Azure caught up and shut down the subscription. The hard cap meant I was dead to Azure for a month but not dead to my spouse for spending our mortgage payment. After the time-limited free tier The time-lim

Loading both Lake and Warehouse - Single Transform Path

Data organization, build-vs-buy, transform audit, and technology choices all depend on your organization's policies, business, and compliance requirements. We are going to look at some business requirements that might put us on a different path from the parallel-load, warehouse-first, and lake-first patterns previously discussed. Video Discussion This pattern assumes that all the primary raw and conformed/curated transformations happen in one data repository with one set of tools. The raw and conformed/curated zones are then replicated into the other repository. Your org would choose whether the lake or the warehouse is home for transformations for those zones.

Why companies build their own cloud control planes

Many companies end up creating their own cloud management control planes on top of their cloud provider's management APIs. These homegrown management systems provide a central location for provisioning cloud services and configuring SaaS offerings. They also interact with corporate compliance and control systems like artifact inventories and data catalogs. What drives companies to this effort and expense? Video Walkthrough Self Created Cloud Control Plane Companies create their own API and Web UI-driven control planes as proxies for their cloud provider, their SaaS provider, and any internal providers that must be communicated with as part of infrastructure provisioning and deployments. The control plane manages the company's cloud resources; categorizes those resources for compliance and budget; coordinates unified deployments and configurations across other control planes

Demonstrating creating EventHubs and Identities using the Azure template engine

Create resources in Azure using the Azure Portal Template UI. We created a resource group with a Namespace and individual EventHubs in another video. Then I exported the resource group contents to a JSON file. Here we load that JSON file into another Resource Group to recreate the EventHubs and their associated identities and security settings. The code for this discussion came out of working on a different blog posting. The exported template is mostly hardcoded. You will want to parameterize any names that might vary by environment or when the template is reused for other purposes.

Managed Identities and Shared Access Tokens for EventHubs in Azure

Azure EventHubs can be secured via IAM Role Permissions and Resource Access Policies. They each have their own advantages and disadvantages, as discussed in a previous blog posting. We can see how the various Authorization techniques come together in the Azure Portal. GitHub Repository The Azure portal images in this blog were generated using the 8/2021 version of this GitHub repository: Azure EventHubs Example Example Security Posture Our sample uses different authorization bindings to suit different client types. It applies those bindings at different places in the resource hierarchy. Individual EventHubs and Namespaces use Identity Access Management with a Managed Identity and Standard Azure Roles for some use cases. They use Shared Access Policies and signed requests for other use cases. Permissions are applied at the Namespace and individual EventHub levels. Namespace

Cloud Native and other Identities in Azure

Moving into the public cloud involves balancing known techniques against cloud-native approaches. Identity management and Authentication/Authorization are areas where you can use your legacy on-prem approach or a more cloud-native approach. Managed Identities are a powerful cloud-native identity available in Azure. They are integrated at the Azure resource level, providing identities without the hassle of secrets management or separate lifecycle processes. Managed Identities can act as a logical firewall replacement in cloud-only resource-to-resource topologies. Identity Types We are going to talk about 3 styles of cloud identity. Person Identity: A user account, traditionally secured with a username and password. User accounts are common in many systems. Applications will often use a fake person or service account as their identity. Service account credentials, such as passwords, must be protected and rotated. This type of identity can be passed from one system to anothe

Avro Field Order matters when evolving a schema

JSON and AVRO are both great serialization models. JSON is all text, human readable, and very verbose. AVRO is an efficient binary format. They can serialize the same data, but they handle schema evolution and field changes differently. JSON supports field order changes because all of its fields come with their own label in every single message. Avro messages do not always handle field order changes. Field Order Avro serializers/deserializers operate on fields in the order they are declared. Producers and Consumers must be on a compatible schema, including the field order. Do not change the order of AVRO fields. All Producers and Consumers must be updated at the same time if you change the field order. The AVRO 1.8 documentation says this about Records: A record is encoded by encoding the values of its fields in the order that they are declared. In other words, a record is encoded as just the concatenation of the encodings of its fields. Field values are encoded per their schema
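The effect of that encoding rule can be sketched by hand. The snippet below is not the Avro library; it is a minimal illustration that implements only the `long` and `string` binary encodings (zigzag variable-length integers and length-prefixed UTF-8) and concatenates fields in declared order. The field names and values are made up. Because the bytes carry no field labels, a reader that declares the same fields in a different order can misread the data without raising any error.

```python
def encode_long(value):
    """Avro long: zigzag the value, then emit 7 bits per byte, low bits first."""
    zigzag = (value << 1) ^ (value >> 63)
    out = bytearray()
    while zigzag & ~0x7F:
        out.append((zigzag & 0x7F) | 0x80)
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def decode_long(data, pos):
    """Return (value, next_position) for a zigzag varint starting at pos."""
    shift = 0
    zigzag = 0
    while True:
        byte = data[pos]
        pos += 1
        zigzag |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (zigzag >> 1) ^ -(zigzag & 1), pos


def encode_string(value):
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    raw = value.encode("utf-8")
    return encode_long(len(raw)) + raw


def decode_string(data, pos):
    length, pos = decode_long(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


ENCODERS = {"long": encode_long, "string": encode_string}
DECODERS = {"long": decode_long, "string": decode_string}


def encode_record(fields, record):
    """A record is just the concatenation of its fields, in declared order."""
    return b"".join(ENCODERS[kind](record[name]) for name, kind in fields)


def decode_record(fields, data):
    """Read the fields back in the order the reader declares them."""
    pos = 0
    result = {}
    for name, kind in fields:
        result[name], pos = DECODERS[kind](data, pos)
    return result


# Hypothetical schemas: the same two fields, declared in different orders.
writer_fields = [("id", "long"), ("name", "string")]
reordered_fields = [("name", "string"), ("id", "long")]

record = {"id": 3, "name": "abc"}
payload = encode_record(writer_fields, record)

same_order = decode_record(writer_fields, payload)
misread = decode_record(reordered_fields, payload)

print("same order:", same_order)
print("reordered: ", misread)
```

With matching field order the record round-trips. With the reordered declaration the decoder still returns a record, but its values are wrong, which is why every producer and consumer has to move together when the order changes.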

Fine tuning Key and Secret access with Managed Identities - Azure

We want to protect cloud assets by giving processes the "least privilege" possible. This can be done through a combination of identities and access policies. Cloud assets are protected by access policies that describe the operations available to roles and identities. The Access Policies bind identities to permissions. Application and system processes present their identity as part of resource requests, and the Access Policies decide if access is granted. Organizations can avoid creating "powerful" identities by creating multiple fine-grained identities, similar to roles. Processes are assigned the minimum combination of identities required to access only the resources they require. The Processes present the right identity when making a resource request. The Access Policies allow access based on the presented identity. The example is implemented in Microsoft Azure. Amazon AWS has similar capabilities. Use Case Virtual machines can need access to overlapping se

LUKS encrypting ephemeral disks - Azure

Cloud providers offer disk-optimized virtual machine types targeted at high-performance distributed document and columnar stores. These VM types are targeted at systems like Cassandra, ElasticSearch, and MongoDB. Databases can consume a disk directly without standard file systems. We need to encrypt these drives to protect data "at rest". Cloud providers leave it up to you to configure and encrypt the drives. Virtual Machines and Local Storage Virtual Machines have several classes of data storage: network-dedicated boot and mounted drives; network-shared storage similar to SMB or NFS; local ephemeral SSDs primarily aimed at temporary directories or swap space; and local high-performance NVMe/SSD provided to the machines as raw devices. It is the latter type of drive that databases and caching servers can self-manage to get high performance. They will typically hold data so they need to be encrypted at res