Wednesday, December 21, 2016

Protecting data in-transit. Encryption Basics

Web traffic is protected in flight when it is transferred over TLS encrypted links using the HTTPS protocol. HTTPS encrypts the payload using algorithms built on asymmetric encryption keys. Asymmetric keys are managed, packaged and distributed via certificates.



Encryption Basics

Asymmetric encryption relies on a key pair where one key can decrypt any data that is encrypted by the other.  Data encrypted with Key-A can be decrypted only with Key-B; Key-A cannot decrypt data that it encrypted itself.  Key-B cannot be derived from knowledge of Key-A.

Internet encryption relies on this asymmetry, plus private key secrecy, to create secure links over a public and untrusted Internet.  A server or party can publish a public key that other parties use to encrypt their data.  The server then decrypts the message using the corresponding private key. Encrypted messages are secure as long as the server keeps the private key secret. Private keys must be protected.

Note:  Digital signing is the inverse.  An entity signs something by encrypting it with its private key. A third party can then decrypt the signed item using the originating entity's public key.  The originator protects the private key, ensuring that no one else can generate a message decryptable with the entity's public key.
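A minimal sketch of both operations using the openssl command line (the key and file names here are just illustrations):

# generate a 2048-bit RSA key pair; private.pem stays secret, public.pem can be published
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out private.pem
openssl rsa -in private.pem -pubout -out public.pem

# encrypt with the public key; only the private key holder can decrypt
openssl pkeyutl -encrypt -pubin -inkey public.pem -in message.txt -out message.enc
openssl pkeyutl -decrypt -inkey private.pem -in message.enc -out message.dec

# signing is the inverse: sign with the private key, verify with the public key
openssl dgst -sha256 -sign private.pem -out message.sig message.txt
openssl dgst -sha256 -verify public.pem -signature message.sig message.txt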

Web Server Encryption Basics

Browsers and web servers communicate securely by using asymmetric public/private key encryption.  Encryption keys are distributed in envelopes called certificates.

  1. A client connects to a server and requests the server's public certificate.
  2. The server then returns the certificate.
  3. The browser verifies the certificate's expiration date and that the name in the certificate matches the server name.
  4. The browser verifies that the certificate is still valid, optionally checking revocation.
  5. The browser encrypts its message payload and a session key/seed using the public key.
  6. The browser sends the encrypted payload, which the server decrypts.
  7. The server replies to the browser using the session key/seed that was contained in the original encrypted request (see the sketch below).
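Most of that exchange can be inspected from the command line with openssl; the host name below is just an example:

# fetch the certificate the way a browser would and check its names, issuer and expiration dates
openssl s_client -connect www.example.com:443 -servername www.example.com < /dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates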

Encryption Relies on Certificates

Encryption keys are bound to a server and vouched for through digitally signed certificates. A Certificate Authority (CA) issues certificates that contain encryption keys and digitally signs each certificate using its own CA keys.  The signed item includes the encryption keys and other information describing the identity of the certificate holder and the issuing CA.

Normal web server certificates are essentially static and bound to a host name. Web server certificates include the host name of the server they should be used on. This means a certificate can only be used by a single host, because the calling party can match the host name embedded in the certificate against the one in the request.

It is possible to add multiple alternate host names in the Subject Alternative Name fields. It is also possible to order a wildcard certificate that specifies the associated hosts via wildcard and domain name combinations.  Many organizations frown on wildcard certificates because they can be used on any matching hosts, possibly even unapproved ones.
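A quick way to see which host names a certificate will actually match, assuming the certificate is saved in cert.pem:

# show the subject and any Subject Alternative Name entries
openssl x509 -in cert.pem -noout -subject
openssl x509 -in cert.pem -noout -text | grep -A1 "Subject Alternative Name"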

When and where

A lot of companies use TLS encrypted links at the edge of their organization when talking to other organizations or the Internet. Edge nodes are relatively rare, making certificate issuance and maintenance relatively straightforward. These companies often assume that the interior of their network is secure and do not encrypt in-network traffic.

Companies often change strategy when moving to the cloud. They want to encrypt all traffic on networks they don't own, and they don't own the underlying cloud provider network, so all traffic must be encrypted.

Cloud Complications

Full network encryption means that load balanced applications require certificates in multiple tiers. This means there can be significantly more hosts and corresponding certificates. 

Cloud environments are very dynamic.  Application deployments often build new machines for each deployment. Cloud infrastructure supports dynamic addition and removal of application hosts based on demand.  Cloud infrastructure also has its own naming convention for new hosts.  All of this combines to create a bit of a certificate management headache.  Teams may have to issue new certificates for every new web endpoint whenever they are initially provisioned or re-provisioned over the top of old instances.

Cloud workarounds

Cloud environments can simplify this problem by making a couple of assumptions. I'm using AWS ELBs as an example here.

They can assume that applications are always accessed through their load balancers. This can be enforced through cloud and network security.  In the end, only the load balancer requires a publicly trusted certificate for its link encryption. TLS connections for the application servers behind the load balancer can use certificates signed by any self-signed or transient CA.  Local certificates can be regenerated every time a new machine is built and destroyed with the machine when it is de-provisioned.
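As a rough sketch, a machine build script could mint a short-lived self-signed certificate when each instance is provisioned; the lifetime and file paths below are illustrative, not a recommendation:

# mint a throwaway key and certificate that live and die with the instance
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -subj "/CN=$(hostname)" \
  -keyout /etc/ssl/private/local.key -out /etc/ssl/certs/local.crt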

Related Topics

  1. Protecting Hybrid Environments: Blog not yet written - Video
  2. Encryption Basics: Blog - Video
  3. Trust Chains: Blog Video

Friday, October 28, 2016

Deploying DotNet Core in Azure with GIT and Kudu

I started this project trying to build and deploy the ASP.NET Core example application, first on my local box, then in Microsoft Azure via Web Deploy, then in Microsoft Azure via local Azure GIT integration, and finally via Visual Studio Team Services (VSTS) SCM integration.

Deployment Types

  1. Local deployment into a local IIS is pretty straightforward. We won't talk about it here. 
  2. Remote web deployments are the legacy way of pushing web applications to the (Azure) cloud and work from an IDE, a CI server or the command line. Compiled and static application artifacts are packaged and sent to the remote application servers via FTP.   The servers unpack the archive and deploy it.
  3. Remote SCM deployments are a relatively new and interesting way to support automated deployments and to support multiple branches with little work. The IDE or build system pushes source code to a monitored repository.  Azure (Kudu) monitors the source code repository, runs a build and deploys the resulting artifacts to the cloud.

There are many variations in deployment models; these are some of the most common. The diagram may be getting a little carried away with the visuals.

Any build server would work in place of Visual Studio Team Services:
Jenkins, Team City, TFS, etc.


Every Azure web site has an associated SCM service, Kudu. The SCM can bind to a variety of repositories. Kudu has a micro build server and source code repository and can build for a variety of platforms. This makes it possible to create a CI/CD pipeline with no external components.



A single application can be deployed multiple ways. A team might support web deployments for individual developers, build server web deploys when running tests, and repository (Kudu) deployments for Continuous Delivery environments.  This is often a philosophical decision: build-once vs. build-every.  Note: Kudu deployments look like "rebuild every" as of 10/2016.

Repository based deployment with Kudu in Azure

Some folks may be surprised to know that Microsoft has added an integrated Source Code Management system and build server with every Azure Web App.   This means you can build and deploy code in Azure from inside Azure without standing up additional services.  

Kudu supports ASP.NET, node.js, python and basic web sites.   It also has Deployment, Web Integration and Post Deployment hooks that let you customize or supplement the Continuous Integration environment.  Deployments are done in a Windows environment.

I tested this with the example ASP.NET core application.
  1. Create an Azure web site.
  2. Create a local GIT repository.
  3. Create an application. 
  4. Commit local changes to local GIT
  5. Push changes to GIT tied to your web site.
  6. Test the web site after Kudu builds and deploys your application. The Git commands for steps 4 and 5 are sketched below.
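Steps 4 and 5 boil down to a handful of Git commands. The remote URL below follows the usual Kudu Git endpoint pattern; treat the deployment user and site name as placeholders for your own values:

git add .
git commit -m "my change"
# the exact Git URL is shown in the Azure portal under Deployment Options
git remote add azure https://<deployment-user>@<your-site>.scm.azurewebsites.net:443/<your-site>.git
git push azure master
# Kudu picks up the push, runs the build and deploys the result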

This next diagram shows a couple of alternative ways of integrating an external SCM with Kudu and the Azure SCM services.  The same model works with any build system or external repository.  I used Visual Studio Team Services; you could use others like Team City or Jenkins. An external repository lets you implement more sophisticated processes based on your existing system.

I tested this with the following flow:
  1. Create an Azure web site.
  2. Create a local GIT repository
  3. Create an application
  4. Commit local changes to GIT
  5. Push changes to the develop branch in VSTS.
  6. Run an ASP.NET Core build in VSTS.
  7. Have the build merge the changes to a GIT branch in VSTS.
  8. Push the merged branch changes to GIT in Azure
  9. Test the web site after Kudu builds and deploys the application.

There are plenty of internet sites that show how to deploy an application in Azure based on this SCM integration.

Integration is configured in the Azure portal under App Deployment --> Deployment Options.  The image on the right shows an application configured to deploy an application that is stored in a GIT repository managed by Visual Studio Team Services (VS Online)

Web Deploy via IDE and build servers

Azure still supports standard Web App (FTP?) deployments from developer workstations and from build servers.  Azure doesn't care where the deployments come from.

Here we have a simple web application that is deployed to a local IIS server and to an Azure web application. The developer/deployer can use Visual Studio's "run application" for IIS and "Publish" to deploy to Azure.

Here we have a simple web application that is deployed to Azure.  The build server, VSTS in this case, runs a deployment phase that pushes the application to Azure. 


References

Created 2016.10.28

Monday, October 17, 2016

Visual Studio Team Services Git your build face on

This page describes the configuration settings required to enable GIT integration when building code in Visual Studio Team Services.  It will show you how to
  • Enable CI builds when a specific GIT repository and branch are updated
  • Provide the CI build with permissions required to make changes to a GIT repository
  • Provide the CI build with credentials required to make changes to a GIT repository
This diagram shows how GIT and Visual Studio Team Services (VSTS) might be used to implement a CI build triggered on check-in that merges code into another branch and deploys it.  The actual deployment commands are out of scope for this document.



The following changes must be made on the Repositories configuration at the Project (Team) level and on the affected individual build definitions. We first show project level configuration and then Build Definition configuration.

Let VSTS Builds Update GIT Repository

Some builds may need to update a GIT repository upon build success.  This could be to merge with another branch, add test results or run some other process. This change occurs at the VS Team Project level.
The build account must have GIT update permissions. This is a repository level control that must be set in the VSTS Control panel.

This can be applied at the repository or branch level.  The picture to the right shows where to click to set this at the repository level.

All branches inherit permissions set at this level




Access control attributes "Branch creation" and "Contribute" have default values of "Not Set".  Change these to "Allow".

Access control attributes "Read" and "Tag Creation" have default values of "Inherited allow".  That level is sufficient for our needs.  Do not change them.




Enabling GIT Build Integration

Builds are bound to code repositories.  Builds must be configured to point to the correct Git repository and branch. This change is made on each build that requires Git access. This must be done for every build that is triggered by check-in changes.

Configure the Repository on the Build -> Definitions tab tied to the build itself.  This is not done on the project Control Panel.

Triggering Builds on Changes

Continuous Integration builds trigger whenever the source code repository is updated.  This must be done for every build that is triggered by check-in changes.

Configure build triggers on the Triggers pane of the Build -> Definitions tab tied to the build itself.  This is not done on the project Control Panel.






Provide Credentials for GIT Updates


Some builds may need to update a GIT repository upon build success.  This could be to merge with another branch, add test results or some other process.

The build account must have GIT update permissions as described above.

The build itself must present the build account's OAuth token to GIT when running any GIT update commands, usually "git push".

Enable OAuth tokens in scripts on the Definitions -> Options pane.
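With that option enabled, the build script can authenticate its Git commands by passing the token in an authorization header. This is a sketch based on the Microsoft documentation referenced below; the branch name is a placeholder:

# SYSTEM_ACCESSTOKEN is only populated when "Allow Scripts to Access OAuth Token" is enabled
git -c http.extraheader="AUTHORIZATION: bearer $SYSTEM_ACCESSTOKEN" push origin HEAD:master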

Troubleshooting

This error means you have not enabled credential propagation.  You must enable the OAuth token.
##[error]bash: /dev/tty: No such device or address

References

The following web pages or blog articles were helpful while learning this information
  • https://www.visualstudio.com/en-us/docs/build/scripts/git-commands#enable
  • http://stackoverflow.com/questions/38670306/executing-git-commands-inside-a-build-job-in-visual-studio-team-services-was-vs
  • https://www.visualstudio.com/en-us/docs/setup-admin/permissions

Created 2016 Oct 16
Updated 2016 Oct 26 Added doc pipeline

Tuesday, October 4, 2016

Classifying your return codes

Document the meaning, ownership and handling behavior of your service return codes.  Do not assume your partner teams and calling systems have any expectations or understanding beyond success and not-success. Ask the other teams you call for their service return code documentation. Force them to document their expectations.

Proposed Return Code Category Types

Create response categories.  Determine the owner and expected behavior for each category for the services you build.  The following is a simple proposed set of categories.


HTTP Code Category | Remediation Owner | Remediation
Success | Everyone | Application or n/a
Business Error | Business | Manual process or Application rule
Technical Error | IT / Technology | Manual
Embedded Redirect | IT / Technology | Application Library
NACK / Retry | IT / Technology | Library and/or delayed retry mechanism

Asynchronous Messaging.

You can create the same types of categories for message driven systems.  They can post return codes to reply/response queues or to event log capture.

Proposed Return Code Categories

Map each response code to one of the categories.  Make your application return the correct code for each service it provides. Implement the correct behavior for each service it invokes. The following is a sample list of proposed HTTP return code categories.

HTTP Code | Return Code Description | Default Contract Behavior
100 | Continue | Technical Error
101 | Switching Protocols | Technical Error
200 | OK | Success
201 | Created | Success
202 | Accepted | Success
203 | Non-Authoritative Information | Success
204 | No Content | Success
205 | Reset Content | Success
206 | Partial Content | Success
207 | Multi-Status (Partial Success) | Business Error
300 | Multiple Choices | Technical Error
301 | Moved Permanently | Embedded Redirect
302 | Found | Embedded Redirect
303 | See Other | Embedded Redirect
304 | Not Modified | Technical Error
305 | Use Proxy | Technical Error
306 | Unused | Technical Error
307 | Temporary Redirect | Embedded Redirect
308 | Permanent Redirect | Embedded Redirect
400 | Bad Request | Business Error
401 | Unauthorized | Technical Error
402 | Payment Required | Technical Error
403 | Forbidden | Technical Error
404 | Not Found | Technical Error
405 | Method Not Allowed | Technical Error
406 | Not Acceptable | Technical Error
407 | Proxy Authentication Required | Technical Error
408 | Request Timeout | Technical Error
409 | Conflict | Technical Error
410 | Gone | Technical Error
411 | Length Required | Technical Error
412 | Precondition Failed | Technical Error
413 | Request Entity Too Large | Technical Error
414 | Request-URI Too Long | Technical Error
415 | Unsupported Media Type | Technical Error
416 | Requested Range Not Satisfiable | Technical Error
417 | Expectation Failed | Technical Error
500 | Internal Server Error | NACK / Retry
501 | Not Implemented | Technical Error
502 | Bad Gateway | Technical Error
503 | Service Unavailable | NACK / Retry
504 | Gateway Timeout | NACK / Retry
505 | HTTP Version Not Supported | Technical Error
N/A | Failed to Route | ?
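As an illustration only, a caller could apply the default contract behavior from the table above with a small shell sketch; the URL and the exact bucketing are placeholders for your own standard:

code=$(curl -s -o /dev/null -w "%{http_code}" https://example.com/api/orders)
case "$code" in
  207)                 echo "Business Error (partial success)" ;;
  2*)                  echo "Success" ;;
  301|302|303|307|308) echo "Embedded Redirect - handled by an application library" ;;
  400)                 echo "Business Error - route to business remediation" ;;
  500|503|504)         echo "NACK / Retry - requeue or retry later" ;;
  *)                   echo "Technical Error - route to the technology team" ;;
esac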

Final

blah blah and blah blah

Created 2016 Oct 04

Monday, October 3, 2016

Success and Failure in the world of Service Calls and Messages


Victory has 100 fathers. No-one wants to recognize failure.    The Italian Job
Developers and designers spend a fair amount of time creating API and Service contracts.  The primary focus tends to be around happy path API invocation with a lot of discussion about the parameters and the data that is returned.  Teams seem to spend little time on the multitude of failure paths often avoiding any type of failure analysis and planning.

At the API level: Some groups standardize on void methods that throw Exceptions to declare errors. The list of possible exceptions and their causes is not documented.  Exceptions contain messages but no structured machine readable data.

At the REST web service level: Some groups standardize on service return values other than 200 for errors. They do not document all of the HTTP return codes they can generate or their meanings.  API consumers are left to their own devices to figure out who owns failures and how they should be handled.

Note:  I get a special kind of heartburn when discussing APIs with folks who have no concept of partial success or success with soft errors.  Yeah, CRUD means you commit or you don't.  More complex applications have more subtle nuances.

Infrastructure Integration

We can do this on a per service basis or try and come up with some type of standard.  This is generally a good idea and makes it possible to plug behavior into infrastructure components without them having to know any details of the invocation or business process.
  • Can you build a system that only looks at the envelope (headers, return codes)?
  • Does it work for impartial systems like service buses or API routers?
  • Can we create a standard that lets us get statistics from cloud components like load balancers?

Conceptual Baseline

Let's take a broader view of service invocation and define possible end states for API invocation. APIs can end in success, partial success, fatal failure, recoverable failure and possibly other states. Failure can be due to technical issues, bad code, defects, business rules or broken business processes. Failures and errors must be owned by someone, either from the business or from a technology team.

Let's use the following diagram as a starting point. We divide failure based on the owner of the triage and remediation.


Success

Complete and Partial / Soft Errors
DB inserts should probably return a transaction receipt, the key to the updated data or the URL to retrieve the modified data. Success may return confirmation codes, correlation IDs, operation codes, text messages or message parameters. Some of these are used for audit and some are used to build a better user experience. This one seems to give some folks stomach acid problems.  They view success as something absolute. That is true for DB operations. It may not be true for any call that can have soft errors, where the back-end service can be partially successful or blocked in some way that doesn't immediately impact the caller.  We'll talk about the meaning of "success" later.


Business Failure

Business users own the rules, triage and repair of business failures.  Some business failures are common and may be handled through normal application behavior or some type of rework/manual business process. In other cases they may rely on the technical team to acquire failure details and to apply a fix.  The best case is that business users can fix, restart AND terminate the business processes themselves.

Business rules are a normal part of computer programs.  Business rules may terminate an operational request in response to a business rule failure.  This is not a Business Failure for the purposes of this discussion if it is an expected type of behavior.

Business processes can terminate because of a business rule that may actually pass at some time in the future.  This could be due to an asynchronous workflow, some type of failed request, behavior related to soft edits or something else. Business processes can also terminate because of unexpected data or state.  They may fail deliberately due to policy rules.

Retry Ready and Business Triage
Business failures and technical failures may be indistinguishable from each other. Some business failures can be retried immediately or retried some time later. There is a whole range of reasons a business transaction can fail. Some business processes fail, routing work to application error handlers or some type of manual rework queue. There may or may not be tools to fix the data found in business triage when the business failure was unanticipated.


Technical Failure

IT users own the triage and fixing of technical failures.  Systems and monitoring need to be built assuming there will be failures. These types of failures are often due to design defects, dependency problems, infrastructure issues or version mismatches. IT should detect and automatically fix errors before business users know there is an issue.

Poison data failures may be either Business data or technical in nature.  Some technical triage may be required to determine the owner of service call failures that fail on retry.

Retry Ready and Technical Triage
Processes can fail technically because of network issues, remote system problems, resource constraints, asynchronous timing issues or other reasons.   Systems should build in automated retry that handles the bulk of these situations. Remote system exceptions, network connectivity, unknown hosts, poison messages and failed retries can force triage by the development or technical support teams.  Teams need to build in the logging and message capture needed to determine the causes of technical failures.

Retry and Triage queues

Asynchronous messaging makes a great resource for implementing automated retries and technical and business rework queues.

Two Computers Meet in a Bar

  • How do they know if they are successfully communicating?
  • What tells them if their conversation is succeeding?
  • Can they be partially successful?
  • Who owns failure?
  • Who owns a processing failure?

Are We Successful?

  • When another system receives the message?
  • When another system accepts the message?
  • When another system returns success?
  • When some other part of the processing is delayed?

Did We Fail?

  • When a business rule finds an error?
  • When there are soft errors?
  • When a 3rd party is down and we use default behavior?
  • When the receiver has to retry?

Plan Ahead

Owning failures is painful.  I've worked with teams where no one owned failure. Production issues were discovered by users and handled via email.  Everything was reactive. We just sort of planned as if our system could never have problems.  This was a ridiculous approach.
  • Identify Success Scenarios including partial success
  • Identify Business Failure Scenarios and whether they can be remediated by people or systems
  • Identify Technical Failure Scenarios and whether they can be remediated by people or systems

  • Build monitoring, statistics and remediation into the first release
Created 2017 Oct 3
Last Edited 2017 Oct 4
Added AutoHandle 2017 Oct 5

Wednesday, September 14, 2016

AWS EBS storage balancing size versus throughput

One of my teams is rolling out a new application in AWS on EC2 instances where the operating system runs on an EBS drive. We selected one of the M3 machines that came with SSDs.  The application makes moderate use of disk I/O.  Our benchmarks were pretty disappointing. It turns out we really didn't understand what kind of I/O we had requested and where we had actually put our data.

The Root Drive

The root drive on an EC2 instance can be SSD or magnetic based on the type of machine selected. All additional mounted/persistent disk drives on that machine will probably be of the same type.  The drive may be an SSD, but it is still a network drive.

EBS disk IOPS and MB/s are provisioned exactly as described in the EC2 documentation.  The most common GP2 SSDs have a burst IOPS limit and a sustained IOPS limit. They also have a maximum MB/s transfer rate.  Both the sustained IOPS limit and the maximum transfer rate are affected by the size of the provisioned disk.  Larger disks can sustain higher IOPS and have higher throughput.

We sized our disk at 20GB which gave us low IOPS credit earning rates and lower MB/S transfer rates. That was a mistake. The sweet spot for disk drive performance is around 214 GB.  This is the smallest disk that gives you the highest transfer rate and the highest burst credit acquisition rates.  

Teams should do their own analysis before picking the more expensive EBS volume types.  EBS GP2 burstable SSDs may provide a higher value than fixed provisioned SSDs (io1).

Burst Credits

Burst credits are a way you can store IOPS in a credit bucket so that you can exceed your provisioned sustained IOPS rate.  This lets you reach up to 3000 IOPS (GP2) in short bursts without having to pay for higher performing drives.  New machines are given 30 minutes of burst credits in order to provision the machines and warm up applications at the fastest speed possible.  Burst credits are earned based on the size of the EBS volume and the provisioned sustained IOPS rate.  Larger disks earn IOPS credits faster than smaller ones.  
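The arithmetic behind that 30 minute figure is simple, assuming the GP2 numbers in the EBS documentation (an initial bucket of 5.4 million I/O credits, a 3,000 IOPS burst ceiling and a 3 IOPS per GB earn rate):

# minutes of full-speed burst available from the initial credit bucket
echo $(( 5400000 / 3000 / 60 ))   # 30 minutes at 3000 IOPS
# sustained IOPS (and credit earn rate) for a 20GB volume
echo $(( 20 * 3 ))                # 60 IOPS baseline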

IOPS vs MB/S

<I have no idea what I was planning on putting here>

IOPS, MB/S and Block Sizes

I/O operations per second, I/O bandwidth and the data block size interact with each other to limit total throughput.  Machines that use the AWS default 16KB block size may not use their full I/O bandwidth. Our test results agree with this concept.
  • effective bandwidth = number of IOPS * the block size of each write
Teams may have to do some math to tune their disk drives in I/O bound applications.
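A quick sanity check of the formula with shell arithmetic, using GP2 burst numbers as the example:

# effective bandwidth = IOPS * block size
echo $(( 3000 * 16 / 1024 ))   # 3000 IOPS at 16KB blocks is roughly 46 MB/s
echo $(( 3000 * 64 / 1024 ))   # the same IOPS at 64KB blocks would be roughly 187 MB/s, above the GP2 throughput cap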

Ephemeral local SSD

EC2 machines can make use of SSDs attached to the host machine that the EC2 instance is running on.  These disks provide significantly higher performance that must be balanced against their ephemeral nature. Local SSDs cannot have snapshots and disappear whenever a machine is terminated and restarted.  All ephemeral SSD data must be reconstitutable from other data sources since the VM with its local SSD could disappear at any time.

Benchmarks

The following table describes several benchmark tests against various drive configurations. Regular GP2 SSDs provide exactly the specified speed and throughput, both with burst credits available and with no burst credits.  The main areas of interest are latency, the relative performance of EBS vs. ephemeral drives, and the impact of disk encryption. I don't understand why we have outliers in the data.  Sometimes different machines gave different ephemeral performance.  Note that Amazon does not seem to specify performance data for local SSDs.

Process Reference pages

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/benchmark_procedures.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

Results

This chart show the transfer rate of various disk drives.  It makes clear how much of a performance improvement can be obtained using Local SSD (ephemeral) drives over network attached EBS.  


Disk Size | Disk Type | MB/s rand write 512m bs=16k | iops rand write 512m bs=16k | clat latency 95% usec | MB/s rand read 512m bs=16k | iops rand read 512m bs=16k | clat latency 95% usec | MB/s AWS max | MB/s 16KB max | IOPS Spec | Test Device
20GBSSD (GP2)1601602.560.9660/3000No Burst
80GBSSD (GP2)47299183844930701235212848240/3000Burst
80GBSSD (GP2)3.82403.824010.243.84240/3000No Burst
240GBSSD (GP2)47.82990151684829991568016048720/3000Burst
240GBSSD (GP2)42.426521676857.335851568016048720/3000Burst
32GBEphemeral3161979236022527n/an/an/an/a
32GBEphemeral1288059432040425272916n/an/an/an/a
20GBSSD (io1)8503138741288500Fixed

*Measured and calculated using Burst credits
#Measured and calculated with no available Burst credits

Disk encryption does not affect disk throughput or IOPS.  It does increase disk latency. Local SSD performance can be affected by other VMs on the same hardware device sharing the same drives.

System configuration and Test Commands

ssh -i speedtest.pem ec2-user@ec2-54-196-6-33.compute-1.amazonaws.com
sudo yum update
# install the fio benchmark tool and format the ephemeral volume
sudo yum install fio
sudo mkfs -t ext4 /dev/xvdb
# 16KB random write benchmark, 16 jobs for 180 seconds
sudo fio --directory=/media/ephemeral0 --name=randwrite --direct=1 --rw=randwrite --bs=16k --size=1G --numjobs=16 --time_based --runtime=180 --group_reporting --norandommap
# 16KB random read benchmark, 16 jobs for 180 seconds
sudo fio --directory=/media/ephemeral0 --name=randread --direct=1 --rw=randread --bs=16k --size=1G --numjobs=16 --time_based --runtime=180 --group_reporting --norandommap

Other AWS related references



This blog not yet finished

Wednesday, September 7, 2016

An Environment for Every Season

Video Presentation

This is really talking about different types of software environments and how they are used during the various phases of software development, integration and support.  

Environments are the touch points between your system and other teams operating on their own cycles and timelines. They are also control points where control is incrementally applied or where access is restricted based on the sensitivity of the data in that environment and its distance from production or development.

System Complexity

The number of system environments (sandboxes) required depends on your system complexity, your tolerance for defects, your level of Ops and DevOps automation and the needs of other systems that yours integrates with. This diagram shows a couple application components that integrate with 4 different partner teams and 3 data stores.  Each partner team has its own release cycle and testing needs.  We aim for loose testing coupling but that is often impossible either because of the heavy investment in mock services or because of the complexity of the interactions or the data.



Future Code Pipeline

Different companies have different names for their various stages in their pipeline.  Partner teams testing against a future release will integrate with environments on this pipeline.

The environments in white represent highly accessible  environments with low stability and low risk.  The yellow environments represent more stable environments with significantly less manual access than the earlier environments.  The red environments represent highly controlled "production" or "production like" environments that are treated as if they contain sensitive information.

CI represents the build and unit test environment.  It really isn't a true environment at all and is solely used for code base health and white box testing.  

Some companies have a Mock Integration environment that can be used to run basic automation, integration or smoke tests.  Partner systems are mocked so that this environment is not at the mercy of partner deployments and stability.  Mock Integration environments tend to get developed after a team has had trouble running the nightly automation tied to some other team's environment stability issues.  Dev Integration represents the first environment where your system integrates with partner systems.  Tests here should run nightly or several times a week. Basic automation, functional or smoke tests run in this environment.  Stability depends on the integration points.  Some companies have a partner integration environment that represents the next release but is more stable than Dev Integration.

QA (yours) is the environment used by true testers doing either manual tests or using various types of automation. This environment is owned by "your" QA team. It integrates with QA environments provided by other teams. Builds here usually happen several times a week.

User Acceptance Test environments are the place end users, product owners and other interested parties run their tests. Deployments are "on demand" of the users in this environment.  Agile doesn't really talk about or support this within the sprint concept.  This is normally the environment used to get "external sign-off".

Prod is production.  You knew that.  Prod Mirror usually mirrors the current code in production.  It is used for production triage where testers and developers are, rightfully, not allowed access to the production system. This is sometimes called something like Production Triage.


Partner on Production Path. 

Partner teams need a place to test their future code against your current release.  They use this so that they can release without being locked into your  release cycle.  These really depend on integration complexity and how different the release cycles are.  Note that these environments use the "current" data model and schema while the new development environments use a "future" data model and schema.



Partner Integration is your integration environment that is consumed/used by the partner integration teams. This is where they run functional or integration tests.

Partner QA is the environment foreign QA test teams integrate with.  Their Internal QA environment communicates with your Partner QA environment.  Some teams will forgo this and use the Prod Mirror environment mentioned above.

Tradeoffs

The number of environments can clearly be reduced by software teams that don't think they need it.  It can also be expanded to include branches, security testing or for other needs.  The decision is a tradeoff between operational  complexity and conflicting needs with respect to the data available, environmental SLA and the version of the code to be deployed.

My last three major enterprise customers all had between 5 and 7 environments.  This doubled if the same application was deployed in different data centers like cloud and on-prem.  

Highly automated companies, running in the cloud, have the added option of tearing down environments while they are not being used.   Good examples might be various QA environments or Performance testing environments.

Sunday, September 4, 2016

The Cloud is an Opportunity

"Excellence is a continuous process and not an accident"
A hybrid cloud is just an offsite data center if you migrate your applications and processes as is.

This Topic on Video

Cloud as an Opportunity

Life in the Cloud Should Be Different 

  • Opportunity to bake in policies and practices
  • Full automation is possible and required to feed continuous processes
  • Continual building and destruction of infrastructure is desirable over stale configurations
  • Dynamic and on-demand capacity is available and should be leveraged
  • It is easy to isolate teams, applications and partner firms using built in tools.
  • Resiliency must be part of the design and not an afterthought.
A cloud migration is an opportunity to bake in policies and practices that were impossible in your previous environments.   It is an opportunity to leverage cloud vendor provided security, automation and pre-built services in a way that increases your team's capabilities.  The cloud lets you automate your infrastructure so that you are no longer fearful of making changes or rebuilding networking, servers, load balancers or data stores.  Cloud teams regularly destroy and rebuild infrastructure guaranteeing that you know how it goes together.  Cloud subscriptions/accounts and networks let you manage and isolate applications, teams and third party applications in ways that network alone didn't in the past. Cloud environments move away from fixed deployments with fixed addresses. They move towards dynamic deployments, address mobility and require that applications be more resilient from day one.  This lets them handle the more chaotic nature of Cloud environments and makes them more robust in the face of system or other failures.

Hybrid Cloud is Not an Off Site Data Center

Bake Security into Everything You Do

  • In Transit
  • At Rest
  • Application Authentication
  • Application Authorization
  • Credential Management
  • Operational Roles
In-house data centers are often not very secure.  They tend to move data across the wire in the clear and leave data unencrypted in databases and on disks. Cloud migrations often start with a certain level of paranoia.  Internal services often have weak internal call verification and tend to trust other services in the same containers or networks.

Cloud migrations tend to force a reevaluation based on the notion that data is no longer inside the company.  A cloud migration is the perfect time to secure data in transit and to secure data at rest.  A fair amount of effort is required to understand the at rest requirements. Data Storage products may offer their own encryption for all or portions of data.  Some companies only protect sensitive data.  Others decide that it is simpler to have a single level of security for each device type, RDBMS, NoSQL, File, Blob...

<this portion AND to be updated later>

Automate Everything

  • Build and Test
  • Infrastructure and Network
  • Monitoring
  • Recovery
  • Data Handling
  • Price and performance selection

Services Catalog

Data Services Catalog

  • SQL Data
  • NoSQL Data
  • Large Data Storage
  • Search
  • Messaging

Non-Data Services Catalog

  • Provisioning
  • Monitoring
  • Computation
  • Scaling
  • Network, Firewalls, Routers
  • Zero Provisioning Application Platforms

Cloud Accountability

  • Costs are Explicit
  • Resource Consumers are Exposed
  • Teams and projects pick their own cost models.
  • Data is visible in common consoles

Cloud Risks

  • It feels like a shiny object
  • Staff must be multi-functional
  • Encryption keys and certificate management are critical
  • Network edges must be protected
  • Public Services Security Risks must be understood
  • Some will not believe automation and change is possible or desirable



Thursday, July 7, 2016

Thinking Putty in the Cube Farm: free range or caged

Various companies like Think Geek and Crazy Aaron's sell palm sized blobs of  "Thinking Putty" as a thinking aid, as a form of stress relief and as nervous energy sink.  Once released into an office, you will find people subconsciously picking up their putty to pull, knead, squeeze and fold it while working on projects or while in discussion.

Putty has some odd properties.  It can be squeezed and shaped. It can also bounce like a rubber ball. It is solid but acts sort of like a very thick liquid.  Sculptures made out of the putty start slumping immediately and end up as a pool of putty within hours.  Putty can be pressed and worked but can also be torn as if it is a solid.

Putty comes in fun colors and effects.  It can be plain, UV sensitive, glow in the dark, magnetic or heat sensitive.  The can at the right is heat sensitive, changing color from orange to yellow when warmed by friction, body heat or some other heat source.

Caged or Free Range

I personally recommend penning your putty at night.  It's just going to damage itself or something if it is left to its own devices at night. You don't want your putty working into every crevice after leaving it on your keyboard overnight.

Putty comes in a can that keeps it safe and clean.  It is usually best to leave putty kenneled in its shipping container at night.  The putty in the picture at the right was just dropped into its pen.  It will spread out, covering the whole bottom of the container, by morning.

Some folks insist on leaving their putty on the desktop or some other place.  This invariably leads to trouble with the putty running into other items in the area.  




The next two pictures show some putty that has run into a pack of cube clips and post it notes.  You can see that the putty has filled all available crevices. We were able to fix this situation by hanging the mess above a desk. The putty eventually all dripped/ran to the table top dropping its connection with the other objects. This will not work for some surfaces like hair and clothing.




Young versus Old Putty

Thinking putty never really wears out.  I've had a couple cans for over 10 years.  It does start to look and feel old as it picks up contaminants like dust, hair and dirt.  This is especially true if you bounce it off the floor , drop it a lot or use it with dirty hands.

The two cans to the right are the same product with 10 years difference in age and usage. The one on the left still changes color with heat. It just isn't as noticeable.  The new one on the right is significantly brighter and more obviously changes color from orange to red


Conclusion

Thinking putty is an awesome work place item.  Keep it penned at night. Don't let it free range across your desk and gear.  Keep it out of stuff you will have a hard time separating from the putty.

Enjoy!

Sunday, February 28, 2016

Received and sent messages in a single mailbox with MS Outlook for OSX

Microsoft Outlook for the Mac and PC behave differently when showing conversations in the Inbox. The PC shows received and sent messages. The Mac shows only the received messages.  There is no default way to show a threaded conversation on Mac Office 2016.

Microsoft Outlook for the Mac is integrated with OS/X spotlight search so that AppleScript and Spotlight can be used to create Outlook Smart Mail folders. Smart Folders are more like views into mailboxes than actual mailboxes. They are virtual folders that are created from the results of a search.  This blog leverages Outlook's raw search capabilities that come from OS/X integration.  You can find out more information about this integration on the Microsoft answers web site. Portions of this blog came from this excellent blog posting.

Identify Mailboxes to be included in Smart Folder

Our conversation SmartFolder is made up of the contents of the Inbox and Sent mailboxes. We first need to identify the Microsoft Outlook folder identifiers for the two mailboxes.

  • Run Outlook
  • Highlight the mailbox we are going to include, Inbox or Sent. We want to get this mailbox's folder id.
  • Start a Mac Spotlight search
  • Enter "Script Editor" and press Enter to bring up the AppleScript editor.
  • Enter the following in the AppleScript editor and run it
on run {}
    tell application "Microsoft Outlook"
        get selected folder
    end tell
end run

  • It should return the results of the execution
    mail folder id 109 of application "Microsoft Outlook"
  • Do the same thing for the other mailbox.  
  • My mailbox numbers were 109 (Inbox) and 112 (Sent)

Create an integrated Threaded Conversation 

Build Smart Folder with a Raw Search

  • Click on the Search field in the upper right hand corner of the Outlook view.
    • This will enable the search tab and ribbon
  • Select the Search tab
  • Select "All Mailboxes" in the Ribbon Bar
  • Press the Advanced search button in the ribbon
  • Select Raw Query from the drop list. 
  • Enter the following query, replacing 109 and 112 with the mailboxes numbers retrieved above.
    com_microsoft_outlook_folderID == 109 || com_microsoft_outlook_folderID == 112
  • Press Enter.  The Smart Folder should populate with the combined content of the two folders. Conversations in this combined Inbox/Sent folder will include both inbound and outbound messages.
  • Press Save Search in the Search ribbon bar and enter the name of your new Smart Folder.
Your search bar should look something like the following


Caveats

Smart folders are query result views and not real folders. You can use the standard Search functionality against a Smart Folder.  The system treats the additional Search terms as part of the Smart Folder's query and will ask you if you wish to change the folder query every time you move Outlook from the Smart Folder to a traditional folder.  You can tell Outlook to "not save" the changes.  Yeah, it is kind of annoying.

Create an Unread Email Smart Folder

You can create a Smart Folder of just unread messages similar to the conversation folder described above.
  • Select the Inbox
  • Select Search
  • Select the Search Tab if it is not showing
  • Select Advanced Query in the Search Ribbon
  • Change the query type to Raw
  • Enter the following into the raw query area:
com_microsoft_outlook_unread != 0
  • Press Save Search and enter the name of the new Smart folder.

Additional Resources

Discussion on the mdls command and Mac / Outlook variables for raw queries can be found in this Apple discussions thread.

Created 2016 Feb 02

Monday, February 15, 2016

Almost PaaS Document Parsing with Tika and AWS Elastic Beanstalk

The Apache Tika project provides a  library capable of parsing and extracting data and meta data from over 1000 file types.  Tika is available as a single jar file that can be included inside applications or as a deployable jar file that runs Tika as a standalone service.

This blog describes deploying the Tika jar as an auto-scale service in Amazon AWS Elastic Beanstalk.  I selected Elastic Beanstalk because it supports jar based deployments without any real infrastructure configuration. Elastic Beanstalk auto-scale should take care of scaling up and down for the number of requests you get.

Tika parses documents and extracts their text completely in memory. Tika was deployed for this blog using EC2 t2.micro instances available in the AWS free tier. t2.micro VMs have 1GB of memory, which restricts the document complexity and size you can handle. You would size your instances appropriately for your largest documents.


Preconditions

  • An AWS account.
  • AWS access id and secret key.  This is most easily created in the AWS web console.  Amazon recommends using IAM credentials. I used the default account credentials since this was done mostly as a PoC. Remember to save any key id and key values that you need. They cannot be recovered once generated.
  • Read the Amazon Elastic Beanstalk command line instructions

Not Addressed

  • Using the Amazon Console to do web based deployments. I decided to do this with command line tools to get a feel for automation possibilities.
  • Limiting access to this service (access controls)
  • IAM credentials.
  • Load testing with something like JMeter

SSH

SSH must be installed and on your command line path. The Elastic Beanstalk command prompt expects ssh-keygen to be on your path. 

I did this work on Microsoft Windows 10 so I needed the Windows tools.  Microsoft is now contributing to this SSH distribution on github. I installed the 64 bit version in c:\Program Files\OpenSSH-Win64. Linux / Mac folks can use their favorite tools. 

Python and Pip

Python must be installed and on your command line path. Install Python and pip per the AWS Elastic Beanstalk CLI instructions. The page describes Windows and Linux installation processes. Make sure to add the Python directories to your environment PATH variable.

Elastic Beanstalk CLI

Install the Elastic Beanstalk CLI after installing Python. You can find it on the same CLI web page.

Creating an EB Environment and Deploying Tika

Terms
  • Working directory: The name of the directory your command prompt is sitting in. This is the directory where .elasticbeanstalk/config.yml is created.
    • I usually prefix mine with my company name to simplify uniqueness constraints later.
  • Application name:  This is the name of your directory by default.  It can be anything.  The application name is the root of the external URL and should be unique. 
    • I usually prefix mine with my company name to simplify uniqueness constraints later.
  • Environment Name: Applications can be deployed into different environments with different properties. 
    • This tends to default to <app_name>_dev for development environments.
Steps
  1. Create a working directory.  This will probably be the same as your app name. 
    1. I named mine fsi-tika-eb for FreemanSoft Tika ElasticBeanstalk. I'd probably pick something more like fsi-eb-tika if I built up a demo environment in the future.
  2. CD into the directory.  The .elasticbeanstalk/config.yml file will end up here
  3. Download the tika-server jar file and put it into this directory. I used version 1.11   
  4. Initialize the eb command environment and answer its questions
    1. Run eb init
    2. Pick data center your company uses or the nearest data center.
    3. Enter the application id and secret.  I used my test account credentials.  You should use your IAM credentials.
    4. Enter your application name. It may default to your directory name.
    5. Select Java as the platform.
    6. Use Java 7 or Java 8
    7. Let it create the necessary SSH credentials.  
      1. This section fails if SSH is not on the path.
      2. Let it create a SSH key set with the default name if this is the first configured shell.  It selected aws-eb for my keyset name
  5. The config.yml file contains the settings selected during eb init.  Normally the eb create command would try to run a command or deploy a directory.  We have a single all-encompassing Tika jar that we downloaded above.  This means we can set the default deployment artifact to the jar file name.
    1. Edit .elasticbeanstalk/config.yml
    2. Add a new section
          deploy:
             artifact: tika-server-1.11.jar
  6. Create a new Elastic Beanstalk Environment and auto deploy the application.   We can choose the default options and reconfigure later or we can try and configure the load balancer port and machine size in a single command.
    1. Option: Single command line
      1. Create and deploy the application
        eb create --instance_type t2.micro --envvars PORT=9998
      2. Accept any defaults offered
    2. Option: Basic commands
      1. Create and deploy the application with eb create
      2. Accept the defaults.
      3. Set the port number.  This causes the application to redeploy:
        eb setenv PORT=9998
You should end up with Tika deployed on a single t2.micro instance with auto-scale enabled up to 4 nodes.

Load Balancer Notes

The AWS load balancer listens on port 80 and assumes that the EB application is running on port 5000.  We either have to change the load balancer back side port or change the port Tika is listening on.  It is easiest to just change the load balancer back side port since that can be done with just the PORT environment variable.

Verification

Basic Server Verification

Open a browser and do a GET request against your application.  The default naming is 
http://<application_name>_<environment>.elasticbeanstalk.com. 
My demo showed up on 
fsi-tika-eb.elasticbeanstalk.com
Executing a GET against the Tika parser
http://<application_name>_<environment>.elasticbeanstalk.com/tika
 My demo showed up on
http://fsi-tika-eb-dev.elasticbeanstalk.com/tika
and resulted in the message
This is Tika Server. Please PUT 

Parsing a Test Document

You can test the Tika server from any HTTP test tool like the Chrome POSTman plugin. The Tika Server API is documented on the Apache Tika Wiki.

Example File Parsing
  • Select PUT as your HTTP method
  • Set the Body type to Binary
  • Select the file that will act as a body.  
  • Select the MIME type you want back.
  • Use the /tika path, EX: http://fsi-tika-eb-dev.elasticbeanstalk.com/tika
  • Select an XLS file
  • Tell Tika you want HTML back with the header Accept: text/html
  • Submit the message. 
  • You should get back an HTML table representation of the XLS file (a curl version of this request is sketched below).
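The same request can be made with curl; the host name matches the demo deployment above and the file name is a placeholder:

curl -X PUT --data-binary @spreadsheet.xls -H "Accept: text/html" http://fsi-tika-eb-dev.elasticbeanstalk.com/tika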

Example Data Type Detection
  • Select PUT as your HTTP method
  • Set the Body type to Binary
  • Select the file that will act as a body.  
  • Select the MIME type you want back.
  • Use the /detector/stream path EX: http://fsi-tika-eb-dev.elasticbeanstalk.com/detect/stream
  • Select a PNG file
  • Submit the message
  • You should get back the mime type of the document you sent.
See the Apache Tika Wiki page for more information.
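The detection example works the same way from curl (again, the file name is a placeholder); the response body is the detected MIME type:

curl -X PUT --data-binary @picture.png http://fsi-tika-eb-dev.elasticbeanstalk.com/detect/stream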

Viewing Logs

You can view any of the captured log files using the eb command prompt. Open up a command prompt in your application's directory.  Enter the following to see logs
eb logs
Deployment, access and application logs will be retrieved and displayed.

Additional Reading

Amazon publishes some free AWS kindle books/booklets including the following
  • http://www.amazon.com/AWS-Elastic-Beanstalk-Developer-Guide-ebook/dp/B007Q4JFE0

Created Feb 15 2016

Thursday, February 11, 2016

Slice Splunk simpler and faster with better metadata

Splunk is a powerful event log indexing and search tool that lets you analyze large amounts of data. Event and log streams can be fed to the Splunk engine where they are scanned and indexed.  Splunk supports full text search plus highly optimized searches against metadata and extracted data fields.  Extracted fields are outside the scope of this missive.

Each log/event record consists of the log/event data itself and information about the log/event known as metadata.  For example, Splunk knows the originating host for each log/event.   Queries can efficiently filter by full or partial host names without having to specifically put the host name in every log message.

Message counts with metadata wildcards

One of the power features of metadata is that Splunk will provide a list of all metadata values and the number of matching messages as part of the result of any query.  A Splunk query returns matching log/event records and the number of records in each bucket, like #records/hostname.  A Splunk query against a wildcarded metadata field like hostname returns the number of records for each hostname matching that pattern.

Some day there will be a screen shot right here.

Standard Metadata

All Splunk entries come with a few basic metadata fields.


index
Purpose: The primary method for partitioning data. Organizations often route data to different indexes based on business unit, production status or sensitivity of the data. This is also the primary attribute for access control.  Users are granted access to individual indexes.
Sample values: index=mydivision_prod

source
Purpose: The originating source of a file.  This is usually the file name in the case of file logs. Queries can pattern match against file paths or full names to narrow down other search criteria. This is useful when looking for a particular type of problem on one or more nodes. The source and sourcetype may be the same in the case of non-file resources.
Sample values: /var/log/myapp/*.log, /var/log/myapp/out.log, WinEventLog:Application

sourcetype
Purpose: The type of log, used to drive parsing templates.  This can be used as a broader filter to look at all or some subset of log files while filtering out system events. The source and sourcetype may be the same in the case of non-file resources.
Sample values: sourcetype=log4j, sourcetype=log4net, sourcetype=WinEventLog:Application, sourcetype=WinEventLog*, sourcetype=syslog, sourcetype=tomcat

host
Purpose: This is the hostname the message came from. Hostnames can be explicitly provided or provided as part of a pattern.  This is useful when cluster nodes share similar hostnames or when looking at problems on a specific host.
Sample values: hostname=RD000*, hostname=WIN_J8AD78, hostname=Linux*

The default metadata fields make it easy to filter down data without adding explicit values in each individual log/event record.  Using just the standard metadata causes teams to twist the source or sourcetype fields in unnatural ways.






Hacking standard field values

The standard fields do not provide enough axis upon which to partition logs or events.  Organizations often use implicit standards to make it possible to filter out information based on environment or application.

  • Organizations move the location of log files based on the system environment putting production logs in /var/log/prod/myapp/foo.log and QC logs in /var/log/qc/myapp/foo.log.  Then they query by environment by pattern matching the file names.  This only works for log files and not system events or syslog.  
  • Organizations filter for applications or environments by host names counting on standard host naming conventions.  This can be cumbersome and may not work at all for PaaS style hosts created in cloud environments.

Both of these are hacks that can be avoided with the use of additional metadata via the _meta tag.

Recommended Metadata Additions

Custom metadata can be configured through the _meta tag in the Splunk Forwarder inputs.conf files. They can be added in global or application configuration files.  Custom values can be added at the top of inputs.conf to apply to every source or on each individual source in the inputs.conf file.
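A hedged example of what that looks like in a forwarder inputs.conf; the monitor path and the values are illustrative:

[monitor:///var/log/myapp]
sourcetype = log4j
index = mydivision_prod
_meta = env::QC application::sales role::ui instance::1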



environment
Purpose: Software moves through different environments on its way to production.  Log analysis for troubleshooting or statistics tends to occur at the environment level.  This can be greatly simplified by binding logs to "well known" environment names. It is sometimes possible to filter queries against environments based on host names.  This has a sort of "magic spell" feel where everyone has to know the magic host naming conventions.  It becomes complicated when there are multiple environments of the same type.  An organization may have multiple QA/QC environments with similarly named hostnames.
Sample values: env::INT1, env::QC, env::QC2, env::PROD, env::PRODHA

application
Purpose: This is the overarching name for an application that may include multiple deployed components or tiers.  All components, web, workflow, integration share the same application value. This may be the official application name or abbreviation in many large companies.
Sample values: application::service, application::sales, application::intake

role
Purpose: Each application component plays a part or has a role in the overall application.  This can be a tier name or a specific function name. There is a 1->N relationship between application and role.
Sample values: role::ui, role::lookup, role::workflow

instance
Purpose: This value specifies the individual instance of a multi-host/multi-deployment component.  Instance names may be redundant with hostnames in some situations. There is a 1->M relationship between role and instance. The instance value may be the host name, an auto-generated instance id (for PaaS style) or a software partition name in the case of a multi-node component.  Note that this can be especially useful in autoscale environments where hostnames may be shared.
Sample values: instance::1, instance::P8, instance::mylinuxname

runlevel
Purpose: You may wish to create some grouping bucket one level up from environment.  This could be something that groups all of a certain environment type, like a QC level that contains environments QC1 and QC2.  Or it could be a prod/non-prod discriminator so that production logs can be easily isolated. This can be useful in the unfortunate situation where production and non-production logs share the same index.
Sample values: runlevel::prod, runlevel::nonprod

Cloud Metadata

Cloud vendors often have additional metadata about their deployments or environment that can be extracted and configured into Splunk inputs.conf files.  Teams should consider modifying Splunk deployment automation scripts to pick up the values.  Examples include but are in no way limited to the following:

Microsoft Azure: Cloud Service
This represents the load balancer or application pool name.  It can be very useful when troubleshooting or creating performance graphs. Multiple application components can operate within a cloud service.  This may align with application or may be a level in between application and component.

Amazon Web Services: AMI version
This is essentially the virtual machine template version. It can be useful when creating an AMI inventory.



Created 11/Feb/2016