Schema on Write - Consumer Driven Schemas

May 25, 2020

What does it mean to move from a Relational Database style Schema on Read to Schema on Write?

Schema on Write stages data in a consumer-friendly form. It can also be used in environments with poor join performance to restructure data ahead of time into the format consumers read. It is practically mandatory for Document Databases, where joins are limited or expensive.

Ingestion stores data in its original format for compliance, audit, or other purposes. This copy may be called the True Source.
Format Standardization converts the raw information into an agreed-on standard format, for example data tables in a lake or documents in a document store. This is a purely mechanical conversion.
Consumption Models are built from raw data and reference data by applying view and business rules, creating a consumer-ready dataset.
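The three stages above can be sketched as a small pipeline. This is a minimal illustration, not the author's implementation: the `Trade` record, the JSON payload fields (`id`, `acct`, `amt`), the account reference data, and the "large trade" rule are all hypothetical. Ingestion keeps payloads verbatim, standardization only parses and types fields, and the consumption model is where reference data and business rules come in.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    # Hypothetical standardized record type produced by Format Standardization.
    trade_id: str
    account_id: str
    amount_cents: int


def ingest(raw_payloads: list[str]) -> list[str]:
    # Ingestion: keep every payload exactly as received (the True Source copy).
    return list(raw_payloads)


def standardize(true_source: list[str]) -> list[Trade]:
    # Format Standardization: a mechanical parse into typed fields, no business rules.
    trades = []
    for payload in true_source:
        record = json.loads(payload)
        trades.append(
            Trade(
                trade_id=str(record["id"]),
                account_id=str(record["acct"]),
                amount_cents=round(float(record["amt"]) * 100),
            )
        )
    return trades


def build_consumption_model(trades: list[Trade], accounts: dict) -> list[dict]:
    # Consumption Model: join reference data and apply business rules.
    rows = []
    for t in trades:
        account = accounts[t.account_id]  # reference data lookup
        rows.append(
            {
                "trade_id": t.trade_id,
                "account_name": account["name"],
                "region": account["region"],
                "amount_cents": t.amount_cents,
                # Hypothetical business rule: flag trades of 10,000.00 or more.
                "is_large": t.amount_cents >= 1_000_000,
            }
        )
    return rows


raw = [
    '{"id": 1, "acct": "A1", "amt": "12500.00"}',
    '{"id": 2, "acct": "B7", "amt": "42.10"}',
]
accounts = {"A1": {"name": "Acme", "region": "EU"}, "B7": {"name": "Bolt", "region": "US"}}

true_source = ingest(raw)
consumption_model = build_consumption_model(standardize(true_source), accounts)
```

Keeping `true_source` untouched means any later stage can be rebuilt from it if a business rule changes.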

The datasets may have to be filtered at read time (schema on read) based on user permissions and authorizations.

Consumer View models are built for specific consumers or consumer groups. They are often based on the different data visibility each consumer group has. Consumer View models are often built to simplify usage of, and alignment with, vendor IAM tools and roles. This can remove the need for proxy tiers, view layers, or other Schema on Read mechanisms.
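One way to picture Consumer View models is to apply each group's row and column visibility once, at write time, and store one dataset per group. The sketch below assumes hypothetical groups (`eu_ops`, `finance`) and hypothetical visibility rules; in practice each resulting dataset would land in its own table or collection so that a vendor IAM role can be granted to it directly.

```python
# Hypothetical consumption-model rows (the consumer-ready dataset from the previous stage).
consumption_model = [
    {"trade_id": "1", "account_name": "Acme", "region": "EU", "amount_cents": 1250000, "is_large": True},
    {"trade_id": "2", "account_name": "Bolt", "region": "US", "amount_cents": 4210, "is_large": False},
]

# Hypothetical consumer groups: which rows and columns each group may see.
VIEW_DEFINITIONS = {
    "eu_ops": {
        "columns": ["trade_id", "account_name", "amount_cents"],
        "row_filter": lambda row: row["region"] == "EU",
    },
    "finance": {
        "columns": ["trade_id", "region", "amount_cents", "is_large"],
        "row_filter": lambda row: True,
    },
}


def build_consumer_views(rows, view_definitions):
    # Filtering is applied once, at write time, instead of on every read.
    views = {}
    for group, spec in view_definitions.items():
        views[group] = [
            {column: row[column] for column in spec["columns"]}
            for row in rows
            if spec["row_filter"](row)
        ]
    return views


consumer_views = build_consumer_views(consumption_model, VIEW_DEFINITIONS)
```

Because `eu_ops` only ever receives EU rows without the `region` column, no proxy or view layer has to enforce that rule when the group queries its data.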


Organize for Ease of Use

Tables or data organized as Schema on Write are similar to RDBMS materialized views. RDBMS materialized views shape the data ahead of time while providing high read performance.
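The materialized-view comparison can be shown with Python's built-in `sqlite3` module. The tables and data are hypothetical. SQLite has no native materialized views, so a plain `CREATE TABLE ... AS SELECT` stands in for one: the view re-runs its join on every read (Schema on Read), while the table holds a precomputed shape that the writer must refresh (Schema on Write).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (account_id TEXT PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE trades (trade_id INTEGER PRIMARY KEY, account_id TEXT, amount_cents INTEGER);
    INSERT INTO accounts VALUES ('A1', 'Acme', 'EU'), ('B7', 'Bolt', 'US');
    INSERT INTO trades VALUES (1, 'A1', 1250000), (2, 'B7', 4210), (3, 'A1', 990);

    -- Schema on Read: the join and aggregation run every time the view is queried.
    CREATE VIEW account_summary_view AS
    SELECT a.name AS account_name, a.region AS region, SUM(t.amount_cents) AS total_cents
    FROM trades t JOIN accounts a ON a.account_id = t.account_id
    GROUP BY a.name, a.region;

    -- Schema on Write: the same shape computed once and stored as a plain table.
    CREATE TABLE account_summary AS SELECT * FROM account_summary_view;
    """
)


def refresh_account_summary(connection):
    # SQLite has no materialized views, so the writer rebuilds the table itself.
    with connection:
        connection.execute("DELETE FROM account_summary")
        connection.execute("INSERT INTO account_summary SELECT * FROM account_summary_view")


def read(connection, source):
    # `source` is a trusted internal name here; only the view re-runs the join.
    return connection.execute(f"SELECT * FROM {source} ORDER BY account_name").fetchall()


# A new write shows up in the view immediately, but in the table only after a refresh.
conn.execute("INSERT INTO trades VALUES (4, 'B7', 100)")
before_refresh = read(conn, "account_summary")
refresh_account_summary(conn)
after_refresh = read(conn, "account_summary")
```

The trade-off is visible in `before_refresh`: the staged table is fast to read but only as current as its last refresh.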

Organize Data to Leverage Native Authorization

Either the data organization can be aligned to the platform's access controls and I/O patterns, or custom access controls and I/O abstractions can be bent to match your data organization. Organizing the data for consumers, and aligning it with vendor permissions, can have a significant impact, such as removing the need for custom proxy or view layers.
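As an illustration of aligning layout with native authorization, the sketch below assumes an object store whose permissions are granted per key prefix. The key layout (`consumer=<group>/dataset=<name>/...`), the role names, and the `Grant`/`can_read` helpers are hypothetical stand-ins, not any vendor's API. Placing the consumer group first in the key means one coarse grant covers everything an audience may read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    # Hypothetical grant: a role may read every object key under `prefix`.
    role: str
    prefix: str


def object_key(consumer_group: str, dataset: str, partition: str) -> str:
    # Hypothetical layout: consumer group first, so one prefix maps to one audience.
    return f"consumer={consumer_group}/dataset={dataset}/{partition}.parquet"


def can_read(role: str, key: str, grants: list[Grant]) -> bool:
    # A plain prefix match stands in for the platform's native authorization check.
    return any(g.role == role and key.startswith(g.prefix) for g in grants)


GRANTS = [
    Grant(role="eu-ops-reader", prefix="consumer=eu_ops/"),
    Grant(role="finance-reader", prefix="consumer=finance/"),
]

eu_key = object_key("eu_ops", "trades", "2020-05-25")
```

If the same data were laid out by source system instead, each audience would need finer-grained filtering that the prefix grant cannot express.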

Storage is Cheap

Schema on Write lakes or document databases trade storage space and normal forms for performance. They can do this because we have reached a point where storage is cheaper than the work needed to maintain traditional models.

Instrumentation and Lineage

The diagram above represents data transformation. Data is received, formatted, and enriched. Users can consume the data anywhere along the transformation journey. Regulators and auditors may need to understand how the true source data on the left is transformed before it is given to users and programs.
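One lightweight way to instrument the receive, format, and enrich steps is to record a content hash of each step's input and output. This is a hedged sketch, not a specific lineage product: `LineageLog`, `fingerprint`, and the sample records are hypothetical. Because each step's output hash equals the next step's input hash, an auditor can walk from a consumer-facing dataset back to the true source.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(data) -> str:
    # Canonical JSON so the same content always hashes the same way.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


@dataclass
class LineageLog:
    entries: list = field(default_factory=list)

    def run_step(self, name, func, data):
        # Run one transformation and record what went in and what came out.
        output = func(data)
        self.entries.append(
            {"step": name, "input": fingerprint(data), "output": fingerprint(output)}
        )
        return output


log = LineageLog()
true_source = [{"id": "1", "amt": "12.50"}]

received = log.run_step("receive", lambda rows: [dict(r) for r in rows], true_source)
formatted = log.run_step(
    "format", lambda rows: [{"id": int(r["id"]), "amount": float(r["amt"])} for r in rows], received
)
# Hypothetical enrichment: attach a currency from reference data.
enriched = log.run_step("enrich", lambda rows: [{**r, "currency": "USD"} for r in rows], formatted)
```

Storing `log.entries` next to each published dataset lets consumers who read mid-journey still trace where their data came from.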

Why Document DBs and Cloud Lakes?

At first glance, Document DBs and lake object stores don't seem to have much in common. However, if we ignore how each actually manages data in storage, they share similarities in access control and storage philosophy.

CQRS for Example

The query side of CQRS may also run on consumer-optimized storage, since the query store is used solely by consumers and not by the core application.
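A minimal in-process sketch of this split follows, assuming a hypothetical `TradeBooked` event and an account-summary read model. The command side validates and records events for the core application; the query side is a projection written as events arrive, already shaped the way consumers read it. Real systems typically deliver events through a message bus with eventual consistency; the synchronous subscriber list here is a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeBooked:
    # Hypothetical event emitted by the command side.
    trade_id: int
    account_id: str
    amount_cents: int


class CommandSide:
    """Core application: validates commands and appends events to its own store."""

    def __init__(self):
        self.events = []
        self.subscribers = []

    def book_trade(self, trade_id, account_id, amount_cents):
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        event = TradeBooked(trade_id, account_id, amount_cents)
        self.events.append(event)
        for notify in self.subscribers:
            notify(event)  # synchronous delivery, for illustration only


class AccountSummaryProjection:
    """Query side: a consumer-shaped read model, written (schema on write) per event."""

    def __init__(self):
        self.by_account = defaultdict(lambda: {"trade_count": 0, "total_cents": 0})

    def apply(self, event):
        summary = self.by_account[event.account_id]
        summary["trade_count"] += 1
        summary["total_cents"] += event.amount_cents

    def get(self, account_id):
        # .get() does not trigger the defaultdict factory, so unknown accounts stay absent.
        return dict(self.by_account.get(account_id, {"trade_count": 0, "total_cents": 0}))


commands = CommandSide()
projection = AccountSummaryProjection()
commands.subscribers.append(projection.apply)

commands.book_trade(1, "A1", 1250000)
commands.book_trade(2, "A1", 990)
```

Consumers call `projection.get(...)` and never touch `commands.events`, so the read model's storage can be chosen purely for consumer access patterns.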

Created 5/2020

Blog de Joe Freeman