Schema on Write - Consumer Driven Schemas
What does it mean to move from a Relational Database style
Schema on Read to Schema on Write?
Schema on Write stages data in a consumer-friendly form. It can also be used
in environments with poor join performance to restructure data and stage it in the format consumers read. It is effectively mandatory for document
databases.
- Ingestion stores data in its original format for compliance, audit or other purposes. This copy may be called True Source.
- Format Standardization converts the raw information into an agreed-upon standard format. Examples include data tables in a lake or documents in a document store. This is a purely mechanical conversion.
- Consumption Models are built from raw data and reference data. View and business rules are applied to create a consumer-ready dataset.
- The data sets may still have to be filtered at read time (schema on read), based on user permissions and authorizations.
- Consumer View models are built for specific consumers or consumer groups. They are often based on the different data visibility each consumer group is allowed. Consumer View models are often built to simplify usage of, and alignment with, vendor IAM tools and roles. This can remove the need for proxy tiers, view layers or other Schema on Read mechanisms.
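The stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pipe-delimited order feed, the `REGION_NAMES` reference data, the "large order" business rule and the choice of region as the consumer grouping are all hypothetical and do not come from any particular platform.

```python
# Hypothetical raw feed, kept untouched as the "True Source" copy.
RAW_ORDERS = [
    "1001|acme|EU|250.00",
    "1002|globex|US|75.50",
]

# Hypothetical reference data used for enrichment.
REGION_NAMES = {"EU": "Europe", "US": "United States"}


def standardize(raw_line):
    """Format Standardization: mechanical conversion only (split and type)."""
    order_id, customer, region, amount = raw_line.split("|")
    return {"order_id": int(order_id), "customer": customer,
            "region": region, "amount": float(amount)}


def build_consumption_model(records):
    """Consumption Model: apply reference data and a business rule at write time."""
    model = []
    for rec in records:
        enriched = dict(rec)
        enriched["region_name"] = REGION_NAMES[rec["region"]]
        enriched["large_order"] = rec["amount"] >= 100.0  # assumed business rule
        model.append(enriched)
    return model


def build_consumer_views(model):
    """Consumer Views: write one pre-filtered dataset per consumer group."""
    views = {}
    for rec in model:
        views.setdefault(rec["region"], []).append(rec)  # group = region (assumed)
    return views


standardized = [standardize(line) for line in RAW_ORDERS]
consumption_model = build_consumption_model(standardized)
consumer_views = build_consumer_views(consumption_model)
```

Each stage writes a new dataset and leaves the previous one untouched, so the raw copy remains available for audit while consumers read only the view built for their group.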
Organize for Ease of Use
Tables and data organized as schema on write are similar to RDBMS materialized views, which shape the data for consumers while providing high
read performance.
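The materialized-view analogy can be illustrated with Python's built-in `sqlite3` module. SQLite has no materialized views, so this sketch uses `CREATE TABLE ... AS SELECT` as a stand-in: the join and aggregate are computed once, when the data is written. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source tables (hypothetical schema).
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 50.0), (12, 2, 75.5);

    -- Stand-in for a materialized view: join and aggregate at write time.
    CREATE TABLE customer_order_totals AS
        SELECT c.name AS customer,
               COUNT(o.id) AS order_count,
               SUM(o.amount) AS total
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        GROUP BY c.name;
""")

# Consumers read the pre-shaped table; no join is needed at query time.
rows = conn.execute(
    "SELECT customer, order_count, total FROM customer_order_totals ORDER BY customer"
).fetchall()
```

The cost is that `customer_order_totals` has to be rebuilt or updated whenever the source tables change, which is the same refresh trade-off a materialized view carries.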
Organize Data to Leverage Native Authorization
Data organization can be aligned to the platform's native access controls and
I/O patterns, or custom access controls and I/O abstractions can be bent
to match your data organization. Organizing the data for consumers
and aligning it with vendor permissions can have a significant
impact.
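One way to picture this alignment is to lay out consumer views under one storage prefix per consumer group, so that a prefix-level grant is all the permission logic required. The sketch below is an assumption-laden illustration: the group names, the `views/<region>/` layout and the `can_read` function are hypothetical, and `can_read` only imitates the kind of prefix check a vendor's IAM service would perform natively.

```python
# Hypothetical mapping of consumer group -> storage prefix it may read.
GROUP_PREFIXES = {
    "eu_analysts": "views/eu/",
    "us_analysts": "views/us/",
}


def object_key(region, dataset, part):
    """Place each consumer view under the prefix its group is granted."""
    return f"views/{region.lower()}/{dataset}/part-{part:04d}.json"


def can_read(group, key):
    """Stand-in for the platform's own prefix-based permission check."""
    prefix = GROUP_PREFIXES.get(group)
    return prefix is not None and key.startswith(prefix)


key = object_key("EU", "orders", 1)
```

Because the data layout already matches the permission boundary, no proxy tier or row-level filter has to decide who sees what at read time.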
Storage is Cheap
Schema on Write lakes and document databases trade storage and
normal forms for performance. They can do this because we
have reached a point where storage is cheaper than the work needed to
maintain traditional models.
Instrumentation and Lineage
The diagram above represents data transformation. Data is received,
formatted and enriched. Users can consume the data anywhere along
the transformation journey. Regulators and auditors may need to
understand how true source data on the left is transformed before it is
given to users and programs.
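A minimal way to instrument such a pipeline is to record, for every transformation step, a fingerprint of its input and its output. The sketch below assumes JSON-serializable data and uses SHA-256 hashes as fingerprints; the `run_step` helper, the step names and the sample record are hypothetical.

```python
import hashlib
import json


def fingerprint(data):
    """Stable SHA-256 hash of a JSON-serializable value."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_step(name, func, data, lineage):
    """Run one transformation and append a lineage entry for it."""
    result = func(data)
    lineage.append({"step": name,
                    "input": fingerprint(data),
                    "output": fingerprint(result)})
    return result


def format_rows(rows):
    # Mechanical conversion: rename the field and parse the amount.
    return [{"id": r["id"], "amount": float(r["amt"])} for r in rows]


def enrich_rows(rows):
    # Assumed business rule: flag orders of 100 or more.
    return [dict(r, large=r["amount"] >= 100) for r in rows]


lineage = []
raw = [{"id": 1, "amt": "10.5"}]
formatted = run_step("format", format_rows, raw, lineage)
enriched = run_step("enrich", enrich_rows, formatted, lineage)
```

The output fingerprint of one step matches the input fingerprint of the next, so an auditor can walk the chain from the true source copy to whatever a consumer was given.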
Why Document DBs and Cloud Lakes?
Document DBs and lake object stores don't seem to have much in
common. However, they share similarities in access controls
and storage philosophy if we ignore how the data is actually managed in
storage.
CQRS for Example
The query side of CQRS may also run on consumer-optimized storage, since
the query store is used solely by consumers and not by the core
application.
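A small in-memory sketch shows the split. The command side keeps its own store for the core application, and every write also updates a query-side view that is already shaped for consumers. The class names, the order fields and the synchronous update are assumptions made for brevity; real CQRS systems often propagate changes asynchronously.

```python
class CustomerTotalsView:
    """Query side: consumer-shaped data, updated at write time."""

    def __init__(self):
        self.totals = {}

    def on_order_placed(self, customer, amount):
        entry = self.totals.setdefault(customer, {"orders": 0, "total": 0.0})
        entry["orders"] += 1
        entry["total"] += amount

    def get(self, customer):
        # Consumers read a ready-made answer; nothing is computed here.
        return self.totals.get(customer)


class OrderCommands:
    """Command side: owned by the core application."""

    def __init__(self, projector):
        self.orders = {}  # write store keyed by order id
        self.projector = projector

    def place_order(self, order_id, customer, amount):
        self.orders[order_id] = {"customer": customer, "amount": amount}
        self.projector.on_order_placed(customer, amount)


view = CustomerTotalsView()
commands = OrderCommands(view)
commands.place_order(1, "acme", 250.0)
commands.place_order(2, "acme", 50.0)
```

Here `view.get("acme")` returns the precomputed order count and total without touching the command-side store.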
Created 5/2020