Isolating Historical Data and Breaking Changes
Teams often have a data set whose compatibility broke at some point in time. This commonly happens when the historical data came from a previous system. We want to combine the old and new data so that consumers need to understand as little of the difference as possible.
The difference between historical and active data is essentially a major-version, breaking change to the data. The two major versions can each be isolated in their own raw storage area and then merged in one of our consumer-driven zones. We can continue to support minor-version producer schema changes as they occur in either raw stream. Those changes are handled in the transformation tier that feeds the conformed zone.
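The merge described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the record shapes, the field names (ORDER_NO, AMT, orderId, amountCents, channel), and the ConformedOrder type are all hypothetical. Each raw source gets its own transformation, and the optional channel field stands in for a minor-version addition to the active stream.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Iterable, List, Mapping, Optional


@dataclass(frozen=True)
class ConformedOrder:
    """Hypothetical conformed-zone record shared by both raw sources."""
    order_id: str
    amount_cents: int
    currency: str
    source: str                    # lineage marker: "historical" or "active"
    channel: Optional[str] = None  # only the active stream may supply this


def from_historical(row: Mapping[str, Any]) -> ConformedOrder:
    # Assumed legacy shape: {"ORDER_NO": 1001, "AMT": "12.50"}, always in USD.
    return ConformedOrder(
        order_id=str(row["ORDER_NO"]),
        amount_cents=int(Decimal(row["AMT"]) * 100),
        currency="USD",
        source="historical",
    )


def from_active(row: Mapping[str, Any]) -> ConformedOrder:
    # Assumed active shape: {"orderId": "A-7", "amountCents": 990, "currency": "EUR"}.
    # "channel" models a minor-version addition, so older records may lack it.
    return ConformedOrder(
        order_id=row["orderId"],
        amount_cents=row["amountCents"],
        currency=row["currency"],
        source="active",
        channel=row.get("channel"),
    )


def conform(
    historical: Iterable[Mapping[str, Any]],
    active: Iterable[Mapping[str, Any]],
) -> List[ConformedOrder]:
    """Merge both major versions into one consumer-facing shape."""
    merged = [from_historical(r) for r in historical]
    merged.extend(from_active(r) for r in active)
    return merged
```

Consumers read only ConformedOrder records. The breaking difference between the two sources stays inside the two transformation functions, and a minor-version change in one raw stream touches only that stream's function.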
We register and link the three data sets (the two raw data sets and the shared, merged data set) in our Data Governance Catalog. This lets us capture the data models while enforcing data change and compatibility rules. Disciplined organizations will also register the two transformations that feed the shared data set.
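As a rough sketch of what registration and rule enforcement could look like, the following Python models a tiny in-memory catalog. It does not represent any particular governance product. The Catalog and Field types, the data set names, and the compatibility rule (a minor change may only add optional fields) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical schema field description."""
    name: str
    type: str
    required: bool = True


def is_minor_change(old: List[Field], new: List[Field]) -> bool:
    """True when `new` keeps every existing field unchanged and only adds optional ones."""
    new_by_name = {f.name: f for f in new}
    if any(new_by_name.get(f.name) != f for f in old):
        return False
    old_names = {f.name for f in old}
    return all(not f.required for f in new if f.name not in old_names)


@dataclass
class Catalog:
    """Illustrative in-memory stand-in for a data governance catalog."""
    schemas: Dict[str, List[Field]] = field(default_factory=dict)
    # Each lineage entry is (transformation name, source data set, target data set).
    lineage: List[Tuple[str, str, str]] = field(default_factory=list)

    def register_dataset(self, name: str, fields: List[Field]) -> None:
        current = self.schemas.get(name)
        if current is not None and not is_minor_change(current, fields):
            # A breaking change belongs in a new, separately registered data set.
            raise ValueError(f"breaking change to {name!r}")
        self.schemas[name] = list(fields)

    def register_transformation(self, name: str, source: str, target: str) -> None:
        if source not in self.schemas or target not in self.schemas:
            raise KeyError("register both data sets before linking them")
        self.lineage.append((name, source, target))


catalog = Catalog()
catalog.register_dataset("raw_historical", [Field("ORDER_NO", "int")])
catalog.register_dataset("raw_active", [Field("orderId", "string")])
catalog.register_dataset("conformed_orders", [Field("order_id", "string")])
catalog.register_transformation("historical_to_conformed", "raw_historical", "conformed_orders")
catalog.register_transformation("active_to_conformed", "raw_active", "conformed_orders")
```

Under this assumed rule, re-registering raw_active with an added optional field succeeds, while renaming or removing a field raises an error. That mirrors the idea that a breaking change is isolated in its own raw data set rather than applied in place.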
Created 10/2021