Showing posts from June, 2020

Lake Mutability in the Age of Privacy and Regulation

Data immutability was one of the core tenets of data lakes when they first became popular. Mutable data went to relational and document databases, while immutable data and documents were stored in the lake. Emerging privacy and data-sharing regulations are adding data retention, data visibility, and data management rules and behaviors that may drive companies to rethink which data should be stored in data lakes and how it should be stored.

Video blog

Phase 1: Data Set Storage Retention

Retention times are set on the file(s) that make up a dataset. Datasets are managed as files. Entire datasets are removed at the end of the retention period.

Phase 2a: Partition Storage Retention

Retention times are stored somewhere and bound to partition keys. Data is organized as tables in a table/partition/file format. Partitions are based on dates
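
The two retention phases above can be illustrated with a small sketch. This is not the post's implementation: the `Dataset` class, the `retention_days` field, and both function names are hypothetical, and it assumes retention is measured in whole days from a creation date (Phase 1) or from a date-valued partition key (Phase 2a).

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Dataset:
    """Phase 1: a dataset is a group of files sharing one retention period."""
    name: str
    created: date
    retention_days: int  # hypothetical field; the post does not say where this lives


def expired_datasets(datasets, today):
    """Return the names of datasets whose retention period has fully elapsed.

    The whole dataset is removed at once; individual records are never touched.
    """
    return [d.name for d in datasets
            if d.created + timedelta(days=d.retention_days) <= today]


def expired_partitions(partition_dates, retention_days, today):
    """Phase 2a: return the date partition keys that are past retention.

    Assumes each partition key is a date and that one retention period
    applies to the whole table.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(p for p in partition_dates if p <= cutoff)


if __name__ == "__main__":
    today = date(2020, 6, 30)
    datasets = [
        Dataset("clickstream", date(2020, 1, 1), 90),
        Dataset("orders", date(2020, 6, 1), 90),
    ]
    print(expired_datasets(datasets, today))

    partitions = [date(2020, 6, 1), date(2020, 6, 25), date(2020, 5, 15)]
    print(expired_partitions(partitions, 30, today))
```

In both cases the unit of deletion is coarse: a whole dataset in Phase 1 and a whole date partition in Phase 2a. The difference is that partition-level retention lets the rest of a table stay in place while its older partitions are dropped.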