Cloud Data Lake vs Warehouse - fit for purpose

Data Lakes and Data Warehouses each have their own strengths and weaknesses. You may need one, the other, or both. Look at your use cases to determine which combination makes sense.

This post may give you a few more things to think about when choosing one over the other.


My general experience has been:
  • Data Lakes tend to be the choice when feeding operational systems and when storing binary data. They are often used for massive data transformations or ML feature creation. Security concerns and partitioning sometimes drive highly sensitive data into protected lakes.
  • Data Warehouses tend to be the choice when humans need big data for reporting, data exploration, and collaborative environments. If a use case puts a warehouse in the middle of a data flow for an operational system, evaluate it for uptime and latency first.
Different companies will prioritize differently. I've seen companies that were lake only, companies that had both, and companies that tried to be warehouse only.
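
As a rough illustration, the rules of thumb above could be written down as a small checklist. This is only a sketch of my heuristics, not a sizing tool: the `UseCase` fields and the `suggest_platform` function are hypothetical names invented for this example, and a real decision would weigh cost, team skills, and existing tooling as well.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    # Hypothetical flags, one per rule of thumb from the list above.
    feeds_operational_system: bool = False
    stores_binary_data: bool = False
    massive_transforms_or_ml_features: bool = False
    highly_sensitive_data: bool = False
    human_reporting_or_exploration: bool = False
    collaborative_environment: bool = False


def suggest_platform(uc: UseCase) -> str:
    """Return 'lake', 'warehouse', 'both', or 'undecided' for a use case."""
    # Signals that tend to point toward a data lake.
    lake = any([
        uc.feeds_operational_system,
        uc.stores_binary_data,
        uc.massive_transforms_or_ml_features,
        uc.highly_sensitive_data,
    ])
    # Signals that tend to point toward a data warehouse.
    warehouse = any([
        uc.human_reporting_or_exploration,
        uc.collaborative_environment,
    ])
    if lake and warehouse:
        return "both"
    if lake:
        return "lake"
    if warehouse:
        return "warehouse"
    return "undecided"


if __name__ == "__main__":
    # An ML feature pipeline whose output analysts also report on.
    print(suggest_platform(UseCase(
        massive_transforms_or_ml_features=True,
        human_reporting_or_exploration=True,
    )))
```

A use case that trips signals on both sides is the usual reason a company ends up running a lake and a warehouse side by side.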
