7th of June, 2023
We can describe a Data Lakehouse as a Data Architecture pattern that combines the data management structure found in traditional Data Warehouses with the scalability of cloud storage on a Data Lake. But it is more than just a Data Architecture pattern.
It is a new technology that takes the agility and scalability of storing data on the data lake and pairs it with the performance and structure imposed by a Data Warehouse, thus overcoming the limitations of a traditional Data Warehouse. It adds features such as ACID transactions, change data capture, auditing of changes, and roll-back. These wouldn't be possible without an open-source storage format built for Apache Spark, called the "Delta" format.
Essentially, the Delta format is an extension built on top of Parquet file storage: it maintains metadata around the Parquet files to capture changes and enable version control, optimisation, and roll-back.
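To make the idea concrete, here is a toy sketch of the transaction-log concept described above: immutable data files plus an ordered log of commits, which is what makes versioning and roll-back possible. This is an illustration only, not the real Delta Lake implementation; the class and method names are invented for the example.

```python
import json

class ToyDeltaTable:
    """Toy model of a Delta-style table: immutable files + a commit log."""

    def __init__(self):
        self.files = {}  # immutable "Parquet" files: name -> list of rows
        self.log = []    # ordered JSON commit entries (the transaction log)

    def commit(self, added_file, rows):
        # Write the data file first, then append one log entry; readers
        # only see files that are referenced by a committed log entry.
        self.files[added_file] = rows
        self.log.append(json.dumps({"version": len(self.log), "add": added_file}))

    def snapshot(self, version=None):
        # Replay the log up to `version` to reconstruct the table state
        # at that point in time ("time travel"); latest is the default.
        upto = len(self.log) if version is None else version + 1
        rows = []
        for entry in self.log[:upto]:
            rows.extend(self.files[json.loads(entry)["add"]])
        return rows

table = ToyDeltaTable()
table.commit("part-0.parquet", [{"id": 1}])
table.commit("part-1.parquet", [{"id": 2}])
print(len(table.snapshot()))           # latest version sees both commits
print(len(table.snapshot(version=0)))  # "time travel" back to version 0
```

The key design point mirrored here is that data files are never modified in place; every change is a new file plus a new log entry, so any earlier version can be reconstructed by replaying fewer entries.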
Furthermore, if we look at the technologies that constitute a Lakehouse, these typically include: cloud object storage acting as the data lake, an open file format such as Parquet, a transactional metadata layer such as Delta, and a distributed compute engine such as Apache Spark.
Lakehouses epitomise the emergence of a modern, unified data architecture, surpassing the limitations of traditional data warehouse and data lake setups. They are the preferred choice for organisations seeking a data architecture that offers flexibility, scalability, and exceptional performance.