This post is about module 11 in the Cloud Skills challenge.
Organizing a Fabric lakehouse using medallion architecture design
This module covers the concepts of medallion architecture design: the Bronze, Silver, and Gold layers of a lakehouse. Medallion architecture is a recommended design pattern for organizing data in a lakehouse, and the focus of the module is understanding how to effectively organize, refine, and curate data as it moves through those layers.
The exercise in this module has the user create a lakehouse and walk through ingesting data into a "bronze" layer, transforming it into a "silver" layer, and then transforming it further into a star schema "gold" layer for reporting. It finishes with creating a semantic model and then creating relationships between the tables in Power BI. The transformations are done in notebooks with PySpark.
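To make the bronze, silver, and gold flow a bit more concrete, here is a minimal sketch of the idea. It is not the code from the exercise: the module uses PySpark DataFrames in a Fabric notebook, whereas this uses plain Python lists and dictionaries so it runs anywhere. The sample rows, the column names, and the to_silver and to_gold helpers are all made up for illustration.

```python
from datetime import date

# Bronze: raw rows kept exactly as ingested (hypothetical sample data).
bronze_orders = [
    {"order_id": "1", "customer": " Alice ", "order_date": "2024-01-05", "amount": "120.50"},
    {"order_id": "2", "customer": "Bob", "order_date": "2024-01-06", "amount": "80.00"},
    {"order_id": "2", "customer": "Bob", "order_date": "2024-01-06", "amount": "80.00"},  # duplicate row
    {"order_id": "3", "customer": "", "order_date": "2024-01-07", "amount": "15.25"},  # missing customer
]


def to_silver(rows):
    """Clean the raw rows: drop duplicate orders, fill blanks, cast types."""
    seen_ids = set()
    silver = []
    for row in rows:
        if row["order_id"] in seen_ids:
            continue  # keep only the first occurrence of each order
        seen_ids.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip() or "Unknown",
            "order_date": date.fromisoformat(row["order_date"]),
            "amount": float(row["amount"]),
        })
    return silver


def to_gold(silver):
    """Split silver rows into a customer dimension and a sales fact table."""
    customer_keys = {}  # customer name -> surrogate key
    fact_sales = []
    for row in silver:
        key = customer_keys.setdefault(row["customer"], len(customer_keys) + 1)
        fact_sales.append({
            "order_id": row["order_id"],
            "customer_key": key,
            "order_date": row["order_date"],
            "amount": row["amount"],
        })
    dim_customer = [
        {"customer_key": key, "customer_name": name}
        for name, key in customer_keys.items()
    ]
    return dim_customer, fact_sales


silver_orders = to_silver(bronze_orders)
dim_customer, fact_sales = to_gold(silver_orders)
```

The bronze rows are never modified; each later layer is derived from the one before it. The fact table points at the dimension through customer_key, which is the kind of relationship that later gets defined between the tables in the semantic model.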
The link below is the session I attended on this topic. It was the 7th session in the live video series.
I admit I hadn't heard the term "medallion architecture" before but the concepts are not unfamiliar to me. The most interesting parts of the discussions were around data security and automating deployment.
From a security standpoint, defining who needs or has access at any given layer is important, both to ensure that only authorized users can interact with sensitive data and to ensure that data governance is maintained. I can envision scenarios where someone has too much access to a bronze or silver layer and inadvertently changes something that ultimately impacts the semantic model further down the chain. The bronze layer would be the most restricted (read-only), whereas the silver layer would need a balance between the flexibility to perform data modelling tasks and the security of the data.
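One way I find it helpful to think about per-layer access is as a simple lookup of who can do what at each layer. The sketch below is purely illustrative: the role names, the LAYER_PERMISSIONS table, and the is_allowed helper are my own assumptions, not Fabric settings or a Fabric API, and the automated ingestion process that writes to bronze is left out.

```python
# Hypothetical permissions per layer and role; not an actual Fabric configuration.
LAYER_PERMISSIONS = {
    # Bronze is the most restricted: read-only, and only for data engineers.
    "bronze": {"data_engineer": {"read"}},
    # Silver allows modelling work by engineers, read-only access for analysts.
    "silver": {"data_engineer": {"read", "write"}, "analyst": {"read"}},
    # Gold is the curated layer that reporting users read from.
    "gold": {
        "data_engineer": {"read", "write"},
        "analyst": {"read"},
        "report_viewer": {"read"},
    },
}


def is_allowed(role, layer, action):
    """Return True only if the role is explicitly granted the action on the layer."""
    return action in LAYER_PERMISSIONS.get(layer, {}).get(role, set())


# An analyst can read silver, but nobody in this table can overwrite bronze.
analyst_reads_silver = is_allowed("analyst", "silver", "read")
engineer_writes_bronze = is_allowed("data_engineer", "bronze", "write")
```

Anything not listed is denied by default, which matches the concern above: an accidental change to bronze or silver should be impossible for most users, not just unlikely.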
Original Post https://jenkuntz.ca/2024/02/ms-learn-fabric-analytics-engineer-challenge-part-8/