Data Platform Journal #10
What is a Data Platform, Kimball Model with dbt, Real World Tips on Running Databricks at Scale and Entity Resolution 101.
New milestone this week: we are at issue 10! Thank you to all who have subscribed, shared or viewed a few articles. Your kind words have made the evenings spent fighting with my poor grammar worthwhile!
This issue is coming out on a Tuesday because yesterday was a bank holiday in the UK. Next Monday is also a UK holiday, so expect next week’s issue to land on a Tuesday too.
This week we have:
What is a Data Platform Anyway?
Building a Kimball Dimensional Model with dbt
Real World Tips on Running Databricks and Delta Lake at Scale
Entity Resolution 101
From Software Observability to Data Observability
What is a Data Platform Anyway?
I figured that after 10 issues of a newsletter about Data Platforms, I should at least explain what one looks like!
This is part two of my “How to Build a Data Platform” guide.
Building a Kimball Dimensional Model with dbt
One week I won’t post a dbt-themed article, but I’m afraid it won’t be this one!
This is a great tutorial by Jonathan Neo, a Data Engineer at Canva, that explains not only what a Kimball dimensional model is, but also how it fits with dbt and what process you should follow to implement one.
If you’re wondering when you should implement Kimball or an alternative, I’m planning to include my thought process as part of my guide. As a general rule, though, I find Kimball models work best for Business Intelligence tools like Power BI or Tableau, while for use cases that query a Data Warehouse or Data Lakehouse directly, One Big Table may work better.
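To make the distinction concrete, here is a minimal, hypothetical sketch in plain Python (no dbt involved, and not taken from Jonathan Neo’s tutorial). A Kimball star schema keeps numeric measures in a fact table and descriptive attributes in dimension tables joined by key, whereas One Big Table pre-joins everything into a single wide table. All table and column names are made up for illustration.

```python
# Hypothetical dimension tables: one row per customer / product, keyed by a surrogate key.
dim_customer = {
    1: {"customer_name": "Alice", "region": "UK"},
    2: {"customer_name": "Bob", "region": "US"},
}
dim_product = {
    10: {"product_name": "Widget", "category": "Hardware"},
    20: {"product_name": "Gizmo", "category": "Software"},
}

# Hypothetical fact table: one row per sale, holding foreign keys plus numeric measures.
fact_sales = [
    {"customer_key": 1, "product_key": 10, "amount": 100.0},
    {"customer_key": 1, "product_key": 20, "amount": 50.0},
    {"customer_key": 2, "product_key": 10, "amount": 75.0},
]


def sales_by_region(facts, customers):
    """Star-schema style query: look up a dimension attribute per fact row, then aggregate."""
    totals = {}
    for row in facts:
        region = customers[row["customer_key"]]["region"]
        totals[region] = totals.get(region, 0.0) + row["amount"]
    return totals


def build_one_big_table(facts, customers, products):
    """One Big Table: denormalise by copying every dimension attribute onto each fact row."""
    return [
        {**row, **customers[row["customer_key"]], **products[row["product_key"]]}
        for row in facts
    ]


print(sales_by_region(fact_sales, dim_customer))
print(build_one_big_table(fact_sales, dim_customer, dim_product)[0])
```

The star schema stores each customer’s attributes once and joins at query time, which suits BI tools that understand facts and dimensions; the wide table repeats those attributes on every row so that direct queries need no joins.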
Real Talk about Running Databricks + Delta Lake at Scale
Daniel Beach has posted a number of great articles about Delta Lake and Databricks in the past, and this one is no exception: a list of tips on how to build a cost-efficient and robust Data Lakehouse.
Entity Resolution 101
Entity Resolution is how Master Data Management (MDM) software matches records that refer to the same real-world entity but have slightly different details (for example, the same person listed with two different job titles). The first article in Karim Amer’s newsletter does an excellent job of summarising how it works.
It is also worth checking out Ergest Xhebati’s article, which talks about why Entity Resolution is important yet hard to do.
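As a rough illustration of the idea (not Karim Amer’s method or any MDM product’s algorithm), here is a minimal Python sketch under stated assumptions: normalise fields, treat an exact match on a strong identifier (email, assumed here) as a match, fall back to fuzzy name similarity using the standard library’s difflib.SequenceMatcher, and cluster matched pairs with union-find. The records, field names and the 0.7 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records: "a" and "b" are the same person with different job titles.
records = [
    {"id": "a", "name": "Jonathan Smith", "email": "j.smith@example.com", "job_title": "Data Engineer"},
    {"id": "b", "name": "Jon Smith", "email": "J.Smith@example.com", "job_title": "Senior Data Engineer"},
    {"id": "c", "name": "Jane Doe", "email": "jane@example.org", "job_title": "Analyst"},
]


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def name_similarity(a, b):
    """Fuzzy similarity in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def is_match(r1, r2, threshold=0.7):
    # Assumed rule: an identical normalised email is treated as a definite match.
    if normalise(r1["email"]) == normalise(r2["email"]):
        return True
    # Otherwise fall back to fuzzy name similarity; job title is ignored because it changes.
    return name_similarity(r1["name"], r2["name"]) >= threshold


def resolve(recs):
    """Compare every pair, then group matched records into clusters with union-find."""
    parent = {r["id"]: r["id"] for r in recs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for r1, r2 in combinations(recs, 2):
        if is_match(r1, r2):
            parent[find(r1["id"])] = find(r2["id"])  # merge the two clusters

    clusters = {}
    for r in recs:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(c) for c in clusters.values())


print(resolve(records))
```

Comparing every pair is quadratic in the number of records, which is part of why Entity Resolution is hard at scale; production systems typically add a "blocking" step so that only records sharing some key (for example, a postcode) are compared.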
From Software Observability to Data Observability
Seckin Dinc, Head of Data and Product at FREENOW, has been posting some great content on Data Quality and Reliability. This article focuses on the importance of monitoring and on which lessons from software observability we can apply to data.
Sponsored by The Oakland Group, a full service data consultancy.
Photo from Unsplash.