Issue #28: Fancy Data Stack and Data Contract Retrospectives
Plus: SRE Insights, Elementary for dbt, Principles of Data layers in Data Platform, and Data - The Land DevOps Forgot
No article from me this week, as I’m super busy, but I've built up a big enough backlog of great videos and articles that I’m keen to share:
The Fancy Data Stack - Batch Version
Data Contracts at GoCardless - 6 Months On
Not My Circus, Not My Monkeys Newsletter
Is Data Mesh Only for Analytical Data?
Are You Using Elementary for DBT?
Principles of Data Layers in Data Platform
Data - The Land DevOps Forgot
I will also point out that my “How to Build a Data Platform” guide is almost done, with only one article left to publish, which should go out next week. So click the link above if you’ve recently subscribed and haven’t already seen the articles.
Also, leave a comment if I’ve missed any major elements of building a Data Platform, as the long-term plan is to go back and update the guide over time.
The Fancy Data Stack - Batch Version
This is a great idea by Christophe Blefari, a Senior Data Engineer, who presents his “ideal” data stack with few constraints. I find his choices and reasoning to be on point, though I will say every organisation is different, so each will have its own “ideal” stack.
Data Contracts at GoCardless - 6 Months On
Andrew Jones, author of a book on Data Contracts, which has many great reviews, has written a retrospective on implementing Data Contracts at GoCardless, an online payment processor with $30 billion in transactions in 2022.
Not My Circus, Not My Monkeys Newsletter
Site Reliability Engineering Coach and ex-colleague of mine, Mark Ellens, has been regularly posting great insights into Monitoring, DevOps and Testing for the last few months, based on his vast experience of building and maintaining modern, large software systems.
It also often crosses over into data (what doesn’t! ;D). Highly recommended!
Is Data Mesh Only for Analytical Data?
Piethein Strengholt, author of Data Management at Scale (see my glowing review here), has another great article.
I’ll digress here by saying the first draft of my article on Data Mesh vs. Fabric and Centralised Architectures did include how a Data Mesh can, in theory, mix both operational and analytical system outputs where it makes sense, but I wimped out and took it out. I wish I’d left it in now!
From my experience, I’d still set up separate deployment areas (Azure Resource Groups) for security and DevOps reasons, but the outputs can be combined if there is a good business requirement for it and, ideally, they have the same data owner(s).
Are You Using Elementary for DBT?
Lead Data Engineer at New Relic, Leo Godin, presents a convincing argument for Elementary if you want more observability, alerting, and testing tools in your dbt transformations.
Principles of Data Layers in Data Platform
Siva Ilango, Principal Data Architect at JMAN Group, has put together a great diagram of Data Modelling layers and what kind of data model best suits each layer.
The article is also a great read, with lots of great Data Architecture advice.
Data - The Land DevOps Forgot
Great video from Michael Nygard, VP of Data Engineering at a large Latin American bank, Nubank, on the extra issues that data brings when integrating with DevOps and how Data Mesh can help.
It also covers a lot of real-world pain points with Data Meshes, so it is far from being another Data Mesh sales pitch.
We’re Now Starting to See Major Companies Implementing DuckDB as their Data Warehouse
In this case, Okta. While I don’t think many enterprises will consider DuckDB, it is definitely worth keeping tabs on to see if more tech-centric companies adopt it and if it becomes a game changer for them.
Sponsored by The Oakland Group, a full-service data consultancy. Download our guide or contact us if you want to find out more about how we build Data Platforms!
Photo by Anna Dziubinska on Unsplash
I recently found Monosi, which looks like it could be the missing piece on the open-source data observability side, and it seems interesting (I haven’t fully analyzed it yet).
Check it out here: https://github.com/monosidev/monosi