A Data Platform is all the components needed to serve data in your organisation, or in a subset of it. Data Platforms are typically built in the public cloud, on-premises, or in a combination of the two. They are sometimes mistaken for just a Database, Data Warehouse or Data Lake, when they are actually made up of many components:
Various data stores
Compute for transforming raw data into curated, clean data
Automated testing (including code, infrastructure & data quality)
Data governance (policy, process and software)
Security components (such as firewalls and encryption)
Logging components for monitoring and alerting
DataOps deployment pipelines and code for automating changes at scale
Its source data, data transformations and output data
But at a conceptual level, a very basic Data Platform may look like this:
You have data sources; transformations that turn your data into something you can gain analytical insights from; and outputs that land the results somewhere your Data Platform users can access them. On top of these sit governance and security controls, which make sure you are building the right Data Platform and have sufficient data protections in place. We’ll discuss this in more detail later.
The transformations can be done in spreadsheets, databases, data warehouses, data lakehouses or real-time streaming services. You may not need any more than that. Remember, don’t build anything you don’t need or that does not bring value.
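To make the source, transformation and output flow concrete, here is a minimal sketch of a very basic Data Platform in plain Python. It is illustrative only and assumes hypothetical order data held in a CSV string, with Python’s built-in `sqlite3` standing in for the output data store. The function and table names (`extract`, `transform`, `check_quality`, `load`, `sales_by_region`) are made up for this example. It also includes a tiny automated data quality check, echoing the testing component listed above.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data; in practice this might be an export from an
# operational system, an API response or a file landed in a data lake.
RAW_CSV = """order_id,region,amount
1,North,120.50
2,South,
3,north,80.00
4,East,200.25
"""


def extract(raw_text: str) -> list[dict]:
    """Read raw rows from the (hypothetical) data source."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows: list[dict]) -> dict[str, float]:
    """Turn raw rows into curated data: drop incomplete rows, standardise
    region names and total the order amounts per region."""
    totals: dict[str, float] = {}
    for row in rows:
        if not row["amount"]:
            continue  # skip rows missing an amount rather than guessing a value
        region = row["region"].strip().title()
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return totals


def check_quality(totals: dict[str, float]) -> None:
    """A very small automated data quality test, run before publishing."""
    assert totals, "no curated rows were produced"
    assert all(value >= 0 for value in totals.values()), "negative totals found"


def load(totals: dict[str, float], conn: sqlite3.Connection) -> None:
    """Write the curated output somewhere users can query it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales_by_region (region, total) VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()


def run_pipeline(raw_text: str, conn: sqlite3.Connection) -> None:
    """Source -> transformation -> quality check -> output."""
    totals = transform(extract(raw_text))
    check_quality(totals)
    load(totals, conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(RAW_CSV, connection)
    for region, total in connection.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ):
        print(region, total)
```

Running it prints `East 200.25` and `North 200.5`: the row with a missing amount is dropped and "north" is merged into "North". In a real Data Platform each of these steps would more likely be a separate tool or service with its own governance and security controls, but the shape of the flow is the same.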
For those who worry that the word “Governance” means “slow and costly processes that limit how fast I can develop”: don’t worry. We take a pragmatic view that governance policy and processes should be as lightweight as possible and, where possible, should fit the product or project and the stage it is at. Governance should enable and enhance data solutions, not block them.
This means you can set up a Data Platform very quickly for simple use cases. However, in terms of tooling, a very complex Data Platform in a large organisation may look like this:
This can take many years to fully build out, so careful prioritisation and expectation management are needed for a Data Platform to be deployed successfully.
Recently we’ve also seen large organisations (Netflix, Saxo Bank, Sky) move away from the challenging prospect of moving all their data into one place and towards data meshes. You could argue that a data mesh is made up of many data platforms, each representing a subset of organisational data; in data mesh terminology, these are known as data products:
This can feel like an overwhelming amount of complexity to deal with, even for an experienced data professional. So in the rest of this guide we will break down the components and services you might need and explain why you might need them.
Sponsored by The Oakland Group, a full service data consultancy.