How to Build a Data Platform: People & Teams
Roles required, Effective Delivery, Team Design, Communication and Scaling Data Teams.
Part of my guide on “How to Build a Data Platform”:
Introduction
Unless ChatGPT manages to figure out how to automatically make Data Platforms from a text prompt, you’ll need people to build your Data Platform.
More than that, you’ll need a team of people that can work well together to deliver your Data Platform successfully. So first we ask, what roles do we need to build a Data Platform?
So, what roles are often involved in building a Data Platform?
Data Engineers and/or Analytics Engineers
Cloud and/or Platform Engineers
Data Architects
Solution Architects
Data Analysts
Data Scientists and/or Machine Learning Engineers
IT Administrators
Support Engineers
Security Experts
Business Analysts
Subject Matter Experts (SME)
Product or Project Manager
Budget Holder, Data Owner, and/or Product Owner
Senior or Executive Management
That’s a lot of roles! Note that most of the above will only work on the Data Platform for days or even hours, but they are vitally important nonetheless.
Also, many roles will be filled by the same person; in fact, we recommend generalists over specialists to make your data teams more resilient to business and staff changes. Beware of overextending this, though: if a person wears too many hats, they are more likely to burn out from all the context switching.
As you may have guessed from the above, building a successful Data Platform requires a lot of teamwork and collaboration, so the focus of this section is on how to get people working as a cohesive, engaged team to deliver a Data Platform or Product in a reliable but efficient manner.
So what guidelines are there for effective software delivery?
Delivery
There are many frameworks for delivering software: Waterfall, SAFe, Scrum, Kanban and lots more. Unlike other consultancies, I (and my employer, Oakland) do not have “one delivery method to rule them all”, but prefer to pick the best framework for the project, product, or organisation at that point in time.
There is also nothing wrong with changing frameworks to suit your needs, say, from fast-build Proofs of Concept (PoCs) to slower, more audited production changes. The only real cost is the time it takes to make the change.
Also, feel free to adapt the delivery to your needs; a delivery framework is just guidance, not a set of instructions. Though do note that your organisation may mandate a delivery method you have to adhere to.
The one rule of agile you must remember is to regularly reflect on the work done to see how your team can improve in the future. Most agile framework diagrams are drawn as a cycle, reflecting the commonly held wisdom that the most effective delivery is one that constantly adapts and learns from its experience.
Effective delivery also requires automation of common tasks, which we cover in the DataOps section. Using metrics from DataOps as well as team engagement can help you quantitatively work out if the changes you’re making are having a positive impact.
I would argue the most important organisational unit when it comes to delivery is the delivery team. So, assess people based on how they contribute to the team, not on individual metrics like lines of code written.
But what does a good data team look like?
Designing Teams
Ideally, aim for three to eight people per team:
Avoid large teams (more than eight people) where possible. If a Data Platform build requires more than eight people, split them into teams that each focus on one aspect of the platform (more on scaling teams later on).
It is generally too hard for a large team to keep track of everything at a deep and meaningful level, leading to increased levels of burnout and miscommunications.
Plus, no one likes to be in a 20-person daily standup that takes an hour when 90% of its content has little direct relevance to them.
On the other hand, long-term teams of one or two people can be lonely and aren’t very resilient if, for example, one person goes off sick or gets pulled away to fix organisational emergencies.
Try to split teams across Business Domain and then Products or Projects rather than role types; ideally, you want your team to design, build, and maintain an entire data solution together from start (source data) to end (reports) and not have to hand it off to other teams where information and context get lost.
As the number of teams grows, you may add cross-organisational services like monitoring and malware detection, managed by other teams (usually platform teams).
We do note, however, that changing organisation structure is often very hard, if not impossible, in the short and medium term, so if you can’t change your team structure, be aware of its limitations and plan appropriately. In fact, changing teams too much is an anti-pattern; it usually takes a few months for a new team to reach maximum efficiency, as its members need time to learn how to work together effectively.
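As a rough illustration of the team-size guidance above, here is a minimal Python sketch. The function names are hypothetical, and counting pairwise communication links is a common rule of thumb I am assuming here, not something this guide prescribes. It shows how quickly the number of person-to-person links grows with team size, and one simple way to split a headcount into evenly sized teams of at most eight.

```python
import math

MAX_TEAM_SIZE = 8  # upper bound taken from the guideline above


def communication_links(people: int) -> int:
    """Number of distinct pairs of people who may need to stay in sync."""
    return people * (people - 1) // 2


def split_into_teams(headcount: int) -> list[int]:
    """Split a headcount into the fewest teams of at most MAX_TEAM_SIZE
    people, keeping the team sizes as even as possible."""
    if headcount < 1:
        raise ValueError("headcount must be at least 1")
    team_count = math.ceil(headcount / MAX_TEAM_SIZE)
    base, extra = divmod(headcount, team_count)
    # The first `extra` teams take one additional person each.
    return [base + 1 if i < extra else base for i in range(team_count)]


print(communication_links(20))  # 190 pairs in a single 20-person team
print(communication_links(7))   # 21 pairs in a 7-person team
print(split_into_teams(20))     # [7, 7, 6]
```

This only handles the arithmetic; in practice you would still split along Business Domain and Products, as described above, rather than purely by headcount.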
Communication and Meetings
If you speak to most engineers, they’ll complain that they are in too many meetings; however, Project Managers often complain that they don’t get enough updates from engineers!
So there is a gap that must be bridged: making sure engineers are not attending unnecessary meetings, but also making sure engineers are updating progress on at least a daily basis so the project budget holders know we are making progress on a project while spending their money.
Personally, I try to lean towards overcommunicating. I think the risk of colleagues taking the wrong action because they missed an important update is higher than the risk of colleagues getting annoyed because I’m communicating too much.
Scaling Data Team(s): from one person to many teams
Your first data hire(s) should normally be a Data Analyst or Analytics Engineer: focus on the data outputs or outcomes first. You’ll likely want to bring in project management and business analysis experts, even if they are only part-time.
After you’ve hired a few Analysts, you may want to apply some software engineering and/or manage data at scale, so you’ll likely hire some Data Engineers.
This is roughly the point at which you start thinking about building a Data Platform.
Once your team grows beyond approx. eight people, look to split them up into two teams, each with a manager, both likely reporting to a “Head of Data” or similar title.
Once you’ve hired enough people to make more than four to eight teams, you may need to add another layer of management.
Try to split the teams along Data Products (or, next best, projects); the easiest way is to have one team = one Data Product.
Multiple teams on one product increase the number of handoffs, which increases the chance of teams blocking each other. Look at splitting up the product if possible.
On the other hand, if one team is working on multiple products, it increases cognitive load, which leads to more burnout and expensive context switching. Teams are usually fine with two to three simple products and rarely more than one complex product.
As you grow in the number of Data Products, you’ll want to provide them with consistent cross-organisational services such as (Cloud) Platform Engineering, Data Governance, Monitoring, etc. to reduce repetition.
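The product-ownership heuristics above can be turned into a quick sanity check. The following Python sketch is illustrative only: the `Team` class, the `review_ownership` function, and the example team and product names are all hypothetical, and the thresholds are simply the rules of thumb from this section (at most one complex product or three simple products per team, and one team per product).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Team:
    name: str
    simple_products: list[str] = field(default_factory=list)
    complex_products: list[str] = field(default_factory=list)


def review_ownership(teams: list[Team]) -> list[str]:
    """Return warnings where the team/product split breaks the heuristics."""
    warnings = []
    owners = defaultdict(list)  # product name -> names of teams working on it
    for team in teams:
        for product in team.simple_products + team.complex_products:
            owners[product].append(team.name)
        if len(team.complex_products) > 1:
            warnings.append(f"{team.name}: more than one complex product")
        if len(team.simple_products) > 3:
            warnings.append(f"{team.name}: more than three simple products")
    for product, names in owners.items():
        if len(names) > 1:
            warnings.append(
                f"{product}: shared by {', '.join(names)}; expect handoffs"
            )
    return warnings


teams = [
    Team("Finance", simple_products=["Budget report"],
         complex_products=["Forecasting", "Ledger"]),
    Team("Sales", simple_products=["Budget report"]),
]
for warning in review_ownership(teams):
    print(warning)
```

Treat the output as a prompt for discussion rather than a verdict; whether a product counts as simple or complex is a judgment call for each organisation.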
Summary
This is a reasonably big section, but it only scratches the surface of effectively running a data team. I haven’t covered:
In person vs. remote working
Comparison of delivery methodologies
Office design
Delivery estimation
Setting personal and team goals (though I have a section on product goals)
Documentation (though I have a section on documenting architectural decisions)
Feel free to reach out in the comments or in a private message on LinkedIn if you have any questions or suggestions!
Finally, here is some research into this topic that has been a major help to me over the years and that I recommend to others:
An Elegant Puzzle: Systems of Engineering Management by Will Larson
Team Topologies by Matthew Skelton and Manuel Pais
What Your Mother Never Told You About Agile Development by Aino Vonge Corry
I’d also like to thank Hannah Varley-Fodden and Lynne O'Donnell for reviewing this post and providing valuable feedback.
Sponsored by The Oakland Group, a full-service data consultancy. Download our guide or contact us if you want to find out more about how we build Data Platforms!
Cover Photo by Nick Fewings on Unsplash