Issue #27: What Should We "Shift Left" On?
Plus: The State of Databases Today, Serverless is Still Not Designed for Data, Why Prices are Still Going Up, Apache Kafka as System of Record, and Why Hasn't Streaming Overtaken Batch?
Hi all, this week we have:
What Should We "Shift Left" On?
Podcast: The State of Databases Today
Serverless is Still Not Designed for Data
Why Prices are Still Going Up When Companies are Spending Less?
Ditching Databases for Apache Kafka as System of Record
They Said Streaming Would Overtake Batch.
What Should We "Shift Left" On?
There is a lot of talk of “shifting left” on a number of aspects of a data platform or product. But what is shifting left? Simply put, it is building a feature at the start of a project's or product's development that is normally built late in development. And if you want examples, I have some below:
Shifting left on testing has been around since at least 2001 and has its own Wikipedia page.
Many data quality vendors talk about how they can help you shift left.
And while I couldn’t find any data modelling articles on shifting left, there has been a lot of complaining about the possible trend towards less data modelling, especially lack of conceptual modelling, which is also typically done at the start of development.
Also security, though if you’re putting sensitive data into your MVP, then I’d argue security should be built in from the start.
So should we build our Minimum Viable Product (MVP) with testing, data modelling, data governance, DataOps and data quality built in on the first iteration so we can “shift left”? That’s a lot to cram into an MVP even if you buy a bunch of expensive off-the-shelf products! Especially if you have a deadline of, say, six weeks to prove business value.
Speaking of business value, that should be your number one goal; if you can’t do that, then you are unlikely to get an extension to your MVP. But proving value can be tricky, especially with more experimental products where we are not even sure if the product can be built.
But focusing only on business value means that shifting left quickly becomes shifting right, leaving you with a potential mountain of tech debt that could take months to clear, erode your customers’ trust (remember, no data quality), and slow down the development of new features.
So what do we do? There are no silver bullets, but a few things can help:
Testing, DataOps, governance, quality, and modelling are not on-off switches; you could pencil in just enough of them to keep technical debt to a reasonable level while not turning each of them into a multi-month project in their own right. The downside to this is that it can lead to a lot of burnout-inducing context switching and work in progress.
If you’re not sure the budget holders will give you extra budget for “shifting left”, present it as a series of options in the project proposal while also mentioning the tradeoffs described above for shifting left or right.
Hire more people and break up work into parallel workstreams. This can work if you are pretty sure your product will add value, but doubling the number of people rarely doubles the output; in fact, it can slow delivery down.
Buy off-the-shelf software that suits your exact needs (or near enough). The downside of very specialist software is that it’s likely to be less adaptable in the future as the business and requirements change. Licence costs also tend to be higher.
If you are building the same feature again and again for projects, convert it into a service or architectural pattern so future projects save time. For example, a well-tested monitoring service or cloud infrastructure module. This won’t help you with your next MVP, though.
I’m tempted to say DataOps should come before the other items I mentioned above, as improving DataOps improves the efficiency of new features. Data Quality is another strong contender, as poor quality can lead to a loss of trust among stakeholders that is hard to win back. In reality, this is not a hard rule and depends on the use cases.
As a consultant, I can’t help but mention that you can hire a team of experts to help kickstart the development process, especially as many of them have pre-built accelerators. I also have firsthand experience that this is no silver bullet; we cannot always turn water into wine, and it’s more expensive than in-house staff.
If you have any other ideas on how to shift left under difficult time constraints, leave a comment below!
Podcast: The State of Databases Today
This is an interview with Andy Pavlo, CEO of OtterTune and “Professor of Databaseology” at Carnegie Mellon.
Andy Pavlo writes a great article reviewing databases each year, so it was good to hear him in audio form talking about relational databases vs. other types and the future of the database market, among other topics.
Serverless is Still Not Designed for Data
I’d personally rename this to “Microservices is Still Not Designed for Data”, as there are serverless databases available everywhere now. The article from Ciro Greco and Jacopo Tagliabue of Bauplan is still an interesting read, though, as it is curious how microservices have become a big thing on the operational side but not on the analytical side.
Why Prices are Still Going Up When Companies are Spending Less?
Analysis from Jason Quinn of Vendr, a company that helps businesses buy SaaS products.
Ditching Databases for Apache Kafka as System of Record
Andreas Evers, CTO of KOR Financial, pitches an idea that many may see as crazy, as Kafka is mostly used to move lots of data from A to B very quickly rather than to store it long term.
They Said Streaming Would Overtake Batch.
A very good comparison of streaming vs. batch and why streaming hasn’t taken over Data Engineering by Data Engineer Daniel Beach.
Sponsored by The Oakland Group, a full-service data consultancy. Download our guide or contact us if you want to find out more about how we build Data Platforms!
Cover Photo by Nick Fewings on Unsplash