How to Build a Data Platform: Security
Trade-offs, Defence in Depth, Security Monitoring and Security at Scale
Note: This is part of an in-progress guide on “How to Build a Data Platform”, subscribe for future updates!
Data Security is a Trade-off between Risk and Cost
I don’t need to tell you how important security is to a Data Platform: large data breaches hit the headlines almost daily, which can have massive monetary, legal, and reputational costs.
However, it costs money to enforce data security: expensive cybersecurity tooling, “SSO tax”, hiring security experts, and added security processes that can slow down innovation if not designed correctly.
So it’s a trade-off: how much risk of a data breach are you willing to bear versus how much of your product or platform budget are you willing to spend on security?
So how do we know what is “good enough” security? Fortunately, there are well-known security recommendations and guidelines to aim for, such as NIST (US Gov), CAF (UK Gov), CIS, MITRE ATT&CK, ISO 27001 and OWASP, not to mention that every major vendor will have a guide on security best practices for their product.
There is also one security strategy that is repeatedly used in the industry for secure software: defence in depth.
Defence in Depth
Just like a multi-layer backup strategy guards against data loss, a defence-in-depth strategy guards against data breaches by having multiple layers of security: if one layer fails, the remaining layers still protect your data and minimise the blast radius of any breach.
We’ll go through each of the layers in turn:
1. People and Policy
People are the number one reason why data breaches happen: they accidentally send data to the wrong place or store sensitive details like passwords or personal information in a public place.
This is why security training is usually a mandatory first-day exercise for many organisations.
You also want to foster a culture of proactive security, where software teams are always on the lookout for any possible security issues so their software is as secure as possible.
2. Physical
I’m not an expert on physical security, so I can’t talk about how to make your data centre bombproof, and I won’t comment much on this section. The typical recommendation here is to always keep devices that have access to sensitive data, like laptops and phones, close to you or in a secure location.
3. Perimeter
You can think of all organisations as having a “security perimeter”, where you have to log in with the correct credentials to get inside the perimeter.
The two main ways to set up a perimeter are Virtual Private Networks (VPN) and Zero Trust Architectures (ZTA), with ZTA being the newer, generally preferred approach, especially if you are using the cloud. Sometimes both are used!
In the cloud, there are many organisation-wide security features for enforcing a perimeter, plus a half-dozen other options available.
4. Network
I think most engineers can get away with knowing just the basics of the first three layers, but this and the later layers are where you want to learn the best practices, as these are the layers engineers will be building, interacting with, and enforcing.
Networking is about connecting compute and storage together, so best practice dictates giving each service access to as few other services as possible.
In reality, this can be onerous to enforce, so there can be a trade-off here: for example, an organisation-wide service might be reachable from all internal services, even if not every service uses it.
Other major aspects of network security are:
Firewalls block traffic to malicious websites and reduce the attack surface. Many internal firewalls also block all inbound connections, so you cannot reach internal data or compute from outside the network aside from a few monitored endpoints.
It’s common to have multiple firewalls for redundancy and to further reduce the attack surface: organisational firewalls for all traffic, internal firewalls, application firewalls, and server (Operating System) firewalls.
Private Networks (AWS, Azure) create internal perimeters for each application or set of sub-components of an application, each with their own firewalls.
Encryption in Transit: encrypting traffic between servers reduces person-in-the-middle attacks. The most common form of encryption in transit is HTTPS (HTTP over TLS).
Private Network Endpoints (AWS, Azure) keep all traffic internal, so no traffic goes through the public internet, which again reduces person-in-the-middle attacks.
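To make the least-privilege idea behind these firewall layers concrete, here is a toy allowlist check; the service names and CIDR ranges are purely hypothetical, and a real firewall would of course enforce this at the network level rather than in application code:

```python
import ipaddress

# Hypothetical allowlist: each internal service only accepts traffic
# from the specific subnets that actually need to reach it.
ALLOWED_SOURCES = {
    "reporting-db": ["10.0.1.0/24"],                  # only the BI subnet
    "ingest-api":   ["10.0.2.0/24", "10.0.3.0/24"],   # only ingestion subnets
}

def is_allowed(service: str, source_ip: str) -> bool:
    """Return True if source_ip falls inside any allowed subnet for service."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        ip in ipaddress.ip_network(cidr)
        for cidr in ALLOWED_SOURCES.get(service, [])
    )

print(is_allowed("reporting-db", "10.0.1.42"))  # True
print(is_allowed("reporting-db", "10.0.9.7"))   # False
```

Note the default-deny behaviour: a service not in the allowlist accepts nothing, which is the safer failure mode.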
5. Host / Operating System
In the cloud, you will often use managed or serverless services, so you don’t have to worry about configuring the server's Operating System (OS) according to security best practices yourself.
If you do need a server, one option is to use the preconfigured hardened servers and containers in the cloud marketplaces, such as CIS hardened images. These save you a lot of hard work, though not all of it, as you’ll still want to grant the minimum required access. For container security, I’d recommend reading Docker’s security documentation. For servers, security is such a big area that you’ll likely end up reading multiple books or entire websites on the topic.
You may also want to install anti-virus software on the server, especially if users are going to be actively logging in to the application.
You should also think about a server patching policy, which can be surprisingly painful to set up: you need to be careful not to push software updates that break your software (automated testing is great here), while still pushing updates often enough that you aren’t running software with known vulnerabilities.
6. Application
Most applications will have some kind of authentication, though increasingly applications are linked to a single organisation-wide authentication service to enforce Single Sign-On (SSO). Third-party vendors generally charge extra for enterprise SSO.
Limiting what you can access once you are logged in is usually done through Role-Based Access Control (RBAC). Here, you assign each user one of a set of roles, each with a different level of access: typical roles are read-only, user, and admin.
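The role-to-permission mapping above can be sketched in a few lines; the role names, permissions, and users here are illustrative, and real systems would delegate this to an identity provider or authorization library:

```python
# Minimal RBAC sketch: roles map to sets of permissions,
# and each user is assigned exactly one role.
ROLE_PERMISSIONS = {
    "read-only": {"read"},
    "user":      {"read", "write"},
    "admin":     {"read", "write", "manage_users"},
}

USER_ROLES = {"alice": "admin", "bob": "read-only"}

def has_permission(user: str, permission: str) -> bool:
    """Check whether the user's assigned role grants the permission."""
    role = USER_ROLES.get(user)
    return permission in ROLE_PERMISSIONS.get(role, set())

print(has_permission("bob", "write"))    # False
print(has_permission("alice", "write"))  # True
```

Unknown users or roles fall through to an empty permission set, so access is denied by default.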
It is also worthwhile scanning all code for vulnerabilities on each change as part of your DataOps pipeline. Another advantage of a fully automated DataOps pipeline is that you can limit access to sensitive areas, such as production environments, to only a few administrators rather than every developer.
7. Data
You can use your own data to limit access to other parts of the data using Column and Row Level Security, also known as Attribute-Based Access Control (ABAC). This works by maintaining a list of users with attributes, which are looked up to decide access. Some data storage and processing vendors come with managed features built in, usually at a premium. There are also open-source initiatives like Open Policy Agent (OPA) that aim to build a cross-application RBAC and ABAC authorization language.
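A minimal row-level security sketch of the attribute lookup described above; the user attributes, row schema, and matching rule (region equality) are all hypothetical stand-ins for whatever policy your platform actually needs:

```python
# Each user has attributes; each row is tagged with the attribute
# values allowed to see it. Access is decided by comparing the two.
USERS = {
    "alice": {"department": "finance", "region": "EU"},
    "bob":   {"department": "sales",   "region": "US"},
}

ROWS = [
    {"customer": "Acme", "region": "EU", "revenue": 100},
    {"customer": "Beta", "region": "US", "revenue": 250},
]

def visible_rows(user: str, rows: list) -> list:
    """Return only the rows whose region matches the user's region attribute."""
    region = USERS[user]["region"]
    return [row for row in rows if row["region"] == region]

print(visible_rows("alice", ROWS))  # only the EU row
```

In practice this filtering happens inside the database or query engine, so users never see rows they aren't entitled to, regardless of which tool they query from.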
You’ll also want to add Data Encryption at Rest, which encrypts data when it's stored in data storage. Most major cloud providers and data vendors provide encryption at rest, though many also charge a premium for it.
Finally, Data Anonymization or Data Masking: some vendors provide this, but you can also do it yourself, though that adds maintenance overhead. One issue is that masking can make some analytics difficult or impossible, and you may have to pay extra for it. It is also worth knowing the difference between anonymization and pseudonymization.
Security Monitoring
Monitoring and alerting are another big element of security. Alerting can tell you about a breach the moment it happens, and if there is a breach, you’ll want to know who had access to what for legal purposes.
One issue with extensive monitoring is cost at scale, as all these logs will cost money to store and sometimes process. You need to think carefully about what long-term log data will be useful for, so you are not paying any unnecessary costs.
Also, monitoring data is itself a security risk, as it logs network and server details that can be used against you if leaked. Because of this, monitoring is usually centralised, and raw log data is often only accessible by a few administrators and developers.
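As a toy illustration of the kind of alerting rule this monitoring enables, here is a sketch that flags users with repeated failed logins; the log format, threshold, and event names are all hypothetical, and real platforms would use a SIEM or log-analytics service for this:

```python
from collections import Counter

# Hypothetical rule: alert on any user with this many failed logins
# or more within the monitoring window.
FAILED_LOGIN_THRESHOLD = 3

log_events = [
    {"user": "alice", "event": "login_failed"},
    {"user": "alice", "event": "login_failed"},
    {"user": "alice", "event": "login_failed"},
    {"user": "alice", "event": "login_failed"},
    {"user": "bob",   "event": "login_ok"},
]

def users_to_alert(events: list) -> list:
    """Count failed logins per user and return those at or over the threshold."""
    failures = Counter(e["user"] for e in events if e["event"] == "login_failed")
    return sorted(u for u, n in failures.items() if n >= FAILED_LOGIN_THRESHOLD)

print(users_to_alert(log_events))  # ['alice']
```

Even a simple rule like this shows why raw logs are sensitive: the same data that powers the alert also reveals usernames and access patterns.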
Security at Scale
Building and managing all these layers can be quite difficult and expensive for one data team, as well as distracting them from their original mission of providing value to the organisation through data.
You can help ease the burden by providing a number of security services for data teams to adopt, usually in the first three to four layers, such as SSO, organisation-wide cloud security policies, and network firewalls.
Also, providing common architectural patterns can be a massive help here, as they can and should include security best practices. There is some overhead in keeping them up to date, though, and there will always be edge cases that don’t fit existing patterns.
Federation
As you scale, the security team will know less about the context of the data team(s) and can easily get overwhelmed by the number of requests it receives for permission changes. This is where it can make more sense to move permission management to the product or project owners, so they can manage access.
This can save time in getting the required permissions, put less strain on the security team, and improve security, as the owners know their applications best.
This does have the downside of requiring the right tooling to do this, though many large cloud providers allow for permission groups that can be owned by people who are not admins elsewhere.
Summary
Thanks for reading! My takeaways are:
Security is a tradeoff between risk, usability, and cost.
Defence in depth is a great strategy for security.
Monitoring is very useful for security but also poses a security risk if configured incorrectly.
Federated Security is very useful at scale.
This was a vendor-neutral article; if you’re interested in Azure Data Platform security, I also recommend my colleague Abigail Brown’s article.
Thanks to Phil Bent for reviewing this article. Photo by Jason Dent on Unsplash
If you have anything to add or any questions, feel free to comment below!
Sponsored by The Oakland Group, a full-service data consultancy. Download our guide or contact us if you want to find out more about how we build Data Platforms!