3 security improvements databases can learn from APIs

Over the last decade, techniques for building secure APIs have advanced quite a bit. But database security has changed very little.

This has severe consequences for organisations.

According to IBM’s Cost of a Data Breach Report 2021, the global average cost of a data breach was USD 4.24 million per incident. That is a 10% increase in the average total cost of a breach from 2020 to 2021.

The landscape organisations operate in is changing. Compliance requirements (e.g. GDPR, CCPA) are becoming more stringent. Ransomware cost $20B+ globally in 2020. Attackers are using more sophisticated techniques, and are moving faster.

The worst-case example of this in 2020 was a disastrous data breach at mental health startup Vastaamo. Over 300,000 patient records — including detailed consult notes — were leaked and used to extort users.

The techniques used by attackers are nothing special. You are likely familiar with them:

Compromised credentials, where attackers use stolen credentials to gain access to a target.
Cloud misconfiguration, where attackers use default, unused, or untested configuration to expose information.
Injection attacks, where attackers construct malicious requests to extract or tamper with data.
Adversary-in-the-Middle attacks, where attackers view or manipulate data in transit.
Denial of Service attacks, where attackers make a service unavailable for legitimate users. This is sometimes used as a cover for remote code execution and data exfiltration.

We usually think about these attacks taking place between our systems and our users. But these techniques apply at all levels of the stack.

It turns out there’s heaps we can learn from API security improvements and apply to databases.

Here are the top 3:

1. Standardised serialisation formats

Standardised serialisation formats create strongly typed communication for network transport and data storage. Because the receiver only accepts data that matches a declared schema, they reduce attack surface and help mitigate attacks like SQL injection.

The three most well-known examples of this class of technology are Protocol Buffers, BSON, and Apache Avro. They allow developers to represent data structures in compact binary forms. Plus, they do heavy lifting for developers on encoding and decoding that data.
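To make "compact binary form" concrete, here is a minimal sketch that uses Python's standard struct module as a stand-in for a real format like Protocol Buffers. The record layout (an id, an age, and a name) and the function names are hypothetical, chosen only for illustration.

```python
import json
import struct
from typing import Tuple

# Hypothetical record layout, not taken from any real schema:
# "<IBH" = little-endian uint32 id, uint8 age, uint16 name length.
HEADER = struct.Struct("<IBH")


def encode_user(user_id: int, age: int, name: str) -> bytes:
    name_bytes = name.encode("utf-8")
    # struct raises struct.error if a value does not fit its declared type.
    return HEADER.pack(user_id, age, len(name_bytes)) + name_bytes


def decode_user(payload: bytes) -> Tuple[int, int, str]:
    user_id, age, name_len = HEADER.unpack_from(payload, 0)
    # Reject payloads whose real length disagrees with the declared length.
    if len(payload) != HEADER.size + name_len:
        raise ValueError("payload length does not match declared name length")
    name = payload[HEADER.size:].decode("utf-8")
    return user_id, age, name


binary = encode_user(42, 31, "Ada")
as_json = json.dumps({"id": 42, "age": 31, "name": "Ada"}).encode("utf-8")
print(len(binary), len(as_json))  # the binary form is the smaller of the two
```

Real serialisation frameworks add schema evolution, code generation, and variable-length integer encoding on top of this idea, so treat the sketch as an illustration of the principle rather than a replacement.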

How do we apply this to databases?

We can replace the messaging format for handling input and output in the database.

Frankly, this isn’t feasible for a lot of existing databases. New databases like CipherStash are able to take this approach from the start.

These standardised serialisation formats help us build secure clients faster. They can help us generate clients for different languages, often with backwards compatibility baked in. They can also help us generate documentation.

These help us defend databases against a broad class of deserialisation attacks like:

Injection: data injection, by only supporting primitive data types.
Privilege escalation: gaining remote code execution through object deserialisation.

They also help us defend against Denial of Service attacks like:

Resource exhaustion: our databases can drop and log bad deserialisations.
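The "primitive types only, drop and log anything else" idea can be sketched in a few lines of Python. This is an illustration under assumptions, not how any particular database does it: the GetRecord message, its field names, and the size limit are hypothetical, and JSON is used as the wire format only so the example runs with the standard library.

```python
import json
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("db.wire")


# Hypothetical query message; the field names are illustrative only.
@dataclass(frozen=True)
class GetRecord:
    table: str
    record_id: int


EXPECTED_FIELDS = {"table": str, "record_id": int}
MAX_MESSAGE_BYTES = 4096  # assumed limit; caps the work done on hostile input


def decode_get_record(raw: bytes) -> Optional[GetRecord]:
    """Return a GetRecord, or None if the message was dropped and logged."""
    if len(raw) > MAX_MESSAGE_BYTES:
        logger.warning("dropped message: %d bytes exceeds limit", len(raw))
        return None
    try:
        data = json.loads(raw)
    except (ValueError, RecursionError) as exc:
        logger.warning("dropped message: cannot parse (%s)", type(exc).__name__)
        return None
    # Exactly the declared fields: no extras, none missing.
    if not isinstance(data, dict) or set(data) != set(EXPECTED_FIELDS):
        logger.warning("dropped message: unexpected fields")
        return None
    for name, expected_type in EXPECTED_FIELDS.items():
        # type() rather than isinstance() so that bool is not accepted as int.
        if type(data[name]) is not expected_type:
            logger.warning("dropped message: field %r has wrong type", name)
            return None
    return GetRecord(**data)
```

A request such as `{"table": "users", "record_id": "7 OR 1=1"}` never reaches the query layer, because the string fails the integer check and the message is dropped.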

Serialisation formats like protobufs and BSON aren’t a silver bullet though. Consider them one layer of defence that works in concert with other layers. Strongly typed, memory safe languages are another layer.

One example of the logical conclusion of this approach is ProfaneDB. You define your models in protobufs, talk to the database with gRPC, and it stores data on disk with RocksDB.

Speaking of gRPC, that brings us to the next major security advancement: RPC.

2. RPC

If you’ve been in the tech industry a while, you’ve likely run into HTTP-based RPC technologies like SOAP and XML-RPC, or even older RPC technologies like CORBA or DCOM. Over the last decade and a half, REST and GraphQL have superseded all of these.

Deeper in the stack, databases have traditionally either:

Had unique wire protocols, or
Used a generic transport like HTTP.

Now, there are a variety of modern RPC frameworks that use code generation to handle routes, serialisation, headers, and errors.

Two of the most popular modern RPC frameworks are gRPC from Google, and Twirp from Twitch.

How do we apply this to databases?

We get some big security wins by replacing existing network transports in our databases with modern RPC tools:

We can ensure protocol compatibility between client and server. In extreme cases, we can even force clients to upgrade to latest versions.
We can reduce attack surface to only what the endpoint explicitly exposes. This can stop a lot of enumeration attacks in their tracks.
We can limit the impact of denial of service attacks through strict deserialisation. We can also log those deserialisation failures.
We reduce the likelihood of security bugs by using tools and protocols that are broadly used.
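Two of these wins, version enforcement and an explicitly enumerated set of endpoints, can be sketched without any RPC framework at all. Everything named here (PROTOCOL_VERSION, the GetRecord method, the request shape) is hypothetical; frameworks like gRPC generate the equivalent plumbing from a service definition.

```python
from typing import Callable, Dict

PROTOCOL_VERSION = 2  # hypothetical version number


class RpcError(Exception):
    """Raised for any request the server refuses to handle."""


# Only handlers registered here are reachable; there is no dynamic lookup
# of functions by name, so nothing is exposed by accident.
_HANDLERS: Dict[str, Callable[[dict], dict]] = {}


def rpc_method(name: str):
    def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        _HANDLERS[name] = func
        return func
    return register


@rpc_method("GetRecord")
def get_record(params: dict) -> dict:
    return {"record_id": params["record_id"], "found": True}


def dispatch(request: dict) -> dict:
    # Refuse clients that speak a different protocol version.
    if request.get("version") != PROTOCOL_VERSION:
        raise RpcError("unsupported protocol version; please upgrade the client")
    method = request.get("method")
    handler = _HANDLERS.get(method) if isinstance(method, str) else None
    if handler is None:
        # The same error for every unknown name gives probing nothing to learn.
        raise RpcError("unknown method")
    return handler(request.get("params", {}))
```

For example, `dispatch({"version": 2, "method": "GetRecord", "params": {"record_id": 7}})` succeeds, while a request for any unregistered method name, or one carrying an older version number, raises RpcError.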

CipherStash is built from the ground up using this approach. It uses gRPC and protobufs for all client/server communication.

Again, modern RPC frameworks aren’t a silver bullet. They can even offer attackers new, exciting ways to exploit your systems.

In particular, watch for gRPC Server Reflection.

This is an awesome dev feature that lets you enumerate gRPC endpoints in a human-readable format. But the gRPC methods, arguments, and protobuf descriptions it exposes are also great intel for attackers.

Make sure you disable gRPC Server Reflection in your production builds!
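One way to do this in a Python gRPC server is to gate the reflection call behind an explicit development flag. This is a sketch under assumptions: the APP_ENV variable name and the helper function are hypothetical, and the reflection call itself needs the separately installed grpcio-reflection package.

```python
import os
from typing import Mapping, Sequence


def maybe_enable_reflection(server, service_names: Sequence[str],
                            env: Mapping[str, str] = os.environ) -> bool:
    """Enable gRPC Server Reflection only when APP_ENV is 'development'."""
    # APP_ENV is a hypothetical variable name. Anything unset or unrecognised
    # is treated as production, so the check fails closed.
    if env.get("APP_ENV", "production") != "development":
        return False
    # Imported lazily so production builds do not need the reflection package.
    from grpc_reflection.v1alpha import reflection
    reflection.enable_server_reflection(service_names, server)
    return True
```

Defaulting to "off" matters more than the exact mechanism: a forgotten environment variable should leave reflection disabled, not enabled.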

3. Auth

Traditionally, the tech stack we use dictates how we do authentication and authorisation.

Challenge–Response authentication is by far the most widely adopted approach. Some systems have taken less-traveled paths like Client Certificate Authentication, or the Secure Remote Password protocol.

This has changed significantly in the last decade. OAuth 2.0 has become the de facto standard for API authorisation, built atop cloud-based identity providers.

This offers one of the biggest security improvements for databases:

We no longer store credentials in the database, which enables distributed trust.
We avoid the need for long-lived credentials by using short-lived ones, like expiring JWTs. This reduces attack scope, because a stolen token is only useful until it expires.
We can scope permissions granted to an individual right down to the data layer. This eliminates shared credentials, and helps achieve the principle of least privilege.
We can use existing sources of identity within our organisations, which reduces duplication.

How do we apply this to databases?

We can jettison our home-rolled auth, and rely on third party identity providers. This moves us to a model where you have untrusted clients, but trusted servers. Clients authenticate to an IDP, the IDP sets up a session with the database, and the database gets to remain ignorant of users.
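On the database side, "remaining ignorant of users" comes down to checking a token the IdP issued: is the signature valid, has it expired, and does it carry the scope this operation needs? The sketch below shows those three checks with the standard library. It assumes an HS256-signed JWT with a shared key purely for simplicity (identity providers more commonly sign with asymmetric keys), and the scope names and the sign_token helper standing in for the IdP are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _mac(key: bytes, signing_input: str) -> bytes:
    return hmac.new(key, signing_input.encode("utf-8"), hashlib.sha256).digest()


def sign_token(claims: dict, key: bytes) -> str:
    """Stand-in for the identity provider issuing a token (hypothetical)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    return f"{header}.{payload}.{_b64url_encode(_mac(key, header + '.' + payload))}"


def verify_token(token: str, key: bytes, required_scope: str,
                 now: Optional[float] = None) -> dict:
    """Return the claims if the token is valid, else raise PermissionError."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signature = _b64url_decode(signature_b64)
        # Check the signature before trusting anything inside the token.
        expected = _mac(key, header_b64 + "." + payload_b64)
        if not hmac.compare_digest(signature, expected):
            raise PermissionError("bad signature")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError as exc:
        raise PermissionError("malformed token") from exc
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    if not isinstance(claims, dict):
        raise PermissionError("malformed claims")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise PermissionError("token expired")  # short-lived by design
    scope = claims.get("scope", "")
    if not isinstance(scope, str) or required_scope not in scope.split():
        raise PermissionError("missing scope")  # least privilege
    return claims
```

In production you would normally use a maintained JWT library and the IdP's published signing keys rather than hand-rolled verification; the point here is only to show which checks the database performs.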

This means less code, and lower ongoing costs. Most importantly, the database access integrates with broader organisational IAM controls. This streamlines the process of user on- and off-boarding, and managing user permissions.

This limits the impact of compromised credentials and account takeovers. Compromised credentials contributed to 20% of all data breaches in 2020.

A surprising number of databases take advantage of this security advancement. MongoDB, OpenSearch, and CouchDB all support JWT authentication. In case you haven’t already guessed, CipherStash also uses this approach for guarding access.

4. Bonus: TLS

In the before times, certs were costly, so teams economised by not using TLS everywhere. This looked like TLS termination at your load balancers, and unencrypted traffic onwards.

Because of the lack of TLS everywhere, automation for managing the cert lifecycle wasn't great. Separately, there was also poor visibility into certificate supply chain.

Let’s Encrypt emerged in late 2014, and shook up the certificate landscape for the better. Certificates now are basically free. This has led to a proliferation of end-to-end TLS.

Let’s Encrypt showed that a better dev and operator experience for the certificate lifecycle was possible. This changed developer attitudes towards TLS. It created a demand for a better local dev experience with TLS. Tools like mkcert filled that gap, and have seen extraordinary uptake.

Finally, Certificate Transparency has created supply chain visibility. This visibility has called attention to existing flaws. This has accelerated work on preventing and monitoring bad actors issuing bad certificates.

How do we apply this to databases?

Given the proliferation of cheap certs, there’s no excuse to not terminate TLS in the database itself.

Now that PKI automation is mature, we can handle the cert lifecycle in the database server itself.

With much of the cert lifecycle handled for us, we can focus on more sophisticated defences.

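For example, we can strictly use Forward Secrecy ciphers, so that traffic an attacker records today cannot be decrypted later if the server’s private key is compromised. The simplest way to enforce this is to disable TLS < 1.3, because TLS 1.3 removed the static RSA key exchange, which lacks forward secrecy. Lower TLS versions can downgrade connections to less secure cipher suites. This exposes users and their data.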
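Refusing anything below TLS 1.3 is a one-line setting in most TLS stacks. As a minimal sketch, here is how it looks with Python's standard ssl module; the build_server_tls_context helper and its parameters are hypothetical names, and a real database server would apply the equivalent setting in whatever TLS library it uses.

```python
import ssl
from typing import Optional


def build_server_tls_context(certfile: Optional[str] = None,
                             keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side TLS context that refuses anything below TLS 1.3."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Clients offering only TLS 1.2 or older fail the handshake.
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    if certfile is not None:
        # The certificate and key paths are supplied by the caller.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


context = build_server_tls_context()
print(context.minimum_version)
```

The trade-off is compatibility: older client libraries that cannot speak TLS 1.3 will be unable to connect, which is usually easier to accept for a new database than for an existing one.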

But as we close one door on attackers, another door opens. One non-obvious downside of Certificate Transparency logs is easier passive asset discovery.
