Why we're building CipherStash
With data breaches continuing to threaten our privacy and data security, are current data protection schemes enough?
Back in the early days of my career as a developer (circa 2003), data security was never a big consideration. We protected our passwords and installed antivirus software - that was about it. But 2021 is vastly different. The past decade has seen a steady increase in the levels of scrutiny and focus we put on data security.
An increase that was accelerated by COVID-19.
Major data breaches are so common now that many of us take them for granted - it isn't a matter of if they will happen but when and how bad they will be. And in a world where more and more of our data is stored in online services, everything from email addresses to super private healthcare records, this genuinely terrifies me.
How safe is our data?
Thanks to truly massive breaches affecting millions (and sometimes billions) of users, such as those at SolarWinds, Equifax and Anthem Health, the safety of data stored in the cloud has been thrown into question.
But what puts data in the cloud at risk? Any answer to that question must start with a discussion of encryption.
Encryption "in transit"
Encryption is an incredibly important part of the modern web. For the most part, it is used to secure your connection to a website. This uses a protocol called Transport Layer Security (TLS, the successor to SSL and the "S" in HTTPS) and is arguably the web's most important defense against attacks. Without it, connecting to a website would be a precarious proposition indeed.
TLS is often described as a form of "encryption in transit": a way of protecting data while it travels from one party to another (for example between the web browser on your computer and the servers hosting your online banking service). TLS prevents an eavesdropper from reading sensitive data as it is being transferred. It also provides a mechanism for the client to verify the identity of the service it is connected to.
TLS is generally pretty uncontroversial, but one thing to consider when assessing a system for potential security risks is where the TLS connection "terminates". TLS requires additional computation (decryption and encryption of the data), so many cloud providers will offload the TLS workload to an external load balancer to reduce the load on their primary services. However, unless the load balancer re-encrypts the traffic, this means that the final connection between the load balancer and the server is unencrypted.
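One way to reason about where TLS terminates is to write the request path down hop by hop and flag every hop that is not encrypted. The sketch below does exactly that. The `Hop` type, the `cleartextHops` function and the hop names are invented for illustration and do not describe any particular cloud provider.

```typescript
// Hypothetical description of one network hop in a request path.
interface Hop {
  from: string;
  to: string;
  encrypted: boolean; // true when the hop is carried over TLS
}

// Return the hops on which data travels in cleartext.
function cleartextHops(path: Hop[]): Hop[] {
  return path.filter((hop) => !hop.encrypted);
}

// Assumed setup: TLS terminates at the load balancer and the last hop is plain HTTP.
const requestPath: Hop[] = [
  { from: "browser", to: "load balancer", encrypted: true },
  { from: "load balancer", to: "app server", encrypted: false },
];

for (const hop of cleartextHops(requestPath)) {
  console.log(`cleartext: ${hop.from} -> ${hop.to}`);
}
// prints "cleartext: load balancer -> app server"
```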
Encryption "at rest"
Security folk often also talk about "encryption at rest": a way to protect data when it is stored on a server or a user's computer. You very likely have something like this enabled on your laptop or mobile phone. For example, when you unlock your iPhone by entering a PIN or showing your face to the front-facing camera, some of the data on your phone is decrypted so you can access it. This process happens without you really noticing, and is a form of "Transparent Data Encryption".
Transparent Data Encryption (abbreviated to TDE) is an important part of securing any data storage system. It protects the data from an adversary who has access to the physical device on which data is stored. However, it only provides protection when the data isn't being accessed.
Consider the case of your mobile phone. If you forget to lock your phone when you've finished using it, the data that was decrypted when you unlocked it will remain unencrypted in the memory of the device. This potentially allows anyone who picks your phone up to read your messages, access your contacts, and so on.
Server-side databases, like your iPhone, often have TDE systems to protect the physical storage devices running in a data centre. However, the databases used in cloud applications need to stay "unlocked" all the time. The web is a 24-hour operation, and so databases configured with TDE often gain little security benefit because the data is almost always available in unencrypted form in memory.
Problems with TDE
TDE can play an important role in the data stack and should be considered one of the layers in a defence-in-depth strategy. However, it has several limitations which can lead to very real security risks:
- Once the database starts up, the data is decrypted
- Data is visible to superusers of the system — for example, Database Administrators
- Keys must be available to the database server
This last point is important.
On your iPhone the decryption key is tied to your passcode or biometric data. You control when and how you provide it to your phone to access your data.
With TDE on databases this is different: one key must be used to decrypt the data for all clients of the database (or in some cases a specific table). This means the decryption key must be available on the same server as the database itself.
If a hacker can get access to the database, they could also get access to the decryption key, making the encryption useless.
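To make these limitations concrete, here is a toy model of a database with TDE, written with Node's built-in `crypto` module. It is only a sketch: the `TdeDatabase` class, its methods and the sample email address are invented for illustration, and real TDE implementations encrypt whole pages or files rather than single values.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Toy model of a database with TDE. Names are invented for illustration.
class TdeDatabase {
  // The key lives in the server process so data can be decrypted on demand.
  private readonly key = randomBytes(32);
  // What is written to the physical disk: ciphertext only.
  readonly disk = new Map<string, Buffer>();

  insert(id: string, value: string): void {
    const iv = randomBytes(12);
    const cipher = createCipheriv("aes-256-gcm", this.key, iv);
    const body = Buffer.concat([cipher.update(value, "utf8"), cipher.final()]);
    // Layout on disk: 12-byte nonce, 16-byte auth tag, then the ciphertext.
    this.disk.set(id, Buffer.concat([iv, cipher.getAuthTag(), body]));
  }

  // Any client that can run a query gets cleartext back.
  select(id: string): string | undefined {
    const page = this.disk.get(id);
    if (page === undefined) return undefined;
    const decipher = createDecipheriv("aes-256-gcm", this.key, page.subarray(0, 12));
    decipher.setAuthTag(page.subarray(12, 28));
    return Buffer.concat([decipher.update(page.subarray(28)), decipher.final()]).toString("utf8");
  }
}

const db = new TdeDatabase();
db.insert("user:1", "alice@example.com");

// Someone who steals the disk sees only ciphertext...
const stolenPage = db.disk.get("user:1")!;
console.log(stolenPage.toString("hex"));

// ...but anyone who can query the running database sees everything.
console.log(db.select("user:1")); // alice@example.com
```

The key and the data sit in the same process, so compromising that process (or simply holding valid database credentials) is enough to read the cleartext.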
A more direct approach - "Online encryption"
The limitations of TDE have led many organisations to take a more direct approach to encrypting their data. One technique is to encrypt sensitive data before it is ever sent to the database, using keys that are controlled by the user or client application.
We describe this as "online encryption" (sometimes also called "row level" encryption).
From a security perspective this solves two important problems:
- The database never sees any cleartext (unencrypted) data; and
- The decryption keys can be managed separately, thus reducing the risk of a hacker getting access to the data.
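A minimal sketch of this idea, assuming AES-256-GCM from Node's built-in `crypto` module and a plain `Map` standing in for the remote database. The `EncryptedValue` shape, the function names and the sample value are made up for this example. In a real system the key would come from a key management service rather than being generated in place.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical record shape: everything the database is allowed to see.
interface EncryptedValue {
  iv: Buffer;         // unique nonce per encryption
  ciphertext: Buffer;
  tag: Buffer;        // AES-GCM authentication tag
}

// Encrypt on the client with a key the database never receives.
function encryptValue(key: Buffer, plaintext: string): EncryptedValue {
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypt on the client; throws if the key is wrong or the data was tampered with.
function decryptValue(key: Buffer, value: EncryptedValue): string {
  const decipher = createDecipheriv("aes-256-gcm", key, value.iv);
  decipher.setAuthTag(value.tag);
  return Buffer.concat([decipher.update(value.ciphertext), decipher.final()]).toString("utf8");
}

// Stand-in for a remote database: it only ever stores EncryptedValue.
const database = new Map<string, EncryptedValue>();

// Held by the client application, not by the database server.
const clientKey = randomBytes(32);
database.set("patient:1", encryptValue(clientKey, "SSN 123-45-6789"));

const stored = database.get("patient:1")!;
console.log(stored.ciphertext.toString("hex")); // unreadable without clientKey
console.log(decryptValue(clientKey, stored));   // SSN 123-45-6789
```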
But online encryption comes with one horrendous downside. A drawback so severe that the vast majority of organisations simply don't use it:
Online encryption is not searchable.
What does this mean in practice?
- A healthcare provider stores its patient records in a database using online encryption: admin staff can no longer look up a patient by their Medicare or Social Security Number.
- A university encrypts student transcripts using online encryption: academic staff can no longer query all students with a GPA of over 3.0.
- A buy-now-pay-later app stores all transactions using online encryption: customer service can no longer search transactions.
You get the idea — online encryption is not searchable!
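The reason is easy to see in code. A well-behaved cipher is randomised: encrypting the same value twice gives two different ciphertexts, so the database has nothing to compare against. The sketch below assumes AES-256-GCM with a fresh nonce per value and uses a `Map` as a stand-in for a patients table. The table, the `findBySsn` helper and the sample numbers are all hypothetical.

```typescript
import { createCipheriv, randomBytes } from "node:crypto";

// Randomised encryption: a fresh nonce per call, so equal inputs give different outputs.
function encrypt(key: Buffer, plaintext: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, body, cipher.getAuthTag()]).toString("hex");
}

const key = randomBytes(32);

// Hypothetical "patients" table: row id -> encrypted SSN (sample values only).
const patients = new Map<number, string>([
  [1, encrypt(key, "123-45-6789")],
  [2, encrypt(key, "987-65-4321")],
]);

// Roughly equivalent to: SELECT id FROM patients WHERE ssn = ?
function findBySsn(encryptedQuery: string): number[] {
  return Array.from(patients.entries())
    .filter(([, ssn]) => ssn === encryptedQuery)
    .map(([id]) => id);
}

// Encrypting the search term again produces a different ciphertext,
// so the lookup finds nothing even though the row exists.
const matches = findBySsn(encrypt(key, "123-45-6789"));
console.log(matches.length); // 0
```

Range queries such as "GPA over 3.0" fail for the same reason: the ciphertexts carry no ordering the database could use.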
Security or Utility?
When building applications that need to store sensitive data, dev-teams have historically had to make a tough choice.
While TLS and TDE address some basic security concerns (and they should always be considered a base requirement), when it comes to how data is managed in the database itself, the team must decide if they are optimising for security or utility (for example, the ability to search the data) — not both.
While some applications get around this by downloading and decrypting all the data in a client before searching, this isn't practical for large datasets, and comes with some pretty major security drawbacks itself (for example, each client now has a copy of the entire dataset in cleartext).
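Here is roughly what that workaround looks like, again as a sketch using AES-256-GCM from Node's `crypto` module. The `Row` shape, the helper names and the GPA values are invented for the example, and the `remoteTable` array stands in for a network fetch of the full table.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical encrypted row as stored in the remote table.
interface Row {
  id: number;
  iv: Buffer;
  tag: Buffer;
  gpa: Buffer; // encrypted GPA
}

function encryptRow(key: Buffer, id: number, gpa: number): Row {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(gpa.toFixed(2), "utf8"), cipher.final()]);
  return { id, iv, tag: cipher.getAuthTag(), gpa: body };
}

function decryptGpa(key: Buffer, row: Row): number {
  const decipher = createDecipheriv("aes-256-gcm", key, row.iv);
  decipher.setAuthTag(row.tag);
  return Number(Buffer.concat([decipher.update(row.gpa), decipher.final()]).toString("utf8"));
}

const key = randomBytes(32);

// Stand-in for the remote table; the GPAs are made-up sample values.
const remoteTable: Row[] = [
  encryptRow(key, 1, 3.7),
  encryptRow(key, 2, 2.4),
  encryptRow(key, 3, 3.1),
];

// "Search" means: fetch every row, decrypt every row, then filter in the client.
function studentsAbove(threshold: number): number[] {
  const downloaded = remoteTable; // the whole table crosses the network
  const cleartext = downloaded.map((row) => ({ id: row.id, gpa: decryptGpa(key, row) }));
  return cleartext.filter((student) => student.gpa > threshold).map((student) => student.id);
}

console.log(studentsAbove(3.0)); // [ 1, 3 ]
```

Both the transfer and the decryption grow linearly with the size of the table, and the `cleartext` array is a full unencrypted copy of the dataset sitting in the client's memory.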
Searchable Online Encryption
What if online encryption didn't have this drawback?
What if we could have a Searchable Online Encryption scheme for our database?
This is exactly what we set out to achieve with CipherStash.
CipherStash incorporates the latest research from Stanford University to create what we call a Searchable Encrypted Data Store. CipherStash borrows concepts from traditional databases, as well as common search indexes (like Elasticsearch and Lucene).
But its primary feature is that data is encrypted at all times, end-to-end.
Over the coming weeks and months we'll be writing more about how CipherStash works, and how it is a vital component in your security strategy. In the meantime, if you'd like to stay updated with new articles and announcements, please consider subscribing to our mailing list or following us on Twitter or LinkedIn.