Rewriting SQL on the fly in CipherStash Proxy

CipherStash Proxy helps teams who need advanced data security controls but don't have the capability or capacity to make changes to their apps. With CipherStash Proxy, developers, devops engineers, and SREs can encrypt their most sensitive data in Postgres databases with zero changes to SQL queries in their apps.

We've been on a long and circuitous journey to build CipherStash Proxy over the last 18 months, and we’ve learned a lot along the way.

In this first of two posts we're going to share what we learned on that journey, with a look at how CipherStash Proxy safely and correctly rewrites SQL statements using a technique we’ve termed “statement mapping”. In part 2 we’ll deep dive into how we built a SQL type checker, a core component of our statement mapping implementation.

What problem are we solving?

With CipherStash Proxy, your applications can interact with encrypted data in a Postgres database as if it were plaintext. Proxy rewrites inbound SQL — encrypting values and SQL params as needed, and rewriting WHERE clauses to use our searchable encryption — and decrypts query results before sending them back to your application.

Encrypting data in databases is an old idea, but encrypted values are not queryable. The Big Idea™️ with CipherStash Proxy is that by presenting an encrypted database as if it were plaintext you can have both excellent data security and the same great UX as a regular plaintext database.

Our searchable encryption supports encrypted values in many contexts: ORDER BY, GROUP BY, HAVING, aggregate functions like MIN and MAX, LIKE, ILIKE, text search operators (@>, <@), JSON(B) operators, and exact matches with =.

This functionality is provided by Encrypt Query Language (EQL), which is a collection of Postgres types, functions, and operators for working with encrypted data. Querying over encrypted data does not require access to any secrets. A core part of CipherStash Proxy’s security model is that the database never sees encryption keys, and every encryption and decryption operation is logged for auditing.

Underlying CipherStash Proxy is ZeroKMS, a high-performance key server offering key generation throughput an order of magnitude faster than AWS KMS, and efficient bulk encryption.

To safely rewrite SQL, CipherStash Proxy uses a type-inferencer to determine the types of all expressions before rewriting. It relies on type unification against a live database schema and encryption configuration. The inferencer covers nearly all of the Postgres SQL grammar for CRUD operations (SELECT, INSERT, UPDATE, DELETE, MERGE), with only a few exceptions.

What is statement mapping?

Statement mapping converts a SQL statement written against a virtual schema—where nothing is encrypted—to a SQL statement against the real schema, where some columns are encrypted. It also identifies how parameters (e.g., $1) and literals should be encrypted.

The virtual and real schemas have the same tables and a 1:1 correspondence between columns, but the real schema may contain encrypted data. A successfully mapped SQL statement is considered sound and semantically correct—producing identical results to the inbound SQL on a database without encryption.

Why is statement mapping the solution?

Our primary motivation for Statement Mapping is to provide a seamless developer experience—you shouldn’t have to rewrite any of your application code in order to use CipherStash Proxy.

Working at the SQL level solves the challenges of direct integration for all database clients and ORMs simultaneously. It’s a developer experience win for our customers and a bandwidth win for our team at CipherStash!

A basic example of some SQL before and after statement mapping

Consider a patients table with columns:

id
name (text, encrypted)
phone_number (text, unique, encrypted)

And a prescriptions table with columns:

id
patient_id
medication (text, encrypted)
issued_at (date, encrypted)
repeats_remaining (numeric, encrypted)

Inbound SQL and parameter before statement mapping:

SELECT p.name, p.phone_number, pr.medication, pr.issued_at, pr.repeats_remaining
FROM patients AS p
INNER JOIN prescriptions AS pr ON (p.id = pr.patient_id)
WHERE pr.medication = $1 AND pr.repeats_remaining < 2
ORDER BY pr.repeats_remaining DESC, pr.medication, p.name

-- SQL params:
--   $1 = 'Dextroamphetamine'

After statement mapping:

SELECT p.name, p.phone_number, pr.medication, pr.issued_at, pr.repeats_remaining
FROM patients AS p
INNER JOIN prescriptions AS pr ON (p.id = pr.patient_id)
WHERE pr.medication = $1 AND pr.repeats_remaining < '{ .. encrypted payload .. }'
ORDER BY cs_ore_64_8_v1(pr.repeats_remaining DESC),
         cs_ore_64_8_v1(pr.medication),
         cs_ore_64_8_v1(p.name)

-- SQL params:
--   $1 = '{ .. encrypted payload .. }'

In addition to encrypting a literal and a SQL parameter, some function calls to our EQL Postgres have been inserted. These are required to make searchable encryption do its magic!

What we learned along the way

The statement mapping in CipherStash Proxy is Version 2. Our first attempt never made it to production due to some fundamental flaws:

It lacked a formal theory of operation.
It only knew about encrypted columns, not about the entire database schema.
It didn't completely traverse the SQL Abstract Syntax Tree (AST).
The relationship between virtual and real schemas was overly complex, with multiple columns used to store encrypted data.

The approach relied on heuristics, leading to subtle but critical bugs that had the potential to cause data loss 💥, and potentially expose data that should have been encrypted 😳.

Once we realised this, we abandoned our previous approach, and established some key requirements for our statement mapping solution, as described below.

Mapped SQL statements must be sound

Soundness ensures the transformed SQL statement is well-behaved, meaning:

Data is never read from or written to the wrong table-column.
Encrypted data is correctly encrypted/indexed.
Literals in expressions with encrypted columns are encrypted as needed.
All parameters are validated and encrypted.
Unsupported SQL features are rejected.

The following SQL features are not yet supported by our statement mapper, and are on our roadmap:

fully qualified identifiers (i.e. with schema prefix)
expression syntax (IN UNNEST, LIKE any(..), ILIKE any(..), tuples)
PREPARE statement (surprisingly not necessary with Postgres’s extended protocol used by most ORMs)
MERGE statement
In FROM clause:
- table-valued functions
- UNNEST, PIVOT, UNPIVOT
In GROUP BY clause: GROUPING SETS

Mapped SQL statements must be semantically correct

A mapped query on a database with encrypted columns must produce identical results as on an the pre-mapped version of the query on an unencrypted version of the same database.

Minor behavioural differences for queries, like false positives from our text search implementation based on encrypted Bloom Filters, are acceptable if well-documented and tuneable.

Warning: math notation incoming! 🫣

Formally, given the following definitions:

$S$ - a snapshot of a database prior to encrypting
$S_{enc}$ the same snapshot after encrypting
$Q$ - a SQL statement that behaves correctly when executed against $S$
$Q_{mapped}$ - the result of applying statement mapping to $Q$
$R$ - the result of executing $Q$ on $S$
$R_{mapped}$ - the result of executing $Q_{mapped}$ on $S_{enc}$

Then in general we can say: $decrypt(R_{mapped}) = R$

Statement mapping must be schema-aware

Soundness and correctness depend on accurate, up-to-date awareness of the underlying database schema , even within transactions.

Statement mapping must be performant

Statement Mapping must introduce minimal overhead. Users shouldn’t notice any slowdown.

Broad coverage of the SQL CRUD grammar is necessary

Handling CRUD operations (SELECT, INSERT, UPDATE, DELETE, MERGE) requires near-total SQL grammar coverage. Statements containing unsupported grammar must be explicitly rejected to ensure soundness and semantic correctness guarantees can be upheld.

We aim to support as many SQL features as possible — and to keep expanding that support — so you can be confident that CipherStash Proxy can handle your specific workload.

What next?

CipherStash Proxy’s statement mapping enables seamless, secure database access. The formal approach ensures soundness and semantic correctness, paving the way for scalable, transparent, and searchable encryption.

Stay tuned for part 2, where we’ll delve deeper into type inference and how we handle complex SQL statements.

In the meantime, try out CipherStash Proxy yourself or have a chat with us — we love to nerd out about protecting data with encryption.

Rewriting SQL on the fly in CipherStash Proxy – part 1