# How we used the PostgreSQL wire protocol to bring you searchable encryption

*Published on 2025-04-23T00:00:00.000Z*

*By Toby Hede — Principal Engineer*

When using CipherStash Proxy, applications can interact with encrypted data in a Postgres database as if it were plaintext. SQL statements are intercepted, parsed, type-checked, and mapped to the configuration specifying the types of encryption to use. In order to make all of this smart magic work to abstract the complexity of searchable encryption, CipherStash Proxy has to handle the nuances of the PostgreSQL wire protocol.

## Content

CipherStash Proxy provides **searchable** encryption for PostgreSQL without the need to adapt application SQL. CipherStash Proxy automatically encrypts and decrypts data, and supports most query types over encrypted values.

When using CipherStash Proxy, applications can interact with encrypted data in a Postgres database ***as if it were plaintext***. SQL statements are intercepted, parsed, type-checked, and mapped to the configuration specifying the types of encryption to use. You can read about rewriting SQL in part one of our series [Rewriting SQL on the fly in CipherStash Proxy](https://cipherstash.com/blog/rewriting-sql-1).

In order to make all of this smart magic work to abstract the complexity of searchable encryption, CipherStash Proxy has to handle the nuances of the PostgreSQL wire protocol. 

### What is the PostgreSQL wire protocol?

The [PostgreSQL protocol](https://www.postgresql.org/docs/current/protocol.html) describes the communication process between client (applications accessing the database) and server (the database).

As in all things computer, nothing is as simple as it seems; the protocol is actually two protocols:

- Simple
- Extended

Regardless of flavor, a stream of messages is sent over the network. The first byte of a message identifies the message type, and the next four bytes specify the length of the rest of the message. The body of a message depends on the message type (you can find details in the PostgreSQL [Message formats](https://www.postgresql.org/docs/current/protocol-message-formats.html) reference).

The simple query protocol is, as the name suggests, simple. 
SQL statements are sent as a string. This is, in fact, quite simple. The PostgreSQL team can’t be impeached for false advertising here.

```sql
-- first byte is `Q` denoting a Simple Protocol query
Q\0\0\03SELECT encrypted_email FROM users WHERE id = 1\0
```

The SQL string will contain any dynamic variables or data as literal values. Applications will need to build this SQL string, interpolating and safely quoting and escaping any data, and suddenly the simple protocol is not that simple at all. The whole thing is a bad idea in practice and a potential source of SQL injection attacks. 

Injection attacks are the [number 3 vulnerability](https://owasp.org/Top10/A03_2021-Injection/) in the OWASP Top 10 because dynamically building a sql string is not as simple as in practice as it sounds.

Unsurprisingly, the extended protocol is **not** simple, and extends the query process into multiple steps, with the core benefit of separating the SQL statement from parameter data.

In the extended protocol the statement execution process is broken into a series of steps, and multiple messages are sent by the client: 

- `Parse` - sends a (parameterized) SQL statement and an optional list of parameter data types
- `Describe` - requests a description of the parameters and rows returned
- `Bind` - sends values to be mapped to the statement parameters
- `Execute` - executes the statement with the bound parameters values

In general, any application using parameterized statements is using the extended protocol under the covers. 

### What about prepared statements?

This is where things become nuanced.

All parameterized statements use the extended protocol, but not all parameterised statements are `prepared`.

A prepared statement adds a client-specified `name` to the `Parse` step. A `named` statement lives for the length of the database session (as long as the client keeps the connection).

Once a named statement exists, the client can refer to the statement by `name` and skip the `Parse` step. 

Fun fact: all statements using the extended protocol actually have a name, but if the client does not supply a name to make it an “officially” prepared statement the name is empty `“”` and the statement is reset on every `Parse`. 

The performance benefit of prepared statements is difficult to quantify. The SQL string is only transmitted once, so there is less network overhead. Query planning typically occurs on `Bind` as  parameter values may impact the query plan, so internal caching of statements in the database is not guaranteed. 

### Where was I?

Anyway, all of this is very complex and I hate it. 

Let’s have a look at how transparently securing data interacts with the protocol.

## How CipherStash Proxy uses the PostgreSQL protocol

Encrypting a value is generally relatively simple. Input data is encrypted using a service like CipherStash ZeroKMS or AWS KMS, and stored in the database. 

Searchable encryption is slightly more complicated. Searchable encryption works by encrypting the same input value in several different ways, based on the types of search required, not just the value itself. 

For example, an `email` might be configured to enable a `unique` index on the encrypted data, as well as partial `match` search operations (eg finding all emails by an encrypted `name` input ). 

An encrypted column is defined in the database as PostgreSQL type called `encrypted` (part of the CipherStash [Encrypt Query Language](https://github.com/cipherstash/encrypt-query-language) and bundled with proxy).  The type enables proxy to discover `encrypted` columns by interrogating the database schema. The type is a wrapper around PostgreSQL’s native `jsonb` type, allowing us to store many encrypted terms in a single column.

Additional configuration defines the source data type and the types of searchable index terms to generate on write. The source data type is required so that a decrypted value can be returned as the original type.  

This all sounds complicated, but is simpler in practice. 

Assume that we want to encrypt an `email` in the `users` table. 

We define:

- an `encrypted` column named `email_encrypted`
- with a cast type of `text`
- and `match` index term.

Given a SQL statement: 

```sql
INSERT INTO users (email_encrypted) VALUES ($1);
```

The client application passes the `$1` parameter as a regular `string` 

CipherStash Proxy uses the database schema to identify `email_encrypted` as an encrypted column and generates both the encrypted value and the `match` index term.

To the application, the column and parameter are a string, but internally, the data is encrypted. 

Totally simple.

The steps for transparently encrypting and decrypting data map neatly onto the extended protocol:

1. Parse and type check sql to identify encrypted columns
2. Encrypt all input data destined for an encrypted column
3. Execute the SQL
4. Decrypt all returned data sourced from an encrypted column

The same steps apply to queries using the simple protocol, but it is much less interesting. 

So what happens behind the scenes?

### Parse

As we saw, the `parse` messages contain the SQL statement. 

Proxy parses (surprise) the SQL statement and type-checks the statement against the database schema. Parameters and literals that require encryption are identified and mapped to their encrypt configuration. Encrypted columns in the “projection” (the returned list of columns in a `SELECT` statement) are also identified. Returned data needs to be decrypted.

If the statement contains literals that need encryption (because the data's sensitive), the encryption happens in this step. (But also, it's probably best not to use literals.) 

A `parse` message can contain an optional list of data types, which is where things get a bit tricky. What does this actually mean?

Assume we have a users table with an `int4` column  appropriate named `int4_column` 

Given a parameterized statement:

```sql
INSERT INTO users (int4_column) VALUES ($1);
```

The expected type of the `$1` parameter will be an `int4`.

However, the protocol enables a client to specify a different type for this parameter in the `Parse` message. The specified type needs to follow PostgreSQL conversion rules, and has to be something that PostgreSQL can convert to the target column type.  So, for example, the client might specify that it will actually send an `int2` instead of an `int4` . The conversion actually happens in `Bind` and if the column is encrypted, CipherStash Proxy needs to handles this conversion. 

### Describe

A `describe` message requests a description of the parameters and rows returned (if any) of a SQL statement. 

Describe is where the magic really begins. 
And by magic I mean strategic misrepresentation. 

Going back to our earlier `users` table with an `encrypted` column named `email_encrypted` configured as an encrypted `text` .

Given the SQL statement:

```sql
INSERT INTO users (email_encrypted) VALUES ($1);
```

The actual type of this column is our `encrypted` type containing `jsonb`.

On `Describe` CipherStash Proxy reports the parameter type as the cast `text` type. To the client the column is `text` and the email value can be sent as a `text` parameter.

A similar process happens with the types of columns in any rows returned. Any encrypted columns are described as the cast type and not the actual column type. 

### Bind

A `Bind` message passes any required statement parameters in preparation for execution. 

Bind is where encryption happens. 

If a parameter references an encrypted column, the data is encrypted, transformed into the`encrypted` type representation, and passed through to the database to be stored as `jsonb`.

Of course, reality is slightly more complicated. We’ve already seen that the protocol enables a different type to be set in `Parse`  (for example sending an`int2` instead of an `int4`) and we need to handle this conversion. For various reasons, the extended protocol also has two different encoding formats (`text` and `binary`). Parameters need to be decoded using the appropriate approach before being encrypted.

### Execute

Once we have a parsed statement and bound any parameters, the statement can be executed. 

The database takes the sql and parameters and does the work. From the PostgreSQL perspective this is a totally vanilla sql statement. An `encrypted` column expects `jsonb`, CipherStash Proxy ensures that the encrypted data is  `jsonb` , and things just work (tm).

```sql
INSERT INTO users (email_encrypted) VALUES ('{...}');
```

More work is required for statements that return data. 

The response to a `SELECT` statement is a `RowDescription` message followed by zero or more `DataRow` messages, each representing a single returned record. Proxy decrypts the data encodes them into the expected type. 

Going back to our earlier example schema:

```sql
SELECT email_encrypted FROM users;
```

The `email_encrypted` data is decrypted back into the original source `text` and passed through. 

From the client perspective, the fact that `email_encrypted` is `jsonb` containing one or more index terms is completely hidden. The column is `text` at read time, but encrypted and secure inside the database.

## What next?

As you can see, CipherStash Proxy uses statement type checking, statement mapping, and the PostgreSQL protocol to provide **searchable** encryption for PostgreSQL without you needing to adapt any SQL in your application.  

Try out [CipherStash Proxy](https://github.com/cipherstash/proxy?tab=readme-ov-file#getting-started) yourself, or [get in touch](https://cipherstash.com/contact) to find out more. We love to nerd out about protecting data with encryption.

## Related blog posts

- [Rewriting SQL on the fly in CipherStash Proxy – part 1](https://cipherstash.com/blog/rewriting-sql-1.md) — CipherStash Proxy helps teams who need advanced data security controls but don't have the capability or capacity to make changes to their apps. With CipherStash Proxy, developers, devops engineers, and SREs can encrypt their most sensitive data in Postgres databases with zero changes to SQL queries in their apps.
- [And now… CipherStash Proxy!](https://cipherstash.com/blog/introducing-proxy.md) — CipherStash Proxy keeps your sensitive data in PostgreSQL encrypted and searchable, without changing your SQL queries. This means that you can protect your most sensitive data with strong security controls, without slowing down your dev team.

