Dataset guide

A dataset is a collection of data that you want to encrypt. It defines the structure of the data and how it should be encrypted. You can create as many datasets as you need to encrypt all of your sensitive data.

Important: If you are using EQL and CipherStash Proxy, the dataset configuration is stored in the database, not in CipherStash Cloud.

Creating a dataset

First, you need to understand the structure of your data. For example, a users table in your database might have the following columns:

  • id
  • name
  • email

We suggest starting with a single table and encrypting a few columns to get a feel for how CipherStash Proxy works. Once you're comfortable with CipherStash Proxy, you can encrypt more columns and tables.

Initializing a dataset

To initialize a dataset, run the following command in the CipherStash CLI. If you don't have the CLI installed, follow the installation guide.

stash datasets create users --description "UAT: users"

Defining the dataset

Next, decide which columns you want to encrypt. In this example, we'll encrypt the name and email columns. We'll leave the id column unencrypted as it's not considered sensitive data. To understand what data you should choose to encrypt, read "What data should I encrypt?".

Now that you know which columns to encrypt, define the dataset. Do this by creating a YAML file that describes the structure of the data. In this example, we'll create a file called dataset.yml with the following contents:

tables:
  - path: users
    fields:
      - name: name
        in_place: false
        cast_type: utf8-str
        mode: plaintext-duplicate
        indexes:
          - version: 1
            kind: match
            tokenizer:
              kind: ngram
              token_length: 3
            token_filters:
              - kind: downcase
            k: 6
            m: 2048
            include_original: true
          - version: 1
            kind: ore
          - version: 1
            kind: unique
      - name: email
        in_place: false
        cast_type: utf8-str
        mode: plaintext-duplicate
        indexes:
          - version: 1
            kind: match
            tokenizer:
              kind: ngram
              token_length: 3
            token_filters:
              - kind: downcase
            k: 6
            m: 2048
            include_original: true
          - version: 1
            kind: ore
          - version: 1
            kind: unique

Settings to use

If you're just getting started with CipherStash Proxy, focus only on the tables and columns you want to encrypt. You can ignore the advanced settings until you are comfortable with CipherStash Proxy, as the defaults will work for all use cases. These are the settings to define:

| Option                  | Description                       | Example             |
|-------------------------|-----------------------------------|---------------------|
| tables                  | List of tables to encrypt         |                     |
| tables.path             | Name of the table to encrypt      | users               |
| tables.fields           | List of fields to encrypt         |                     |
| tables.fields.name      | Name of the field to encrypt      | name                |
| tables.fields.cast_type | Type of data stored in the column | utf8-str            |
| tables.fields.mode      | Encryption mode                   | plaintext-duplicate |

The remaining settings shown in the example configuration, in particular the indexes definition, can be reused as-is for all tables and fields. To learn more about these settings, refer to dataset configurations.
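Because the same index definition can be reused for every field, the configuration file can be generated rather than copied by hand. The following is a minimal sketch in Python, not part of the CipherStash tooling: the helper names render_field and render_dataset are hypothetical, and the output simply mirrors the layout of the example dataset.yml in this guide.

```python
# Sketch: render a dataset configuration for one table, reusing the same
# index settings for every encrypted column. The helper names here are
# hypothetical and are not part of the CipherStash CLI or any library.

# Index definition copied verbatim from the example configuration in this guide.
INDEXES_YAML = """\
        indexes:
          - version: 1
            kind: match
            tokenizer:
              kind: ngram
              token_length: 3
            token_filters:
              - kind: downcase
            k: 6
            m: 2048
            include_original: true
          - version: 1
            kind: ore
          - version: 1
            kind: unique
"""


def render_field(name, cast_type="utf8-str", mode="plaintext-duplicate"):
    """Return the YAML entry for a single encrypted column."""
    return (
        f"      - name: {name}\n"
        f"        in_place: false\n"
        f"        cast_type: {cast_type}\n"
        f"        mode: {mode}\n"
        + INDEXES_YAML
    )


def render_dataset(table, columns):
    """Return the full dataset configuration text for one table."""
    header = f"tables:\n  - path: {table}\n    fields:\n"
    return header + "".join(render_field(column) for column in columns)


if __name__ == "__main__":
    # Print the configuration for the users table used in this guide.
    print(render_dataset("users", ["name", "email"]), end="")
```

Redirecting the script's output to dataset.yml should produce a file equivalent to the hand-written example. Plain string templates are used here so the sketch has no dependency on a YAML library.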

Uploading a dataset

Use the CipherStash CLI to upload a dataset configuration to your account. Note that you need to create a client key before you can upload a dataset configuration.

stash datasets config upload --file dataset.yml --client-id $CS_CLIENT_ID --client-key $CS_CLIENT_KEY
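The upload command depends on the CS_CLIENT_ID and CS_CLIENT_KEY environment variables, so it can help to check that both are set before calling the CLI from a script. The sketch below assumes Python; the function build_upload_command is a hypothetical helper, and it only assembles the command shown above using the flags that appear in this guide.

```python
import os
import shlex


def build_upload_command(config_path="dataset.yml", env=None):
    """Build the argument list for the upload command shown in this guide.

    Hypothetical helper: raises RuntimeError if a required variable is unset.
    """
    env = os.environ if env is None else env
    required = ("CS_CLIENT_ID", "CS_CLIENT_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return [
        "stash", "datasets", "config", "upload",
        "--file", config_path,
        "--client-id", env["CS_CLIENT_ID"],
        "--client-key", env["CS_CLIENT_KEY"],
    ]


if __name__ == "__main__":
    # Placeholder credentials for illustration only; use real values from
    # your environment when running the upload.
    demo_env = {"CS_CLIENT_ID": "example-id", "CS_CLIENT_KEY": "example-key"}
    print(shlex.join(build_upload_command(env=demo_env)))
```

To run the upload for real, the returned list could be passed to subprocess.run(command, check=True) with the real environment, which avoids shell quoting issues and fails loudly if the CLI exits with an error.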