How do I create a dataset for my data?

A dataset is a collection of data that you want to encrypt. It defines the structure of the data and how it should be encrypted. You can create as many datasets as you need to encrypt all of your sensitive data.

Creating a dataset

First, you need to understand the structure of your data. For example, if you have a users table in your database, it might have the following columns:

  • id
  • name
  • email

We suggest starting with a single table and a few columns, so that you can get a feel for how Tandem works and confirm that it does not break your application. Once you are comfortable with Tandem, you can encrypt more columns and tables.

Initializing a dataset

To initialize a dataset, run the following command in the CipherStash CLI. If you don't have the CLI installed, please follow the installation guide.

```shell
stash datasets create users --description "UAT: users"
```
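
If you are scripting these steps, a small guard can stop early when the CLI is missing, rather than failing partway through. The following POSIX shell sketch uses a hypothetical `require_cmd` helper that is not part of the CipherStash CLI. It only checks that a command with the given name (here, `stash`) is on your `PATH`.

```shell
# Hypothetical helper; not part of the CipherStash CLI.
# Prints an error and returns non-zero if the named command is not on PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' not found on PATH; install it before continuing" >&2
    return 1
  fi
}
```

For example, `require_cmd stash && stash datasets create users --description "UAT: users"` runs the create command only when `stash` is found.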

Defining the dataset

Next, decide which columns you want to encrypt. In this example, we will encrypt the name and email columns. We will leave the id column unencrypted because it is not sensitive data. For help deciding which data to encrypt, read the concept guide What data should I encrypt?

Now that we know which columns we want to encrypt, we need to define the dataset. We do this by creating a YAML file that describes the structure of the data. In this example, we will create a file called dataset.yml with the following contents:

```yaml
tables:
  - path: users
    fields:
      - name: name
        in_place: false
        cast_type: utf8-str
        mode: plaintext-duplicate
        indexes:
          - version: 1
            kind: match
            tokenizer:
              kind: ngram
              token_length: 3
            token_filters:
              - kind: downcase
            k: 6
            m: 2048
            include_original: true
          - version: 1
            kind: ore
          - version: 1
            kind: unique
      - name: email
        in_place: false
        cast_type: utf8-str
        mode: plaintext-duplicate
        indexes:
          - version: 1
            kind: match
            tokenizer:
              kind: ngram
              token_length: 3
            token_filters:
              - kind: downcase
            k: 6
            m: 2048
            include_original: true
          - version: 1
            kind: ore
          - version: 1
            kind: unique
```
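
Both fields use identical settings, so you could generate the file instead of copying the block for each column. The following POSIX shell sketch uses a hypothetical `gen_dataset_yml` helper. It hard-codes the field settings and index definitions from the example above, so it only suits text columns that should all be encrypted the same way.

```shell
# Hypothetical helper; not part of the CipherStash CLI.
# Prints a dataset config for one table, applying the field settings and
# index definitions from the example above to every column given.
# Usage: gen_dataset_yml <table> <column>...
gen_dataset_yml() {
  local table="$1" col
  shift
  printf 'tables:\n'
  printf '  - path: %s\n' "$table"
  printf '    fields:\n'
  for col in "$@"; do
    # Unquoted EOF so that ${col} is expanded; nothing else here contains $.
    cat <<EOF
      - name: ${col}
        in_place: false
        cast_type: utf8-str
        mode: plaintext-duplicate
        indexes:
          - version: 1
            kind: match
            tokenizer:
              kind: ngram
              token_length: 3
            token_filters:
              - kind: downcase
            k: 6
            m: 2048
            include_original: true
          - version: 1
            kind: ore
          - version: 1
            kind: unique
EOF
  done
}
```

Running `gen_dataset_yml users name email > dataset.yml` should write the same configuration as the file above.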

Most pertinent settings

If you are new to Tandem, focus only on which tables and columns you want to encrypt. The other settings are advanced. Until you are comfortable with Tandem, you can keep the values from the example above, which are suitable defaults. These are the settings that you need to define:

| Option | Description | Example |
|---|---|---|
| tables | The list of tables to encrypt | |
| tables.path | The name of the table to encrypt | users |
| tables.fields | The list of fields to encrypt | |
| tables.fields.name | The name of the field to encrypt | name |
| tables.fields.cast_type | The type of data stored in the column | utf8-str |
| tables.fields.mode | The encryption mode | plaintext-duplicate |

You can reuse the values shown in the example above for the remaining settings, in particular the indexes definitions, for every table and field. To learn more about the other settings, read our reference on dataset configurations.
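
Before uploading, a quick check can catch a file that is missing one of the settings listed in the table. The following sketch uses a hypothetical `check_dataset_keys` helper and is deliberately crude. It greps for each key name rather than parsing YAML, so it cannot confirm that a key is nested under the right parent. Treat it as a sanity check, not a schema validator.

```shell
# Hypothetical sanity check; not part of the CipherStash CLI.
# Greps for each required key from the settings table. It does not parse YAML,
# so it cannot tell whether a key is nested under the correct parent.
check_dataset_keys() {
  local file="${1:-dataset.yml}" missing=0 key
  for key in tables path fields name cast_type mode; do
    # Match the key at the start of a line, after optional indentation or "- ".
    if ! grep -Eq "^[[:space:]-]*${key}:" "$file"; then
      echo "missing setting: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_dataset_keys dataset.yml && echo ok` prints `ok` only if every key from the table appears somewhere in the file.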

Uploading a dataset

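Use the CipherStash CLI to upload a dataset configuration to your account. Note that you must create a client key before you can upload a dataset configuration. The command below passes the client ID and key from the CS_CLIENT_ID and CS_CLIENT_KEY environment variables.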

```shell
stash datasets config upload --file dataset.yml --client-id "$CS_CLIENT_ID" --client-key "$CS_CLIENT_KEY"
```
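
If you run the upload from a script, you may want it to fail with a clear message when the client credentials or the file are missing. This is a minimal sketch. It assumes that, as in the command above, the client ID and key are exported as `CS_CLIENT_ID` and `CS_CLIENT_KEY`. The `check_upload_prereqs` and `upload_dataset_config` helper names are hypothetical.

```shell
# Hypothetical helpers; not part of the CipherStash CLI.
# Assumes the client ID and key are exported as CS_CLIENT_ID and CS_CLIENT_KEY.

# Returns non-zero (with a message on stderr) if a prerequisite is missing.
check_upload_prereqs() {
  local file="${1:-dataset.yml}"
  if [ -z "${CS_CLIENT_ID:-}" ] || [ -z "${CS_CLIENT_KEY:-}" ]; then
    echo "error: set CS_CLIENT_ID and CS_CLIENT_KEY (create a client key first)" >&2
    return 1
  fi
  if [ ! -f "$file" ]; then
    echo "error: dataset file '$file' not found" >&2
    return 1
  fi
}

# Runs the upload command shown above only when the checks pass.
upload_dataset_config() {
  local file="${1:-dataset.yml}"
  check_upload_prereqs "$file" || return 1
  stash datasets config upload --file "$file" \
    --client-id "$CS_CLIENT_ID" --client-key "$CS_CLIENT_KEY"
}
```

With the variables exported, `upload_dataset_config dataset.yml` runs the same `stash datasets config upload` command shown above.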

Asking for help

We strongly recommend setting up a call with one of our Solutions Engineers to help you get started. We're happy to help!
