Dataset guide
A dataset is a collection of data that you want to encrypt. It defines the structure of the data and how it should be encrypted. You can create as many datasets as you need to encrypt all of your sensitive data.
Important: If you are using EQL and CipherStash Proxy, the dataset configuration is stored in the database and is not stored in the CipherStash Cloud.
Creating a dataset
First you need to understand the structure of your data. For example, if you have a users table in your database, you might have the following columns:
id
name
email
We suggest starting with a single table and encrypting a few columns to get a feel for how CipherStash Proxy works. Once you're comfortable with CipherStash Proxy, you can encrypt more columns and tables.
Initializing a dataset
To initialize a dataset, run the following command in the CipherStash CLI. If you don't have the CLI installed, follow the installation guide.
stash datasets create users --description "UAT: users"
Defining the dataset
Next, decide which columns you want to encrypt. In this example, we'll encrypt the name and email columns. We'll leave the id column unencrypted as it's not considered sensitive data. To understand what data you should choose to encrypt, read what data should I encrypt?.
Now that you know which columns to encrypt, define the dataset. Do this by creating a YAML file that describes the structure of the data. In this example, we'll create a file called dataset.yml with the following contents:
tables:
  - path: users
    fields:
      - name: name
        in_place: false
        cast_type: utf8-str
        mode: plaintext-duplicate
        indexes:
          - version: 1
            kind: match
            tokenizer:
              kind: ngram
              token_length: 3
            token_filters:
              - kind: downcase
            k: 6
            m: 2048
            include_original: true
          - version: 1
            kind: ore
          - version: 1
            kind: unique
      - name: email
        in_place: false
        cast_type: utf8-str
        mode: plaintext-duplicate
        indexes:
          - version: 1
            kind: match
            tokenizer:
              kind: ngram
              token_length: 3
            token_filters:
              - kind: downcase
            k: 6
            m: 2048
            include_original: true
          - version: 1
            kind: ore
          - version: 1
            kind: unique
Settings to use
If you're just getting started with CipherStash Proxy, focus only on the tables and columns you want to encrypt. You can ignore the more advanced settings until you are comfortable with CipherStash Proxy, as the defaults will work for all use cases. These are the settings to define:
Option | Description | Example |
---|---|---|
tables | List of tables to encrypt | |
tables.path | Name of the table to encrypt | users |
tables.fields | List of fields to encrypt | |
tables.fields.name | Name of the field to encrypt | name |
tables.fields.cast_type | Type of data stored in the column | utf8-str |
tables.fields.mode | Encryption mode | plaintext-duplicate |
The remaining settings shown in the dataset.yml example, in particular the indexes definition, can be reused unchanged for all tables and fields. To learn more about these settings, refer to dataset configurations.
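Because the index settings can be reused for every field, the definition file can be generated rather than written by hand. The following is a minimal sketch under that assumption. The emit_field and emit_dataset functions are hypothetical helpers written for this guide, not part of the CipherStash CLI, and they copy the settings from the dataset.yml example verbatim.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CipherStash CLI): prints the YAML
# entry for one field, reusing the settings from the dataset.yml example.
emit_field() {
  column="$1"
  cat <<EOF
      - name: ${column}
        in_place: false
        cast_type: utf8-str
        mode: plaintext-duplicate
        indexes:
          - version: 1
            kind: match
            tokenizer:
              kind: ngram
              token_length: 3
            token_filters:
              - kind: downcase
            k: 6
            m: 2048
            include_original: true
          - version: 1
            kind: ore
          - version: 1
            kind: unique
EOF
}

# Hypothetical helper: prints a complete dataset definition for one table.
# Usage: emit_dataset TABLE COLUMN [COLUMN...]
emit_dataset() {
  table="$1"
  shift
  printf 'tables:\n  - path: %s\n    fields:\n' "$table"
  for column in "$@"; do
    emit_field "$column"
  done
}
```

With these helpers loaded, running emit_dataset users name email > dataset.yml should write a file with the same settings as the example in this guide. Review the generated file before uploading it.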
Uploading a dataset
Use the CipherStash CLI to upload a dataset configuration to your account. Note that you need to create a client key before you can upload a dataset configuration.
stash datasets config upload --file dataset.yml --client-id $CS_CLIENT_ID --client-key $CS_CLIENT_KEY
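The upload command relies on the CS_CLIENT_ID and CS_CLIENT_KEY environment variables, and an unset variable is easy to miss. The sketch below wraps the same command in a check that stops early when the file or a credential is missing. The upload_dataset function is a hypothetical wrapper written for this guide, not a CLI feature. It uses only the flags shown in the command above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the CipherStash CLI): refuses to run the
# upload unless the file exists and both credentials are set.
upload_dataset() {
  file="${1:-dataset.yml}"
  if [ ! -f "$file" ]; then
    echo "error: $file not found" >&2
    return 1
  fi
  if [ -z "${CS_CLIENT_ID:-}" ] || [ -z "${CS_CLIENT_KEY:-}" ]; then
    echo "error: set CS_CLIENT_ID and CS_CLIENT_KEY before uploading" >&2
    return 1
  fi
  # Same command as in the guide, with the variables quoted.
  stash datasets config upload --file "$file" \
    --client-id "$CS_CLIENT_ID" --client-key "$CS_CLIENT_KEY"
}
```

With the function loaded, upload_dataset dataset.yml runs the upload only after both checks pass. Quoting the variables prevents the shell from splitting a value that contains whitespace.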