Dataset configuration

A dataset is a collection of tables and fields that you want to encrypt. Its configuration specifies:

  • The types of index set on each column in the table
  • The encryption mode for each column
  • The data type of each column
  • Settings for match indexes, e.g. tokenization settings

We suggest creating a separate dataset for each environment in which you handle sensitive data. This lets you update and test a dataset's configuration without affecting other environments. When creating a dataset, give it a clear, unique description that identifies what it is used for.

Dataset management

Creating a dataset

To create a dataset, run the following command with the CipherStash CLI. If you don't have the CLI installed, follow the getting started guide first.

stash datasets create my_dataset_name --description "Test application"

Uploading a dataset

Use the CipherStash CLI to upload a dataset configuration to your account. You must create a client key before you can upload a dataset configuration.

stash datasets config upload --file dataset.yml --client-id $CS_CLIENT_ID --client-key $CS_CLIENT_KEY

Configuration reference

| Option | Description | Example setting |
|---|---|---|
| tables | List of tables to encrypt | users |
| tables.path | Name of the table to encrypt | users |
| tables.fields | List of fields to encrypt | name, email |
| tables.fields.name | Name of the field to encrypt | name, email |
| tables.fields.in_place | Whether encrypted data is stored in the same column as the plaintext | false |
| tables.fields.cast_type | Type of data stored in the column | utf8-str |
| tables.fields.mode | Encryption mode | plaintext |
| tables.fields.indexes | List of indexes to create for the field | |
| tables.fields.indexes.version | Version of the index | |
| tables.fields.indexes.kind | Type of index | match |
| tables.fields.indexes.tokenizer | Tokenizer used to tokenize the data | ngram |
| tables.fields.indexes.tokenizer.kind | Type of tokenizer | ngram |
| tables.fields.indexes.tokenizer.token_length | Length of the tokens generated by the tokenizer | 3 |
| tables.fields.indexes.token_filters | List of filters applied to the tokens | downcase |
| tables.fields.indexes.token_filters.kind | Type of filter | |
| tables.fields.indexes.k | Number of tokens generated for each value | 6 |
| tables.fields.indexes.m | Number of buckets used to store tokens | 2048 |
| tables.fields.indexes.include_original | Whether the original value is stored in the index | true |
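The following sketch assembles the example settings from the reference table into a single dataset.yml with one field and one match index. It is illustrative only: the nesting is inferred from the dotted option names, the choice of the email field is arbitrary, and version: 1 is an assumption because the reference gives no example value for it. Check it against the schema your CLI version expects before uploading.

```yaml
# Hypothetical dataset.yml assembled from the configuration reference.
# Nesting is inferred from dotted option names such as tables.fields.indexes.kind.
tables:
  - path: users                  # table to encrypt
    fields:
      - name: email              # field (column) to encrypt
        in_place: false          # encrypted data not stored in the plaintext column
        cast_type: utf8-str      # type of data stored in the column
        mode: plaintext          # encryption mode (example value from the reference)
        indexes:
          - version: 1           # assumption: no example value is given in the reference
            kind: match          # match index
            tokenizer:
              kind: ngram        # ngram tokenizer
              token_length: 3    # length of each generated token
            token_filters:
              - kind: downcase   # filter applied to the tokens
            k: 6
            m: 2048
            include_original: true
```

A file like this is what the stash datasets config upload command shown earlier takes through its --file option.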