Dataset configurations

A dataset is a collection of tables and fields that you want to encrypt. The configuration includes:

  • The types of indexes set for each field in a table.
  • The mode for each index.
  • The data type of each field.
  • Settings for match indexes, e.g. tokenization settings.

We suggest creating a separate dataset for each environment in which you handle sensitive data. This allows a dataset configuration to be updated and tested without affecting other environments. When creating a dataset, specify a clear, unique description that identifies what the dataset is used for.

Dataset management

Creating a dataset

To create a dataset, run the following command in the CipherStash CLI. If you don't have the CLI installed, follow the getting started guide first.

```shell
stash datasets create my_dataset_name --description "Test application"
```

Uploading a dataset

Use the CipherStash CLI to upload a dataset configuration to your account. Note that you must have created a client key before you can upload a dataset configuration.

```shell
stash datasets config upload --file dataset.yml --client-id $CS_CLIENT_ID --client-key $CS_CLIENT_KEY
```

Configuration reference

| Option | Description | Example setting |
| --- | --- | --- |
| `tables` | List of tables to encrypt | `users` |
| `tables.path` | Name of the table to encrypt | `users` |
| `tables.fields` | List of fields to encrypt | `name`, `email` |
| `tables.fields.name` | Name of the field to encrypt | `name`, `email` |
| `tables.fields.in_place` | Whether encrypted data is stored in the same column as the plaintext | `false` |
| `tables.fields.cast_type` | Type of data stored in the column | `utf8-str` |
| `tables.fields.mode` | Encryption mode | `plaintext` |
| `tables.fields.indexes` | List of indexes to create for the field | |
| `tables.fields.indexes.version` | Version of the index | |
| `tables.fields.indexes.kind` | Type of index | `match` |
| `tables.fields.indexes.tokenizer` | Tokenizer used to tokenize the data | `ngram` |
| `tables.fields.indexes.tokenizer.kind` | Type of tokenizer | `ngram` |
| `tables.fields.indexes.tokenizer.token_length` | Length of the tokens generated by the tokenizer | `3` |
| `tables.fields.indexes.token_filters` | List of filters applied to the tokens | `downcase` |
| `tables.fields.indexes.token_filters.kind` | Type of filter | |
| `tables.fields.indexes.k` | Number of tokens generated for each value | `6` |
| `tables.fields.indexes.m` | Number of buckets used to store tokens | `2048` |
| `tables.fields.indexes.include_original` | Whether the original value is stored in the index | `true` |
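Putting the options above together, a dataset.yml might look like the sketch below. This is illustrative only: it assumes the YAML keys nest according to the dotted option paths in the table, and the table name, field name, and index `version` value are placeholders you should replace with your own schema's details.

```yaml
# Example dataset configuration (illustrative sketch).
tables:
  - path: users                  # name of the table to encrypt
    fields:
      - name: email              # name of the field to encrypt
        in_place: false          # store ciphertext in a separate column
        cast_type: utf8-str      # type of data stored in the column
        mode: plaintext          # encryption mode
        indexes:
          - version: 1           # assumed index version; check your account's settings
            kind: match          # match index for partial-string queries
            tokenizer:
              kind: ngram        # split values into ngrams
              token_length: 3    # length of each generated token
            token_filters:
              - kind: downcase   # normalize tokens to lowercase
            k: 6                 # tokens generated per value
            m: 2048              # number of buckets used to store tokens
            include_original: true  # also store the original value in the index
```

Once saved, this file is what you pass to `stash datasets config upload` via the `--file` flag.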