CipherStashDocs

Troubleshooting

Diagnose and resolve issues with CipherStash Proxy, including ZeroKMS connection problems and slow statement performance.

Troubleshooting ZeroKMS connections

For a quick check on ZeroKMS latency and connection issues, set:

CS_LOG__ZEROKMS_LEVEL=debug
CS_LOG__ENCRYPT_LEVEL=debug

For a deeper investigation, also enable slow statement detection:

CS_LOG__ZEROKMS_LEVEL=trace
CS_LOG__SLOW_STATEMENTS=true
CS_LOG__SLOW_STATEMENT_MIN_DURATION_MS=500
CS_LOG__SLOW_DB_RESPONSE_MIN_DURATION_MS=50

What to look for in logs

SignalMeaning
Initializing ZeroKMS ScopedCipher (cache miss)A network call to ZeroKMS is about to happen. Frequent occurrences indicate cache churn.
Connected to ZeroKMS with high init_duration_msSlow cipher init. Healthy values are under 200ms. Values over 1s trigger a warning.
Use cached ScopedCipherCache hit (fast path, no network call).
ScopedCipher evicted from cache with cause: SizeCache too small for workload. Increase cipher_cache_size.
Error initializing ZeroKMS with high init_duration_msNetwork timeout to ZeroKMS.
Error initializing ZeroKMS with low init_duration_msCredential or configuration error.

Key metrics

Enable Prometheus with CS_PROMETHEUS__ENABLED=true and watch these metrics:

MetricWhy
cipherstash_proxy_keyset_cipher_init_duration_secondsDistribution of ZeroKMS init times including network latency.
cipherstash_proxy_keyset_cipher_cache_hits_total / cache_miss_totalCache hit ratio. Should be above 95% in steady state.
cipherstash_proxy_statements_session_duration_seconds minus execution_duration_secondsEncryption overhead per statement.
cipherstash_proxy_encryption_error_total / decryption_error_totalSpikes indicate ZeroKMS connectivity issues.

Useful PromQL queries

# P99 cipher init latency
histogram_quantile(0.99, rate(cipherstash_proxy_keyset_cipher_init_duration_seconds_bucket[5m]))

# Cache hit ratio
rate(cipherstash_proxy_keyset_cipher_cache_hits_total[5m])
/ (rate(cipherstash_proxy_keyset_cipher_cache_hits_total[5m])
   + rate(cipherstash_proxy_keyset_cipher_cache_miss_total[5m]))

Quick checklist

  1. Cache hit ratio low? Tune CS_SERVER__CIPHER_CACHE_SIZE (default: 64) and CS_SERVER__CIPHER_CACHE_TTL_SECONDS (default: 3600).
  2. init_duration_ms above 1s? Network latency to ZeroKMS. Check DNS, firewall rules, and regional proximity.
  3. Large gap between session and execution duration? Overhead is in encrypt/decrypt, not the database.
  4. Frequent evictions? Increase cipher_cache_size to match your workload's keyset count.

Slow statement logging

CipherStash Proxy includes built-in slow statement logging for diagnosing performance issues.

Configuration

Enable slow statement logging via environment variables:

# Enable slow statement logging (required)
CS_LOG__SLOW_STATEMENTS=true

# Optional: Set minimum duration threshold (default: 2000ms)
CS_LOG__SLOW_STATEMENT_MIN_DURATION_MS=500

# Optional: Set log level (default: warn when enabled)
CS_LOG__SLOW_STATEMENTS_LEVEL=warn

# Recommended: Use structured logging for parsing
CS_LOG__FORMAT=structured

Slow statement log format

When a statement exceeds the threshold, the proxy logs a detailed breakdown:

{
  "client_id": 1,
  "duration_ms": 10500,
  "statement_type": "INSERT",
  "protocol": "extended",
  "encrypted": true,
  "encrypted_values_count": 3,
  "param_bytes": 1024,
  "query_fingerprint": "a1b2c3d4",
  "keyset_id": "uuid",
  "mapping_disabled": false,
  "breakdown": {
    "parse_ms": 5,
    "encrypt_ms": 450,
    "server_write_ms": 12,
    "server_wait_ms": 9800,
    "server_response_ms": 233
  }
}

Query fingerprints

Query fingerprints are ephemeral and instance-local. Each proxy instance generates a unique random key at startup to compute query_fingerprint values. Fingerprints change when the proxy restarts and cannot be correlated across different proxy instances. This is intentional for security. Use fingerprints for correlation within a single proxy instance's runtime only.

Prometheus histogram labels

Duration histograms include labels for filtering:

  • statement_type: insert, update, delete, select, other
  • protocol: simple, extended
  • mapped: true, false
  • multi_statement: true, false

Example PromQL queries:

# Average INSERT duration
histogram_quantile(0.5, rate(cipherstash_proxy_statements_session_duration_seconds_bucket{statement_type="insert"}[5m]))

# Compare encrypted vs passthrough
histogram_quantile(0.99, rate(cipherstash_proxy_statements_session_duration_seconds_bucket{mapped="true"}[5m]))

ZeroKMS cipher init metric

cipherstash_proxy_keyset_cipher_init_duration_seconds measures time for cipher initialization including the ZeroKMS network call. High values indicate ZeroKMS connectivity issues.

Interpreting results

SymptomLikely Cause
High encrypt_msZeroKMS latency or large payload.
High server_wait_msDatabase latency.
High cipher_init_durationZeroKMS cold start or network issues.
High parse_msComplex SQL or schema lookup.

On this page