Encrypting Personally Identifiable Information at rest in Marten documents

Protecting Personally Identifiable Information (PII) has become essential as organizations handle increasing volumes of sensitive data. Regulations like the GDPR (General Data Protection Regulation) mandate strict controls over PII to safeguard individuals' privacy and prevent data breaches. One effective approach to securing PII is data masking, a method that conceals sensitive information by replacing it with altered values, rendering it useless if accessed by unauthorized parties. Common masking techniques include character substitution, encryption, and tokenization.

I have previously written about data masking with character substitution to secure data in Marten documents. In this series of blog posts, we are going to look at mechanisms for encrypting data in Marten documents at rest (as stored in the database). We will cover two methods of encryption: a) AES encryption using a preset standard encryption key for all documents, and b) HashiCorp Vault backends with granular per-document encryption key support. We will also cover crypto-shredding, i.e. deliberately deleting or overwriting the encryption key, which renders the encrypted data irrecoverable.

This blog post will focus on the AES encryption using a standard key across all documents for encrypting data.

We are defining an interface to make the encryption functionality pluggable. This has methods to encrypt, decrypt and drop the encryption key as below:

namespace marten_docs_pii;

public interface IEncryptionService  
{  
    Task<string> EncryptAsync(string plainText, string? key = null);
    Task<(bool success, string plainText)> TryDecryptAsync(string cipherText, string? key=null);  
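    // Default no-op. Task.CompletedTask is used because a task created with
    // new Task(...) is never started, so awaiting it would never complete.
    Task DropEncryptionKeyAsync(string key) => Task.CompletedTask;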
}

As a next step, let us implement AES encryption as below using System.Security.Cryptography:

using System.Security.Cryptography;  
using System.Text;  

namespace marten_docs_pii;  

public class AesEncryptionService : IEncryptionService  
{  
    private readonly string _key;  
    private readonly string _iv;  

    public AesEncryptionService(string key, string iv)
    {
        _key = key;
        _iv = iv;
    }

    // Generates a random key and IV, both Base64-encoded
    public static (string key, string iv) GenerateKeyAndIv()
    {
        using var aes = Aes.Create();
        var key = Convert.ToBase64String(aes.Key);
        var iv = Convert.ToBase64String(aes.IV);
        return (key, iv);
    }

    // The key parameter is ignored: this service always uses the key passed to the constructor
    public Task<string> EncryptAsync(string plainText, string? key = null)
    {
        using var aes = Aes.Create();
        aes.Key = Convert.FromBase64String(_key);
        aes.IV = Convert.FromBase64String(_iv);

        using var encryptor = aes.CreateEncryptor();
        var plainBytes = Encoding.UTF8.GetBytes(plainText);
        var cipherBytes = encryptor.TransformFinalBlock(plainBytes, 0, plainBytes.Length);
        return Task.FromResult(Convert.ToBase64String(cipherBytes));
    }

    public Task<(bool success, string plainText)> TryDecryptAsync(string cipherText, string? key = null)
    {
        try
        {
            using var aes = Aes.Create();
            aes.Key = Convert.FromBase64String(_key);
            aes.IV = Convert.FromBase64String(_iv);

            using var decryptor = aes.CreateDecryptor();
            var cipherBytes = Convert.FromBase64String(cipherText);
            var plainBytes = decryptor.TransformFinalBlock(cipherBytes, 0, cipherBytes.Length);
            var plainText = Encoding.UTF8.GetString(plainBytes);
            return Task.FromResult((true, plainText));
        }
        catch
        {
            // Invalid Base64, or a cipher text that does not decrypt with this key and IV
            return Task.FromResult((false, string.Empty));
        }
    }
}

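The service requires both a key and an initialization vector (IV), which can either be provided manually or generated using the static method GenerateKeyAndIv(). Encryption converts the plaintext to UTF-8 bytes, encrypts them and returns the result as a Base64-encoded string. Decryption reverses this, and TryDecryptAsync returns (false, string.Empty) instead of throwing when the input cannot be decrypted. Note that because the same key and IV are used for every value, identical plaintexts always produce identical ciphertexts.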
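To make the behaviour of a fixed key and IV concrete, here is a minimal standalone sketch that uses System.Security.Cryptography directly rather than the AesEncryptionService class. The local functions Encrypt and TryDecrypt and the freshly generated key and IV are assumptions for illustration; they mirror the service's logic but are not part of it.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical demo values: a random key and IV, Base64-encoded as the service expects
using var seed = Aes.Create();
var key = Convert.ToBase64String(seed.Key);
var iv = Convert.ToBase64String(seed.IV);

string Encrypt(string plainText)
{
    using var aes = Aes.Create(); // .NET defaults: CBC mode with PKCS7 padding
    aes.Key = Convert.FromBase64String(key);
    aes.IV = Convert.FromBase64String(iv);
    using var encryptor = aes.CreateEncryptor();
    var plainBytes = Encoding.UTF8.GetBytes(plainText);
    return Convert.ToBase64String(encryptor.TransformFinalBlock(plainBytes, 0, plainBytes.Length));
}

(bool success, string plainText) TryDecrypt(string cipherText)
{
    try
    {
        using var aes = Aes.Create();
        aes.Key = Convert.FromBase64String(key);
        aes.IV = Convert.FromBase64String(iv);
        using var decryptor = aes.CreateDecryptor();
        var cipherBytes = Convert.FromBase64String(cipherText);
        var plainBytes = decryptor.TransformFinalBlock(cipherBytes, 0, cipherBytes.Length);
        return (true, Encoding.UTF8.GetString(plainBytes));
    }
    catch (Exception e) when (e is FormatException or CryptographicException)
    {
        return (false, string.Empty);
    }
}

var first = Encrypt("John Doe");
var second = Encrypt("John Doe");
var roundTrip = TryDecrypt(first);
var garbage = TryDecrypt("not base64!");

Console.WriteLine(first == second);     // True: same key and IV give the same cipher text
Console.WriteLine(roundTrip.plainText); // John Doe
Console.WriteLine(garbage.success);     // False: invalid Base64 is reported, not thrown
```

The deterministic output is what keeps this approach simple, since no per-value IV needs to be stored alongside the data, but it also means that equal values are recognisable as equal in the database.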

As a next step, let us look at how to mark the properties of a document type for encryption. A fluent interface, built from EncryptionRules and EncryptionExtensions, defines the encryption rules, and EncryptionSerializer wires encryption and decryption into the serializer pipeline.

using System.Linq.Expressions;
using System.Reflection;

namespace marten_docs_pii;

public class EncryptionRules
{
    private readonly Dictionary<Type, List<LambdaExpression>> _encryptionRules = new();
    private static volatile EncryptionRules? _instance;
    private static readonly object Lock = new();
    private readonly IEncryptionService _encryptionService;

    private EncryptionRules(IEncryptionService encryptionService)
    {
        _encryptionService = encryptionService;
    }

    public static void Initialize(IEncryptionService encryptionService)
    {
        if (_instance != null)
        {
            throw new InvalidOperationException("EncryptionRules has already been initialized");
        }

        lock (Lock)
        {
            if (_instance == null)
            {
                _instance = new EncryptionRules(encryptionService);
            }
            else
            {
                throw new InvalidOperationException("EncryptionRules has already been initialized");
            }
        }
    }

    public static EncryptionRules Instance 
    {
        get
        {
            if (_instance == null)
            {
                throw new InvalidOperationException(
                    "Call UseEncryptionRulesForProtectedInformation on StoreOptions before using it");
            }
            return _instance;
        }
    }

    public void AddEncryptionRule<T>(Expression<Func<T, object>> propertySelector) where T : class
    {
        if (!_encryptionRules.ContainsKey(typeof(T)))
        {
            _encryptionRules[typeof(T)] = [];
        }

        _encryptionRules[typeof(T)].Add(propertySelector);
    }

    public async Task<object> EncryptDocumentAsync(object document)
    {
        return await TransformDocumentAsync(document, true);
    }

    public async Task<object> DecryptDocumentAsync(object document)
    {
        document = await TransformDocumentAsync(document, false);
        return document;
    }

    private async Task<object> TransformDocumentAsync(object document, bool encrypt)
    {
        var documentType = document.GetType();
        if (!_encryptionRules.TryGetValue(documentType, out var expressions))
        {
            return document;
        }

        string? key = null;

        // Use the per-document key when the document implements IHasEncryptionKey
        if (document is IHasEncryptionKey hasEncryptionKey)
        {
            key = hasEncryptionKey.EncryptionKey;
        }

        // Records are immutable, so each change is applied by creating a new instance
        var currentObj = document;
        var anyChanges = false;

        foreach (var expression in expressions)
        {
            var memberExp = GetMemberExpression(expression.Body);
            if (memberExp == null) continue;

            var value = GetPropertyValue(currentObj, memberExp);
            if (value == null) continue;

            var currentValue = value.ToString()!;
            string transformedValue;

            if (encrypt)
            {
                transformedValue = await _encryptionService.EncryptAsync(currentValue, key);
                anyChanges = true;
            }
            else
            {
                var decryptResult = await _encryptionService.TryDecryptAsync(currentValue, key);
                if (decryptResult.success)
                {
                    transformedValue = decryptResult.plainText;
                    anyChanges = true;
                }
                else
                {
                    continue;
                }
            }

            if (memberExp.Expression is MemberExpression parentMemberExp)
            {
                // Handle nested property (like Prop1.childProp)
                var parentValue = GetPropertyValue(currentObj, parentMemberExp);
                if (parentValue == null) continue;

                var propertyName = memberExp.Member.Name;
                var newParentObj = CreateNewWithProperty(parentValue, propertyName, transformedValue);

                // Update the parent property on the main document
                var parentPropName = parentMemberExp.Member.Name;
                currentObj = CreateNewWithProperty(currentObj, parentPropName, newParentObj);
            }
            else
            {
                // Handle top-level property
                var propertyName = memberExp.Member.Name;
                currentObj = CreateNewWithProperty(currentObj, propertyName, transformedValue);
            }
        }

        return anyChanges ? currentObj : document;
    }

    private static object CreateNewWithProperty(object obj, string propertyName, object newValue)
    {
        var type = obj.GetType();
        var constructor = type.GetConstructors().First();
        var parameters = constructor.GetParameters();
        var args = new object[parameters.Length];

        for (var i = 0; i < parameters.Length; i++)
        {
            var param = parameters[i];
            if (string.Equals(param.Name, propertyName, StringComparison.OrdinalIgnoreCase))
            {
                args[i] = newValue;
            }
            else
            {
                var prop = type.GetProperty(param.Name!, BindingFlags.Public | BindingFlags.Instance);
                args[i] = prop!.GetValue(obj)!;
            }
        }

        return constructor.Invoke(args);
    }

    private static object? GetPropertyValue(object obj, MemberExpression memberExp)
    {
        var members = new List<MemberExpression>();
        var current = memberExp;

        while (current != null)
        {
            members.Add(current);
            current = current.Expression as MemberExpression;
        }

        members.Reverse();

        var value = obj;
        foreach (var member in members)
        {
            if (value == null) return null;
            if (member.Member is PropertyInfo prop)
            {
                value = prop.GetValue(value);
            }
        }

        return value;
    }

    private static MemberExpression? GetMemberExpression(Expression expression)
    {
        return expression switch
        {
            MemberExpression memberExp => memberExp,
            UnaryExpression unaryExp => unaryExp.Operand as MemberExpression,
            _ => null
        };
    }

    public bool HasEncryptionRules(object document)
    {
        return _encryptionRules.ContainsKey(document.GetType());
    }
}

The EncryptionRules class serves as the core orchestrator for managing document encryption in the Marten document database system. Implemented as a singleton using a thread-safe double-check locking pattern, it maintains a dictionary of encryption rules mapped to specific document types.

The class allows developers to declaratively specify which properties of their document models should be encrypted by passing lambda expressions to the AddEncryptionRule() method.
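The selector is typed as Expression<Func<T, object>>, so its body takes a different shape depending on the property type, which is why GetMemberExpression also handles UnaryExpression. The sketch below shows this under the assumption that built-in types (Uri and string) stand in for document types; Unwrap is a hypothetical copy of the unwrapping logic.

```csharp
using System;
using System.Linq.Expressions;

// Same unwrapping logic as GetMemberExpression in EncryptionRules
static MemberExpression? Unwrap(Expression body) => body switch
{
    MemberExpression member => member,
    UnaryExpression unary => unary.Operand as MemberExpression,
    _ => null
};

// Uri stands in for a document type here (an assumption for brevity)
Expression<Func<Uri, object>> stringProperty = u => u.Host;        // reference type: plain member access
Expression<Func<Uri, object>> intProperty = u => u.Port;           // value type: wrapped in a boxing Convert node
Expression<Func<Uri, object>> nestedProperty = u => u.Host.Length; // nested access, also boxed

Console.WriteLine(stringProperty.Body.NodeType);          // MemberAccess
Console.WriteLine(intProperty.Body.NodeType);             // Convert
Console.WriteLine(Unwrap(intProperty.Body)!.Member.Name); // Port

// For a nested selector, the parent access hangs off the Expression property
var nested = Unwrap(nestedProperty.Body)!;
var parent = (MemberExpression)nested.Expression!;
Console.WriteLine($"{parent.Member.Name}.{nested.Member.Name}"); // Host.Length
```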

The TransformDocumentAsync method handles the heavy lifting of both encryption and decryption. It supports top-level properties and properties nested one level deep (such as Address.Street), and it never mutates the original document: each change is applied by building a new instance through the type's constructor, which works for records whose constructor parameters match their property names. This design ensures that sensitive data is automatically encrypted before storage and decrypted upon retrieval, while leaving room for different encryption strategies.
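The constructor-based copy used for nested properties can be sketched as follows. This is an illustration under assumptions, not the code from EncryptionRules: CopyWith is a hypothetical generic variant of CreateNewWithProperty, and anonymous types stand in for the Person and Address records so that the sketch runs as a single top-level script. Like positional records, anonymous types expose one constructor whose parameter names match the property names.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Calls the first public constructor, passing the replacement value for the
// matching parameter and the current property values for all the others
static T CopyWith<T>(T source, string propertyName, object newValue) where T : notnull
{
    var type = source.GetType();
    var constructor = type.GetConstructors().First();
    var args = constructor.GetParameters()
        .Select(p => string.Equals(p.Name, propertyName, StringComparison.OrdinalIgnoreCase)
            ? newValue
            : type.GetProperty(p.Name!, BindingFlags.Public | BindingFlags.Instance)!.GetValue(source))
        .ToArray();
    return (T)constructor.Invoke(args);
}

// Stand-in document (assumption): same shape as Person with a nested Address
var person = new { Name = "John Doe", Address = new { Street = "123 Main St", City = "Anytown" } };

// Nested update: rebuild the child first, then rebuild the parent around it
var newAddress = CopyWith(person.Address, "Street", "<cipher text>");
var updated = CopyWith(person, "Address", newAddress);

Console.WriteLine(updated.Address.Street);           // <cipher text>
Console.WriteLine(updated.Address.City);             // Anytown
Console.WriteLine(person.Address.Street);            // 123 Main St (original untouched)
Console.WriteLine(ReferenceEquals(person, updated)); // False
```

Rebuilding only the child and its direct parent is also why selectors that go deeper than one level of nesting are not covered by this approach.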

What makes this implementation particularly powerful is its support for per-document encryption keys through the IHasEncryptionKey interface, allowing different documents to use different encryption keys. In the case of AES encryption, we only use a standard key defined at the service level, so all documents are encrypted with the same key and any document-level encryption key is ignored. When we look at HashiCorp Vault based encryption in a follow-on blog post, we will dive deeper into document-level encryption keys and the role of the IHasEncryptionKey interface.

I am sharing the definition of IHasEncryptionKey here for completeness:

public interface IHasEncryptionKey
{ 
    string EncryptionKey { get; } 
}

Let us look at the wrapper encryption serializer, EncryptionSerializer:

using System.Data.Common;
using Marten;
using Weasel.Core;

namespace marten_docs_pii;

public class EncryptionSerializer(
    ISerializer innerSerializer,
    EncryptionRules encryptionRules)
    : ISerializer
{
    public async Task<string> ToJsonAsync(object? document)
    {
        var doc = document != null && encryptionRules.HasEncryptionRules(document) 
            ? await encryptionRules.EncryptDocumentAsync(document) 
            : document;
        return innerSerializer.ToJson(doc);
    }

    public string ToJson(object? document)
    {
        // For backward compatibility, we'll run the async method synchronously
        return ToJsonAsync(document).GetAwaiter().GetResult();
    }

    public async Task<T> FromJsonAsync<T>(Stream stream)
    {
        var obj = innerSerializer.FromJson<T>(stream);
        if (obj != null && encryptionRules.HasEncryptionRules(obj))
        {
            return (T)await encryptionRules.DecryptDocumentAsync(obj);
        }
        return obj;
    }

    public T FromJson<T>(Stream stream)
    {
        // For backward compatibility, we'll run the async method synchronously
        return FromJsonAsync<T>(stream).GetAwaiter().GetResult();
    }

    public async Task<T> FromJsonAsync<T>(DbDataReader reader, int index)
    {
        var obj = innerSerializer.FromJson<T>(reader, index);
        if (obj != null && encryptionRules.HasEncryptionRules(obj))
        {
            return (T)await encryptionRules.DecryptDocumentAsync(obj);
        }
        return obj;
    }

    public T FromJson<T>(DbDataReader reader, int index)
    {
        // For backward compatibility, we'll run the async method synchronously
        return FromJsonAsync<T>(reader, index).GetAwaiter().GetResult();
    }

    public async ValueTask<T> FromJsonAsync<T>(Stream stream, CancellationToken cancellationToken)
    {
        var obj = await innerSerializer.FromJsonAsync<T>(stream, cancellationToken);
        if (obj != null && encryptionRules.HasEncryptionRules(obj))
        {
            return (T)await encryptionRules.DecryptDocumentAsync(obj);
        }
        return obj;
    }

    public async ValueTask<T> FromJsonAsync<T>(DbDataReader reader, int index, CancellationToken cancellationToken)
    {
        var obj = await innerSerializer.FromJsonAsync<T>(reader, index, cancellationToken);
        if (obj != null && encryptionRules.HasEncryptionRules(obj))
        {
            return (T)await encryptionRules.DecryptDocumentAsync(obj);
        }
        return obj;
    }

    public async Task<object> FromJsonAsync(Type type, Stream stream)
    {
        var obj = innerSerializer.FromJson(type, stream);
        return encryptionRules.HasEncryptionRules(obj) 
            ? await encryptionRules.DecryptDocumentAsync(obj) 
            : obj;
    }

    public object FromJson(Type type, Stream stream)
    {
        // For backward compatibility, we'll run the async method synchronously
        return FromJsonAsync(type, stream).GetAwaiter().GetResult();
    }

    public async Task<object> FromJsonAsync(Type type, DbDataReader reader, int index)
    {
        var obj = innerSerializer.FromJson(type, reader, index);
        return encryptionRules.HasEncryptionRules(obj) 
            ? await encryptionRules.DecryptDocumentAsync(obj) 
            : obj;
    }

    public object FromJson(Type type, DbDataReader reader, int index)
    {
        // For backward compatibility, we'll run the async method synchronously
        return FromJsonAsync(type, reader, index).GetAwaiter().GetResult();
    }

    public async ValueTask<object> FromJsonAsync(Type type, Stream stream, CancellationToken cancellationToken)
    {
        var obj = await innerSerializer.FromJsonAsync(type, stream, cancellationToken);
        return encryptionRules.HasEncryptionRules(obj) 
            ? await encryptionRules.DecryptDocumentAsync(obj) 
            : obj;
    }

    public async ValueTask<object> FromJsonAsync(Type type, DbDataReader reader, int index,
        CancellationToken cancellationToken)
    {
        var obj = await innerSerializer.FromJsonAsync(type, reader, index, cancellationToken);
        return encryptionRules.HasEncryptionRules(obj) 
            ? await encryptionRules.DecryptDocumentAsync(obj) 
            : obj;
    }

    public async Task<string> ToCleanJsonAsync(object? document)
    {
        var doc = document != null && encryptionRules.HasEncryptionRules(document) 
            ? await encryptionRules.DecryptDocumentAsync(document) 
            : document;
        return innerSerializer.ToJson(doc);
    }

    public string ToCleanJson(object? document)
    {
        // For backward compatibility, we'll run the async method synchronously
        return ToCleanJsonAsync(document).GetAwaiter().GetResult();
    }

    public string ToJsonWithTypes(object document)
    {
        var doc = encryptionRules.HasEncryptionRules(document) 
            ? encryptionRules.EncryptDocumentAsync(document).GetAwaiter().GetResult() 
            : document;
        return innerSerializer.ToJsonWithTypes(doc);
    }

    public EnumStorage EnumStorage { get; } = innerSerializer.EnumStorage;
    public Casing Casing { get; } = innerSerializer.Casing;
    public ValueCasting ValueCasting { get; } = innerSerializer.ValueCasting;
}

The EncryptionSerializer class acts as a specialized wrapper around Marten's default/configured serialization system, implementing the ISerializer interface to provide transparent encryption and decryption of document properties.

Using the decorator pattern, it intercepts the serialization pipeline by wrapping an inner serializer, applying encryption before serialization and decryption after deserialization. Encryption and decryption are implemented in the async methods such as ToJsonAsync and FromJsonAsync; the synchronous methods call these and block on the result with GetAwaiter().GetResult().
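One detail of the synchronous wrappers is worth a short illustration: blocking with GetAwaiter().GetResult() rethrows the original exception, whereas reading Task.Result wraps it in an AggregateException. The sketch below is standalone, and FailAsync is a hypothetical stand-in for an encryption call that fails.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical async operation standing in for a failing encrypt/decrypt call
static async Task<string> FailAsync()
{
    await Task.Yield();
    throw new InvalidOperationException("decrypt failed");
}

string viaGetResult;
try { FailAsync().GetAwaiter().GetResult(); viaGetResult = "no exception"; }
catch (Exception e) { viaGetResult = e.GetType().Name; }

string viaResult;
try { _ = FailAsync().Result; viaResult = "no exception"; }
catch (Exception e) { viaResult = e.GetType().Name; }

Console.WriteLine(viaGetResult); // InvalidOperationException
Console.WriteLine(viaResult);    // AggregateException
```

With AesEncryptionService the underlying tasks are already completed (Task.FromResult), so the blocking call returns immediately. An encryption service that performs real asynchronous I/O would make these synchronous paths block a thread while waiting.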

EncryptionExtensions are defined as below:

using System.Linq.Expressions;
using Marten;

namespace marten_docs_pii;

public static class EncryptionExtensions
{
    public static void UseEncryptionRulesForProtectedInformation(
        this StoreOptions options,
        IEncryptionService encryptionService)
    {
        EncryptionRules.Initialize(encryptionService);

        // Replace the default serializer with our decrypting serializer
        var innerSerializer = options.Serializer();
        options.Serializer(new EncryptionSerializer(
            innerSerializer,
            EncryptionRules.Instance));
    }

    public static MartenRegistry.DocumentMappingExpression<T> AddEncryptionRuleForProtectedInformation<T>(
        this MartenRegistry.DocumentMappingExpression<T> documentMappingExpression, 
        Expression<Func<T, object>> memberExpression) where T : class
    {
        EncryptionRules.Instance.AddEncryptionRule(memberExpression);
        return documentMappingExpression;
    }
}

The EncryptionExtensions class provides a fluent configuration interface for integrating encryption capabilities into Marten's document store. Through extension methods, it seamlessly connects the encryption framework with Marten's serialization pipeline.

The UseEncryptionRulesForProtectedInformation method initializes the encryption rules and replaces Marten's default serializer with the custom EncryptionSerializer wrapper, enabling transparent encryption/decryption of documents.

Complementing this, the AddEncryptionRuleForProtectedInformation method extends Marten's document schema mapping API, allowing developers to declaratively specify which properties should be encrypted using a fluent syntax. This design follows the builder pattern, making it intuitive to configure encryption rules during document store setup while maintaining chainable method calls.

Now let us put all these parts together and see a working example. Let us first define the document types as below:

public record Address(string Street, string City);
public record Person(Guid Id, string Name, string Phone, Address Address);

Create an instance of AesEncryptionService as below:

// created via var (key, iv) = AesEncryptionService.GenerateKeyAndIv();
const string key = "tKDM8/ZCTZkRtKi7ZKDALBTEE/+WmMA5SEpWp02Y0qs=";
const string iv = "L/G6cEvpCK/0XUS2kWsKoA==";
var encryptionService = new AesEncryptionService(key, iv);

As a next step, create the document store including setting up the encryption rules:

await using var store = DocumentStore.For(opts =>
{
    opts.Connection(
        "Host=localhost;Database=marten_testing;Username=postgres;Password=postgres");
    opts.UseEncryptionRulesForProtectedInformation(encryptionService);
    opts.Schema.For<Person>()
        .AddEncryptionRuleForProtectedInformation(x => x.Name)
        .AddEncryptionRuleForProtectedInformation(x => x.Phone)
        .AddEncryptionRuleForProtectedInformation(x => x.Address.Street);
});

This code snippet demonstrates the configuration of Marten's DocumentStore with encryption capabilities for protecting sensitive data. The configuration uses a fluent API to set up both database connection and encryption rules. First, it establishes a connection to a PostgreSQL database using standard connection parameters. Then, through UseEncryptionRulesForProtectedInformation, it integrates the encryption service (AES in this case) into Marten's pipeline.

The most interesting part is the schema configuration for the Person record, which explicitly defines the properties to encrypt using AddEncryptionRuleForProtectedInformation. In this case, it marks Name, Phone, and the nested Address.Street property for encryption, demonstrating the system's ability to handle both top-level and nested properties.

Let us add a document and inspect the data in the database as below:

await using var session = store.LightweightSession();

// Create and store a person
var person1 = new Person(
    Guid.NewGuid(), 
    "John Doe", 
    "111-111", 
    new Address("123 Main St", "Anytown"));

session.Store(person1);
await session.SaveChangesAsync();

When you inspect the data at rest as stored in the database, you will see the document below, with the right set of properties encrypted, i.e. Name, Phone and Address.Street:

{
    "Id": "152d068f-7aa3-46e0-8126-28c4e20af462",
    "Name": "G/ntphq74+pJvxOIbRqhoA==",
    "Phone": "UwNTmU3IfRg1SBSJcsASRw==",
    "Address": {
        "City": "Anytown",
        "Street": "LxhcXsfU97SW7y18KaOEew=="
    }
}

If you retrieve the document using a Marten session, you will see that the data is decrypted properly as below:

// Person.Id is a Guid, so the id is parsed before loading
var person = await session.LoadAsync<Person>(Guid.Parse("152d068f-7aa3-46e0-8126-28c4e20af462"));
Console.WriteLine($"Name: {person?.Name}"); // Will show decrypted value  
Console.WriteLine($"Phone: {person?.Phone}"); // Will show decrypted value  
Console.WriteLine($"Street: {person?.Address.Street}, City: {person?.Address.City}");

The output is as below:

Name: John Doe
Phone: 111-111
Street: 123 Main St, City: Anytown

In summary, this implementation demonstrates document encryption in Marten using AES, with a complete workflow for storing and retrieving encrypted documents. It starts from a predefined AES key and IV and initializes an AesEncryptionService. The Marten DocumentStore is configured with encryption rules that target the sensitive fields of the Person record: Name, Phone, and the nested Address.Street property. The program stores a single Person instance with sample data, then loads it back and displays the automatically decrypted values, which shows that encryption and decryption are transparent to the calling code. The Person record does not implement the IHasEncryptionKey interface, so it uses the default encryption key rather than a per-document key.

In the next blog post, we will look at HashiCorp Vault based encryption with granular per-document encryption key support, including the use of IHasEncryptionKey. We will also look at crypto-shredding, i.e. deliberately deleting or overwriting the encryption key, which renders the encrypted data irrecoverable. Stay tuned!

The source is available here for your ready reference.