Personally Identifiable Information masking in Marten documents using partial update/patching

Protecting Personally Identifiable Information (PII) has become essential as organizations handle increasing volumes of sensitive data. Regulations like the GDPR (General Data Protection Regulation) mandate strict controls over PII to safeguard individuals' privacy and prevent data breaches. One effective approach to securing PII is data masking, a method that conceals sensitive information by replacing it with altered values, rendering it useless if accessed by unauthorized parties. Common masking techniques include character substitution, encryption, and tokenization.

You can consider this post a logical continuation of the "Masking PII information in Marten Event Store" blog post written by Jeremy.

Marten offers a couple of ways to update data in the document store:

  • Use Session.Update, which lets you update a document after it has been fetched.

  • Use partial update/patching, which lets you apply updates directly in the database without fetching the document first.

Even though these mechanisms are available, there is currently no official Marten API that standardizes masking of document data the way it is implemented for Marten Event Store events. I was curious to see how this feature could be built for documents using the partial update/patching API.

Let me outline the goals I had for a simple masking feature in the Marten document store:

  • Ability to define a masking rule for one or more properties of a document type.

  • A neat fluent interface for defining masking rules, like all the other operations we register as part of the document schema.

  • All masking operations are done using Marten's native partial updates/patching.

MaskingRules class

First, let's look at the MaskingRules class, which captures all the masking rules:

using System.Linq.Expressions;
using Marten;
using Marten.Linq.Parsing;

namespace marten_docs_pii;

public class MaskingRules
{
    private readonly Dictionary<Type, List<KeyValuePair<string, object>>> _rules = new();
    private static volatile MaskingRules? _instance;
    private static readonly object _lock = new();
    private readonly Casing _casing;

    private MaskingRules(Casing casing)
    {
        _casing = casing;
    }

    public static void Initialize(StoreOptions opts)
    {
        if (_instance != null)
        {
            throw new InvalidOperationException("MaskingRules has already been initialized");
        }

        lock (_lock)
        {
            if (_instance == null)
            {
                _instance = new MaskingRules(opts.Serializer().Casing);
            }
            else
            {
                throw new InvalidOperationException("MaskingRules has already been initialized");
            }
        }
    }

    public static MaskingRules Instance 
    {
        get
        {
            if (_instance == null)
            {
                throw new InvalidOperationException(
                    "Call UseMaskingRulesForProtectedInformation on StoreOptions before using it");
            }
            return _instance;
        }
    }

    public void AddMaskingRule<T>(
        Expression<Func<T, object>> propertySelector, 
        object maskValue) where T : class
    {
        var propertyPath = GetPropertyPath(propertySelector);

        if (!_rules.ContainsKey(typeof(T)))
        {
            _rules[typeof(T)] = [];
        }

        _rules[typeof(T)].Add(new KeyValuePair<string, object>(propertyPath, maskValue));
    }

    private string GetPropertyPath<T>(Expression<Func<T, object>> propertySelector)
    {
        var visitor = new MemberFinder();
        visitor.Visit(propertySelector);
        return string.Join(".", visitor.Members.Select(x => x.Name.FormatCase(_casing)));
    }

    public List<KeyValuePair<string, object>> GeneratePatches<T>()
    {
        return !_rules.TryGetValue(typeof(T), out var typeRules) 
            ? [] 
            : typeRules;
    }
}

AddMaskingRule method

  • AddMaskingRule<T>(Expression<Func<T, object>> propertySelector, object maskValue) is the key method. It takes a property selector expression for the property to be masked, along with the mask value.

  • This method translates the property expression into a dot-separated property path using the GetPropertyPath method, so that it can be passed to the patch Set operation later. A few examples of property paths are Name and Address.Street. A . in the property path means that it refers to a nested property.

  • Note that the property name can be camel case or snake case depending on what is set at the Marten serializer level, so the MaskingRules class is passed the Casing to determine which one to apply to the property names. For the casing handling, we have borrowed some internals from Marten. See the Property Casing section.

  • For each document type, the masking rules are stored in a dictionary as a list of key-value pairs, with the property path as the key and the mask value as the value.
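To make the path translation a bit more concrete, here is a minimal sketch that does not use Marten's MemberFinder at all. It walks the expression with plain System.Linq.Expressions and leaves casing out of the picture. PropertyPathSketch and PathOf are hypothetical names for illustration only, and the Person and Address records mirror the ones from the usage example in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public record Address(string Street, string City);
public record Person(Guid Id, string Name, string Phone, Address Address);

// Hypothetical stand-in for GetPropertyPath; not Marten's implementation.
public static class PropertyPathSketch
{
    public static string PathOf<T>(Expression<Func<T, object>> selector)
    {
        Expression? body = selector.Body;

        // Value-type members (for example a Guid) are boxed to object,
        // which wraps the member access in a Convert node.
        if (body is UnaryExpression { NodeType: ExpressionType.Convert } convert)
        {
            body = convert.Operand;
        }

        // Walk from the outermost member back towards the lambda parameter.
        // A stack enumerates in reverse push order, so the path comes out root-first.
        var names = new Stack<string>();
        while (body is MemberExpression member)
        {
            names.Push(member.Member.Name);
            body = member.Expression;
        }

        return string.Join(".", names);
    }
}
```

With this sketch, PropertyPathSketch.PathOf<Person>(x => x.Address.Street) yields Address.Street and PropertyPathSketch.PathOf<Person>(x => x.Name) yields Name.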

GeneratePatches method

  • Provides the list of key-value pairs (property path and mask value) for use in the patching operation.

Apart from these, MaskingRules uses the singleton pattern, so a single instance is created and shared.

Property Casing

```csharp
using Newtonsoft.Json.Serialization;
using JasperFx.Core;
using Marten;

namespace marten_docs_pii;

internal static class CasingExtensionMethods
{
    private static readonly SnakeCaseNamingStrategy SnakeCaseNamingStrategy = new();

    private static string ToSnakeCase(this string s)
    {
        return SnakeCaseNamingStrategy.GetPropertyName(s, false);
    }

    public static string FormatCase(this string s, Casing casing) =>
        casing switch
        {
            Casing.CamelCase => s.ToCamelCase(),
            Casing.SnakeCase => s.ToSnakeCase(),
            _ => s
        };
}
```
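To see what the casing step does to a property path, here is a small sketch that uses the built-in System.Text.Json naming policies as a stand-in for the Newtonsoft and JasperFx helpers above. PathCasingSketch and FormatPath are hypothetical names, JsonNamingPolicy.SnakeCaseLower assumes .NET 8 or later, and the results may differ from the original helpers on edge cases such as acronyms.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical helper; uses System.Text.Json policies instead of the
// Newtonsoft/JasperFx helpers shown in CasingExtensionMethods.
public static class PathCasingSketch
{
    // Applies the naming policy to each segment of a dot-separated path.
    // A null policy leaves the path untouched (the default casing branch).
    public static string FormatPath(string path, JsonNamingPolicy? policy) =>
        policy is null
            ? path
            : string.Join(".", path.Split('.').Select(segment => policy.ConvertName(segment)));
}
```

For example, FormatPath("Address.Street", JsonNamingPolicy.CamelCase) gives address.street, while FormatPath("FirstName", JsonNamingPolicy.SnakeCaseLower) gives first_name. Each segment is converted on its own, just like GetPropertyPath formats each member name before joining them.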

MaskingExtensions

We have a few extension methods that wire it all together into a nice fluent interface to set up and apply masking.

```csharp
using System.Linq.Expressions;
using Marten;
using Marten.Patching;

namespace marten_docs_pii;

public static class MaskingExtensions
{
    public static void UseMaskingRulesForProtectedInformation(this StoreOptions opts)
    {
        MaskingRules.Initialize(opts);
    }

    public static MartenRegistry.DocumentMappingExpression<T> AddMaskingRuleForProtectedInformation<T>(
        this MartenRegistry.DocumentMappingExpression<T> documentMappingExpression, 
        Expression<Func<T, object>> memberExpression, object maskValue) where T : class
    {
        MaskingRules.Instance.AddMaskingRule(memberExpression, maskValue);
        return documentMappingExpression;
    }

    public static void ApplyMaskForProtectedInformation<T>(this IPatchExpression<T> patcher)
    {
        var patches = MaskingRules.Instance.GeneratePatches<T>();

        foreach (var patch in patches)
        {
            patcher = patcher.Set(patch.Key, patch.Value);
        }
    }
}
```

UseMaskingRulesForProtectedInformation(...)

This extension method creates the singleton instance of MaskingRules from the StoreOptions, from which the property casing is determined. Without this step, adding or applying masking rules throws an InvalidOperationException.

AddMaskingRuleForProtectedInformation(...)

This extension method is a fluent interface on MartenRegistry.DocumentMappingExpression<T> to set up the masking rule for the required properties.

ApplyMaskForProtectedInformation<T>(this IPatchExpression<T> patcher)

This extension method adds a Set operation for each masking rule to a given IPatchExpression<T>, so that you can run masking on one or more documents, depending on the Patch LINQ expression.
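To picture what each Set call does to the stored JSON, the following sketch applies a list of path/value rules to an in-memory document using System.Text.Json.Nodes. It is only a simulation: Marten runs the actual patch inside PostgreSQL. PatchSimulation, SetAtPath and ApplyRules are hypothetical names, and the sketch assumes string mask values and paths that already have the right casing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical in-memory simulation; not how Marten executes a patch.
public static class PatchSimulation
{
    // Overwrites the value at a dot-separated path inside a JSON object.
    public static void SetAtPath(JsonObject document, string path, string maskValue)
    {
        var segments = path.Split('.');
        var current = document;

        // Descend through every segment except the last,
        // e.g. "address" in "address.street".
        for (var i = 0; i < segments.Length - 1; i++)
        {
            current = current[segments[i]] as JsonObject
                ?? throw new InvalidOperationException($"No JSON object at '{segments[i]}'");
        }

        // The implicit string-to-JsonNode conversion creates the new value.
        current[segments[^1]] = maskValue;
    }

    // Mirrors the foreach loop in ApplyMaskForProtectedInformation: one Set per rule.
    public static void ApplyRules(
        JsonObject document,
        IEnumerable<KeyValuePair<string, string>> rules)
    {
        foreach (var rule in rules)
        {
            SetAtPath(document, rule.Key, rule.Value);
        }
    }
}
```

Given a document like {"name":"John Doe","address":{"street":"123 Main St","city":"Anytown"}} and the rules name and address.street with the mask ***, only those two values are overwritten and city stays untouched.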

Usage

Now let us look at an example usage of how masking rules are defined and used.

using Marten;  
using Marten.Patching;  
using marten_docs_pii;  

await using var store = DocumentStore.For(opts =>  
{  
    opts.Connection("Host=localhost;Database=marten_testing;Username=postgres;Password=postgres");  
    opts.UseMaskingRulesForProtectedInformation();  
    opts.Schema.For<Person>()  
        .AddMaskingRuleForProtectedInformation(  
            x => x.Name, "***")  
        .AddMaskingRuleForProtectedInformation(  
            x => x.Phone, "###-###")  
        .AddMaskingRuleForProtectedInformation(  
            x => x.Address.Street, "***");  
});  

await using var session = store.LightweightSession();  

// Create and store a person  
var person1 = new Person(
    Guid.NewGuid(),
    "John Doe",
    "111-111",
    new Address("123 Main St", "Anytown"));

var person2 = new Person(
    Guid.NewGuid(),
    "Some Name",
    "222-222",
    new Address("Some Street", "Some City"));

session.Store(person1, person2);  
await session.SaveChangesAsync();  

session.Patch<Person>(x => true).ApplyMaskForProtectedInformation();  
await session.SaveChangesAsync();  

// Query back to verify  
var maskedPerson = await session.LoadAsync<Person>(person1.Id);  
Console.WriteLine($"Name: {maskedPerson?.Name}");  
Console.WriteLine($"Phone: {maskedPerson?.Phone}");  
Console.WriteLine($"Street: {maskedPerson?.Address.Street}, City: {maskedPerson?.Address.City}");  

public record Address(string Street, string City);  
public record Person(Guid Id, string Name, string Phone, Address Address);

Set up the masking rules as part of the store options:

  • Call opts.UseMaskingRulesForProtectedInformation(); while setting up the store so that masking rules can be registered and applied.

  • This is an example of setting a masking rule for Name in the Person document type:

opts.Schema
    .For<Person>()
    .AddMaskingRuleForProtectedInformation(x => x.Name,"***")

Applying Mask

  • This is an example of applying the mask to all documents of type Person:

session.Patch<Person>(x => true).ApplyMaskForProtectedInformation();
await session.SaveChangesAsync();

When you fetch a specific document using:

var maskedPerson = await session.LoadAsync<Person>(person1.Id);  
Console.WriteLine($"Name: {maskedPerson?.Name}");  
Console.WriteLine($"Phone: {maskedPerson?.Phone}");  
Console.WriteLine($"Street: {maskedPerson?.Address.Street}, City: {maskedPerson?.Address.City}");

The masked output is as follows:

Name: ***
Phone: ###-###
Street: ***, City: Anytown

In summary, you can see the power of partial updates/patching in action. This is just a glimpse of what you could possibly do to handle PII data. The full source code is available here.

Happy coding with Critter stack!
