vitaliikovalenko.com/blog

nginx: IP anonymization for GDPR

Web servers commonly collect the IP addresses of site visitors and, by default, store them for a long time or even indefinitely. Under the General Data Protection Regulation (GDPR), these addresses are considered personal data. This means that even though we have a good reason to collect, store, and analyze them (to prevent attacks and to protect our website and the data our users create on it), we need to notify users that we collect their IP addresses. But if the website does not collect any other personal data right away, interrupting a first-time visitor with a pop-up only to say that the web server stores their IP address is a bad idea. And do we really need the user's exact IP address? In a distributed attack, blocking every individual IP would be inefficient; blocking entire class C (/24) subnets makes much more sense.

So, in the spirit of GDPR, we should not collect any personal data we do not need. For visitors' IP addresses, we only need the class C subnet. This means we should anonymize the IPs we collect and store by dropping the last number of the address, or simply replacing it with 0. The stored /24 subnet may then represent any one of hundreds of computers (or even many thousands if the user came from a network behind NAT): enough to make it practically impossible to identify the user, yet still enough to mitigate an attack originating from the user's computer (and from all others that share the same subnet).
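
To illustrate the mitigation side, a stored anonymized address already names the subnet to block. The following is a minimal sketch that uses the `deny` directive of nginx's access module; the address 203.0.113.0/24 is only a placeholder taken from the documentation range, not a value from any real log.

```nginx
server {
    # Placeholder subnet: substitute the anonymized address seen in the log.
    # A /24 rule covers 203.0.113.0 through 203.0.113.255.
    deny 203.0.113.0/24;

    # ... the rest of the server configuration ...
}
```

Because the rule is written at subnet level, nothing more precise than the anonymized log entry is needed to create it.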

Here is how the anonymization can be achieved with the nginx web server:

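# Step 1: keep only the part of the client address that will be stored.
# IPv4: the first three numbers. IPv6: the first two groups.
map $remote_addr $ip_anonym1 {
    default 0.0.0;
    "~(?P<ip>(\d+)\.(\d+)\.(\d+))\.\d+" $ip;
    "~(?P<ip>[^:]+:[^:]+):" $ip;
}

# Step 2: choose the suffix that replaces the dropped part.
map $remote_addr $ip_anonym2 {
    default .0;
    "~(?P<ip>(\d+)\.(\d+)\.(\d+))\.\d+" .0;
    "~(?P<ip>[^:]+:[^:]+):" ::;
}

# Step 3: join prefix and suffix, so 203.0.113.57 becomes 203.0.113.0.
map $ip_anonym1$ip_anonym2 $ip_anonymized {
    default 0.0.0.0;
    "~(?P<ip>.*)" $ip;
}

# $external is 0 when the Referer contains our own domain, 1 otherwise.
# Replace domain-name with the domain of the site.
map $http_referer $external {
    default 1;
    "~domain-name" 0;
}

# Log the anonymized address instead of $remote_addr.
# if=$external skips requests whose Referer points to our own pages.
log_format anonref '$ip_anonymized;$time_iso8601;$request_uri;$http_referer;$http_user_agent';
access_log /var/log/nginx/access.log anonref if=$external;
# Only messages of level alert and above are written to the error log.
error_log /var/log/nginx/error.log alert;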

Works with nginx 1.14.0.
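
Since nginx 1.11.0, a `map` value may combine variables and text, so the three maps could in principle be collapsed into one. The block below is an untested sketch under that assumption, not the configuration used above; the capture names `v4` and `v6` are made up for this example, and the patterns are the same ones as in the original maps.

```nginx
map $remote_addr $ip_anonymized {
    default                           0.0.0.0;
    # IPv4: keep the first three numbers and append .0
    "~(?P<v4>\d+\.\d+\.\d+)\.\d+"     "${v4}.0";
    # IPv6: keep the first two groups and append ::
    "~(?P<v6>[^:]+:[^:]+):"           "${v6}::";
}
```

The `log_format` and `access_log` lines would stay unchanged, because the resulting variable keeps the name `$ip_anonymized`.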