Privacy Purger
This section describes how to install Privacy Purger and how to run it in offline mode.
Introduction to Privacy Purger
Privacy Purger identifies and redacts Personally Identifiable Information (PII) in unstructured input texts so that they can be transmitted in depersonalized form to external platforms such as Large Language Models (LLMs). The service is a core part of the ConSol CM/AI Assist subscription, where it supports GDPR compliance. It is also available under a standalone license for integration into independent enterprise infrastructures.
Installing Privacy Purger
-
Download the latest release from our FTP server at fx.consol.de:

    wget -r --ask-password -nH --cut-dirs=3 ftp://cmPrivacyPurger@fx.consol.de/pub/privacy-purger-releases/1.2.1/*

Info: To obtain the download password, please contact our ConSol CM support team.
-
Create the docker-compose.yml file and paste the following template content.

docker-compose.yml:

    services:
      privacy-purger-app:
        image: consol.de/privacy-purger-app:1.2.1
        container_name: privacy-purger-app
        depends_on:
          - privacy-purger-ner
        restart: always
        # 2nd option how to set environment values
        # Uncomment
        #volumes:
        #  - type: bind
        #    source: ./application.yml
        #    target: /home/runner/application.yml
        # end of commented lines
        environment:
          # 1st option how to set environment values
          - "config.ner.endpoint=http://privacy-purger-ner:5000/api/labeling"
          # to enable debug mode
          #- "logging.level.com.consol.=DEBUG"
          # 2nd option how to set environment values
          # Uncomment
          # - SPRING_CONFIG_LOCATION=file:/home/runner/
          # end of commented lines
        ports:
          - 28080:28080
        #logs are stored in /var/lib/docker/containers/[container-id]/[container-id]-json.log
        #they will be removed automatically after execution of "docker compose down"
        logging:
          driver: "json-file"
          options:
            max-size: "50m"
            max-file: "10"
            compress: "true"
        healthcheck:
          test: [ "CMD-SHELL", "curl --fail http://localhost:28080 || exit 1" ]
          interval: 60s
          retries: 5
          start_period: 30s
          timeout: 10s
      privacy-purger-ner:
        image: consol.de/privacy-purger-ner:1.2.1
        container_name: privacy-purger-ner
        restart: always
        ports:
          - 5000:5000
        #logs are stored in /var/lib/docker/containers/[container-id]/[container-id]-json.log
        #they will be removed automatically after execution of "docker compose down"
        logging:
          driver: "json-file"
          options:
            max-size: "50m"
            max-file: "10"
            compress: "true"
        healthcheck:
          test: ["CMD-SHELL", "curl --fail -X POST http://localhost:5000/api/labeling -H 'Content-Type: application/json' -d '{\"text\":\"Docker Compose Healthcheck\"}' || exit 1"]
          interval: 60s
          retries: 5
          start_period: 30s
          timeout: 10s

Image version: Ensure that the image tags in the docker-compose.yml file correspond to the specific release version obtained from the FTP server. For example, replace 1.2.1 with your current version.
-
Create the application.yml file in the same directory as the docker-compose.yml and paste the following template content.

application.yml:

    # Sample application file
    # only ner.endpoint has been changed
    server:
      port: 28080
    config:
      api-key: #YourAPIKey
      regexPurger:
        - id: email
          description: "Replaces email addresses by using RegEx"
          replacement: "<<EMAIL>>"
          regexList:
            - '\b[\w-\.]+@([\w-]+\.)+[\w-]{2,4}\b'
        - id: iban
          description: "Replaces IBANs by using RegEx"
          replacement: "<<IBAN>>"
          regexList:
            - '\b[A-Z]{2}[0-9]{2}(?:[ ]?[0-9]{4}){4}(?!(?:[ ]?[0-9]){3})(?:[ ]?[0-9]{1,2})?\b'
            - '\b(?:(?:[A-Z]{2})(?:\d{2}))(?=\d{2}[A-Z0-9]{9,30}$)[A-Z0-9]{14,34}\b'
        - id: postfach
          description: "Replaces postfach by using RegEx"
          replacement: "<<POSTFACH>>"
          regexList:
            - '([Pp]ostfach)(([\d][\d ])?([\d]+)?[ .]?([\d ]+))([\d])+'
        - id: birthdate
          description: "Replaces Birthdate by using RegEx"
          replacement: "<<BIRTHDATE>>"
          regexList:
            - '\b(0?[1-9]|[12]\d|3[01])\.(0?[1-9]|1[0-2])\.\d{4}\b'
            - '\b\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b'
        - id: phone
          description: "Replaces phone numbers by using RegEx"
          replacement: "<<PHONE>>"
          regexList:
            - '(\(?([\d\-\)\–\+\/\(][\d \-\)\–\+\/\(]{6,})\)?([\d]+)[ .\-–\/]?([\d]+))'
        - id: zip
          description: "Replaces ZIPs by using RegEx"
          replacement: "<<ZIP>>"
          regexList:
            - '\b\d{5}\b'
      ner:
        endpoint: "http://privacy-purger-ner:5000/api/labeling"
        replace-loc: true
        replace-per: true
        replace-org: false
        replace-misc: false
        per-replacement: '<<PER>>'
        org-replacement: '<<ORG>>'
        misc-replacement: '<<MISC>>'
        loc-replacement: '<<LOC>>'
        text-slice-threshold: 925

Info: Enter your preferred API key in the api-key field within the config section of your application.yml.
-
Link the application.yml to Docker Compose. For the application.yml to be loaded at startup, uncomment and configure the following lines in the docker-compose.yml file to match your setup:

    ### Uncomment and configure volumes
    volumes:
      - type: bind
        source: ./application.yml
        target: /app/application.yml
    environment:
      # ...
      # Uncomment and configure Spring-Config
      - SPRING_CONFIG_LOCATION=file:/app/application.yml
-
Load the images using the following commands:

    docker load --input privacy-purger-app:1.2.1.tar
    docker load --input privacy-purger-ner:1.2.1.tar

Note: The ner image is large, so the loading process may take a few moments.
-
Verify that the images have been loaded successfully by running:

    docker images

You should now see the privacy-purger-app and privacy-purger-ner images in the list.
-
Once the images are successfully loaded, start the Privacy Purger with the following command:

    docker compose up -d
-
Use the command docker compose ps to monitor the container status. Ensure that the containers reach a healthy state before proceeding.

    docker compose ps
-
To test the installation, open your web browser and navigate to the following URL to access the Privacy Purge Playground:

    http://<your-ip>:28080

Tip: To ensure that your application.yml is being loaded correctly, you can perform the following test:

Modify the application.yml: Locate the email entry in the regexPurger list and change the replacement value to <<EMAILTEST>>:

    regexPurger:
      - id: email
        description: "Replaces email addresses by using RegEx"
        replacement: "<<EMAILTEST>>"

Restart the containers: Apply the changes by restarting the Docker containers:

    docker compose restart

Test in the browser: Open the Privacy Purge Playground and process a text containing an email address.

Success: If the email address is replaced by <<EMAILTEST>>, the application.yml is loaded correctly.

Final step: Revert the changes in your application.yml and restart the containers once more to restore the original settings.
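To illustrate how the regexPurger entries in the application.yml behave, the following minimal sketch applies the birthdate and zip rules from the template using Python's re module. It is an illustration only, not the product's implementation: the regex engine used by Privacy Purger and the order in which it applies the rules are not described here, so applying the rules one after another in list order is an assumption, and the names RULES and purge are hypothetical. The email pattern is left out because its character class [\w-\.] is written for a different regex dialect and is not accepted by Python's re module.

```python
import re

# Rules copied from the application.yml template (birthdate and zip only).
# Assumption: rules are applied one after another, in list order.
RULES = [
    ("<<BIRTHDATE>>", [
        r"\b(0?[1-9]|[12]\d|3[01])\.(0?[1-9]|1[0-2])\.\d{4}\b",
        r"\b\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b",
    ]),
    ("<<ZIP>>", [r"\b\d{5}\b"]),
]


def purge(text: str) -> str:
    """Replace every match of every pattern with its rule's placeholder."""
    for replacement, patterns in RULES:
        for pattern in patterns:
            text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    print(purge("Born 03.11.1985, lives in 80331 Munich."))
```

For the sample sentence, the sketch prints "Born <<BIRTHDATE>>, lives in <<ZIP>> Munich.". Changing a placeholder in RULES mirrors the <<EMAILTEST>> check described in the tip above.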
Offline mode
The Privacy Purger supports an offline mode, which is essential for environments with restricted internet access.
To enable offline mode for the Privacy Purger NER, you must set the following environment variables. These variables instruct the underlying libraries to skip any attempts to connect to external repositories:
- HF_HUB_OFFLINE=1: Disables all communication with the Hugging Face Hub.
- TRANSFORMERS_OFFLINE=1: Forces the Transformers library to use only locally cached files.
Running with Docker
Since Docker containers do not automatically inherit environment variables from the host system, you must pass them explicitly with the -e flag when starting the container.
Command example:

    docker run -e HF_HUB_OFFLINE=1 -e TRANSFORMERS_OFFLINE=1 -p 5000:5000 consol.de/privacy-purger-ner:1.2.1
Alternatively, you can define the offline settings in a docker-compose.yml file:

    services:
      privacy-purger-ner:
        image: consol.de/privacy-purger-ner:1.2.1
        ports:
          - "5000:5000"
        environment:
          - HF_HUB_OFFLINE=1
          - TRANSFORMERS_OFFLINE=1
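To confirm that the NER service still answers while offline, one option is to send the same request that the Compose health check in the installation template uses: a POST to /api/labeling with a JSON body containing a text field. The sketch below builds that request with Python's standard library. It assumes the service is reachable on localhost:5000, as in the port mapping above. The response format is not described in this document, so the raw body is returned unparsed. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: port 5000 is published on the local host, as in the compose example.
NER_URL = "http://localhost:5000/api/labeling"


def build_labeling_request(text: str, url: str = NER_URL) -> urllib.request.Request:
    """Build the POST request that the compose health check sends with curl."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def probe_ner(text: str = "Docker Compose Healthcheck", timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text.

    Raises urllib.error.URLError if the service cannot be reached or
    answers with an HTTP error status.
    """
    request = build_labeling_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling probe_ner() on the Docker host after the container has started should return a response body if the service is up; an exception suggests the container is not yet ready or the port is not published.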
Running as a local Python script
If you are running the Privacy Purger directly as a Python script (e.g., python NerService.py), the offline variables must not be set on the very first start.
The first start requires an active internet connection to download all necessary resources and models, and the offline variables would block that connection.
Recommended workflow:
- Run the script once with an internet connection to let it download everything.
- Once the first run is successful, stop the script.
- Set the environment variables HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1.
- Restart the script; it will now work in full offline mode.
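The last two steps of this workflow can also be combined in a small launcher that sets the offline variables only for the child process. This is a sketch rather than part of the product: the script name NerService.py is taken from the example above, the helper names are hypothetical, and it assumes that the initial online run has already downloaded the models.

```python
import os
import subprocess
import sys

# The two variables described in the offline mode section.
OFFLINE_VARS = {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}


def offline_env(base=None) -> dict:
    """Return a copy of the environment with the offline variables set."""
    env = dict(os.environ if base is None else base)
    env.update(OFFLINE_VARS)
    return env


def run_offline(script: str = "NerService.py") -> int:
    """Start the script with the offline variables and return its exit code."""
    completed = subprocess.run([sys.executable, script], env=offline_env())
    return completed.returncode
```

Calling run_offline() from the directory that contains the script starts it in offline mode while leaving the environment of the calling shell unchanged.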