Version: 6.18

Privacy Purger

This section describes how to install Privacy Purger.

Introduction to Privacy Purger

This software component identifies and redacts Personally Identifiable Information (PII) in unstructured input text, so that the depersonalized text can be transmitted securely to external platforms such as Large Language Models (LLMs). It is a core part of the ConSol CM/AI Assist subscription, where it supports GDPR compliance, and it is also available under a standalone license for integration into independent enterprise infrastructures.

Installing Privacy Purger

  1. Download the latest release from our FTP server at fx.consol.de.

    wget -r --ask-password -nH --cut-dirs=3 ftp://cmPrivacyPurger@fx.consol.de/pub/privacy-purger-releases/1.2.1/*

    Info

    To obtain the download password, please contact our ConSol CM support team.

  2. Create the docker-compose.yml file and paste the following template content.

    docker-compose.yml:

    services:
      privacy-purger-app:
        image: consol.de/privacy-purger-app:1.2.1
        container_name: privacy-purger-app
        depends_on:
          - privacy-purger-ner
        restart: always
        # 2nd option how to set environment values
        # Uncomment
        #volumes:
        #  - type: bind
        #    source: ./application.yml
        #    target: /home/runner/application.yml
        # end of commented lines
        environment:
          # 1st option how to set environment values
          - "config.ner.endpoint=http://privacy-purger-ner:5000/api/labeling"
          # to enable debug mode
          #- "logging.level.com.consol.=DEBUG"
          # 2nd option how to set environment values
          # Uncomment
          # - SPRING_CONFIG_LOCATION=file:/home/runner/
          # end of commented lines
        ports:
          - 28080:28080
        #logs are stored in /var/lib/docker/containers/[container-id]/[container-id]-json.log
        #they will be removed automatically after execution of "docker compose down"
        logging:
          driver: "json-file"
          options:
            max-size: "50m"
            max-file: "10"
            compress: "true"
        healthcheck:
          test: [ "CMD-SHELL", "curl --fail http://localhost:28080 || exit 1" ]
          interval: 60s
          retries: 5
          start_period: 30s
          timeout: 10s
      privacy-purger-ner:
        image: consol.de/privacy-purger-ner:1.2.1
        container_name: privacy-purger-ner
        restart: always
        ports:
          - 5000:5000
        #logs are stored in /var/lib/docker/containers/[container-id]/[container-id]-json.log
        #they will be removed automatically after execution of "docker compose down"
        logging:
          driver: "json-file"
          options:
            max-size: "50m"
            max-file: "10"
            compress: "true"
        healthcheck:
          test: ["CMD-SHELL", "curl --fail -X POST http://localhost:5000/api/labeling -H 'Content-Type: application/json' -d '{\"text\":\"Docker Compose Healthcheck\"}' || exit 1"]
          interval: 60s
          retries: 5
          start_period: 30s
          timeout: 10s

    Image version

    Ensure that the image tags in the docker-compose.yml file correspond to the specific release version obtained from the FTP server. For example, replace 1.2.1 with your current version.

  3. Create the application.yml file in the same directory as the docker-compose.yml and paste the following template content.

    application.yml:

    # Sample application file
    # only ner.endpoint has been changed
    server:
      port: 28080

    config:
      api-key: #YourAPIKey
      regexPurger:
        - id: email
          description: "Replaces email addresses by using RegEx"
          replacement: "<<EMAIL>>"
          regexList:
            - '\b[\w-\.]+@([\w-]+\.)+[\w-]{2,4}\b'
        - id: iban
          description: "Replaces IBANs by using RegEx"
          replacement: "<<IBAN>>"
          regexList:
            - '\b[A-Z]{2}[0-9]{2}(?:[ ]?[0-9]{4}){4}(?!(?:[ ]?[0-9]){3})(?:[ ]?[0-9]{1,2})?\b'
            - '\b(?:(?:[A-Z]{2})(?:\d{2}))(?=\d{2}[A-Z0-9]{9,30}$)[A-Z0-9]{14,34}\b'
        - id: postfach
          description: "Replaces postfach by using RegEx"
          replacement: "<<POSTFACH>>"
          regexList:
            - '([Pp]ostfach)(([\d][\d ])?([\d]+)?[ .]?([\d ]+))([\d])+'
        - id: birthdate
          description: "Replaces Birthdate by using RegEx"
          replacement: "<<BIRTHDATE>>"
          regexList:
            - '\b(0?[1-9]|[12]\d|3[01])\.(0?[1-9]|1[0-2])\.\d{4}\b'
            - '\b\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b'
        - id: phone
          description: "Replaces phone numbers by using RegEx"
          replacement: "<<PHONE>>"
          regexList:
            - '(\(?([\d\-\)\–\+\/\(][\d \-\)\–\+\/\(]{6,})\)?([\d]+)[ .\-–\/]?([\d]+))'
        - id: zip
          description: "Replaces ZIPs by using RegEx"
          replacement: "<<ZIP>>"
          regexList:
            - '\b\d{5}\b'
      ner:
        endpoint: "http://privacy-purger-ner:5000/api/labeling"
        replace-loc: true
        replace-per: true
        replace-org: false
        replace-misc: false
        per-replacement: '<<PER>>'
        org-replacement: '<<ORG>>'
        misc-replacement: '<<MISC>>'
        loc-replacement: '<<LOC>>'
        text-slice-threshold: 925

    Info

    Enter your preferred API key in the api-key field within the config section of your application.yml.

  4. Link the application.yml to Docker Compose.

    In order for the application.yml to be loaded at startup, please uncomment and configure the following lines in the docker-compose.yml file to match your setup.

    ### Uncomment and configure volumes
    volumes:
      - type: bind
        source: ./application.yml
        target: /app/application.yml

    environment:
      # ...
      # Uncomment and configure Spring-Config
      - SPRING_CONFIG_LOCATION=file:/app/application.yml

  5. Load the images using the following commands:

    docker load --input privacy-purger-app:1.2.1.tar
    docker load --input privacy-purger-ner:1.2.1.tar

    Note

    The NER image is large, so loading it may take a while.

  6. Verify that the images have been loaded successfully by running:

    docker images

    You should now see the privacy-purger-app and privacy-purger-ner images in the list.

  7. Once the images are successfully loaded, start the Privacy Purger with the following command:

    docker compose up -d

  8. Use the following command to monitor the container status. Ensure that both containers reach a healthy state before proceeding.

    docker compose ps
  9. To test the installation, open your web browser and navigate to the following URL to access the Privacy Purge Playground: http://<your-ip>:28080

    Tip

    To ensure that your application.yml is being loaded correctly, you can perform the following test:

    Modify the application.yml: Locate the email entry in the regexPurger list and change the replacement value to <<EMAILTEST>>:

    regexPurger:
      - id: email
        description: "Replaces email addresses by using RegEx"
        replacement: "<<EMAILTEST>>"

    Restart the containers: Apply the changes by restarting the Docker containers:

    docker compose restart

    Test in the browser: Open the Privacy Purge Playground and process a text containing an email address.

    Success: If the email address is replaced by <<EMAILTEST>>, the application.yml is loaded correctly.

    Final step: Revert the changes in your application.yml and restart the containers once more to restore the original settings.
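
The container status from step 8 can also be checked from a script. The following is a minimal Python sketch and not part of the product. It sends roughly the same two requests that the health checks in the docker-compose.yml template use: a GET to port 28080 and a JSON POST to /api/labeling on port 5000. The function names `endpoint_ok` and `check_privacy_purger` and the default host `localhost` are assumptions chosen for illustration.

```python
import json
import urllib.error
import urllib.request


def endpoint_ok(url, payload=None, timeout=10.0):
    """Return True if the endpoint answers with a 2xx status.

    With payload=None a GET is sent; otherwise the payload is sent as a JSON POST.
    """
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # urllib sends a POST automatically when data is not None.
    request = urllib.request.Request(url, data=data, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status (4xx/5xx).
        return False


def check_privacy_purger(host="localhost"):
    """Query both containers the way the Compose health checks do."""
    return {
        "privacy-purger-app": endpoint_ok(f"http://{host}:28080"),
        "privacy-purger-ner": endpoint_ok(
            f"http://{host}:5000/api/labeling",
            payload={"text": "Docker Compose Healthcheck"},
        ),
    }
```

Calling `check_privacy_purger("<your-ip>")` returns a dictionary with one boolean per container. A `False` value only means that the request failed; `docker compose ps` and the container logs remain the place to look for the cause.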

Offline mode

The Privacy Purger supports an offline mode, which is essential for environments with restricted internet access.

To enable offline mode for the Privacy Purger NER, you must set the following environment variables. These variables instruct the underlying libraries to skip any attempts to connect to external repositories:

  • HF_HUB_OFFLINE=1: Disables all communication with the Hugging Face Hub.
  • TRANSFORMERS_OFFLINE=1: Forces the Transformers library to use only locally cached files.

Running with Docker

Since Docker containers do not automatically inherit environment variables from the host system, you must pass them explicitly with the -e flag when starting the container.

Command example:

docker run -e HF_HUB_OFFLINE=1 -e TRANSFORMERS_OFFLINE=1 -p 5000:5000 consol.de/privacy-purger-ner:1.2.1

Alternatively, you can define the offline settings in a docker-compose.yml file.

services:
  privacy-purger-ner:
    image: consol.de/privacy-purger-ner:1.2.1
    ports:
      - "5000:5000"
    environment:
      - HF_HUB_OFFLINE=1
      - TRANSFORMERS_OFFLINE=1
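
If the container is started from a script rather than by hand, the -e flags can be generated from a dictionary so that no variable is forgotten. The following Python sketch only assembles the command shown in the example above. The helper `docker_run_command` and the constant `OFFLINE_ENV` are hypothetical names introduced for illustration; the image name, port, and variables are taken from this section.

```python
import shlex


def docker_run_command(image, env=None, ports=None):
    """Build a `docker run` argument list with one -e flag per variable."""
    args = ["docker", "run"]
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]
    for host_port, container_port in (ports or {}).items():
        args += ["-p", f"{host_port}:{container_port}"]
    args.append(image)
    return args


# The two offline variables described in this section.
OFFLINE_ENV = {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}

command = docker_run_command(
    "consol.de/privacy-purger-ner:1.2.1",
    env=OFFLINE_ENV,
    ports={5000: 5000},
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to start the container. Building a list instead of a single string avoids shell quoting issues.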

Running as a local Python script

If you are running the Privacy Purger directly as a Python script (e.g., python NerService.py), you need to follow a specific sequence:

First run requires internet

The very first start of the script requires an active internet connection to download all necessary resources and models. During this initial run, do not set the offline variables, as they will block the required connection.

Recommended workflow:

  1. Run the script once with an internet connection to let it download everything.
  2. Once the first run is successful, stop the script.
  3. Set the environment variables HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1.
  4. Restart the script; it will now work in full offline mode.
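
The workflow above can be sketched as a small launcher that starts the script in a child process, with or without the offline variables. This is an illustration under assumptions, not a shipped tool: the script name `NerService.py` is the example name used in this section, and `build_env` and `run_ner_service` are hypothetical helpers. Passing the variables through the child's environment ensures that they are set before the script starts.

```python
import os
import subprocess
import sys

# The two offline variables described in this section.
OFFLINE_VARS = {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}


def build_env(offline, base=None):
    """Return a copy of the environment with the offline variables set or removed."""
    env = dict(os.environ if base is None else base)
    for name, value in OFFLINE_VARS.items():
        if offline:
            env[name] = value
        else:
            # The first run must be able to download models, so drop the variable.
            env.pop(name, None)
    return env


def run_ner_service(script="NerService.py", offline=True):
    """Start the script in a child process with the chosen environment."""
    return subprocess.run([sys.executable, script], env=build_env(offline), check=False)
```

For the first run, call `run_ner_service(offline=False)` while an internet connection is available and stop the script once it has started successfully. All later runs can use `run_ner_service(offline=True)`.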