ETL

Installing Pentaho Data Integration

The third-party client application Pentaho Data Integration (PDI) is needed to create transformations and jobs.

You need to install the client application Pentaho Data Integration - Community Edition, version 9.4, and the ConSol CM ETL package. The version of the ETL package must match the version of the ConSol CM server.
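Because the ETL package version must match the ConSol CM server version, it can help to compare the two before unpacking anything. The following POSIX shell sketch assumes that the package file keeps the name pattern etl-package-distribution-<CM_VERSION>-kettle.zip from step 3 below. The function names are hypothetical, and the server version is passed in by hand. The sketch does not query ConSol CM.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PDI or ConSol CM): compare the version in the
# ETL package file name with a ConSol CM server version that you supply yourself.

# Prints <CM_VERSION> from etl-package-distribution-<CM_VERSION>-kettle.zip.
# Returns 1 (and prints nothing) if the file name does not follow that pattern.
etl_package_version() {
  name=$(basename "$1")
  case "$name" in
    etl-package-distribution-?*-kettle.zip) ;;
    *) return 1 ;;
  esac
  name=${name#etl-package-distribution-}
  printf '%s\n' "${name%-kettle.zip}"
}

# Returns 0 if the package version equals the expected server version, 1 otherwise.
check_etl_package_version() {
  pkg_version=$(etl_package_version "$1") || {
    echo "Unexpected file name: $1" >&2
    return 1
  }
  if [ "$pkg_version" = "$2" ]; then
    echo "OK: ETL package $pkg_version matches ConSol CM server $2"
  else
    echo "Version mismatch: ETL package $pkg_version, ConSol CM server $2" >&2
    return 1
  fi
}
```

Source the functions and call check_etl_package_version with the path of the downloaded ZIP file and the version of your ConSol CM server. A non-zero exit status signals a mismatch or an unexpected file name.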

  1. Download the PDI installation package from https://www.hitachivantara.com/en-us/products/pentaho-platform/data-integration-analytics/pentaho-community-edition.html

  2. Unpack it to the desired location on your local machine.

  3. Obtain the ZIP file with the ConSol CM ETL package (etl-package-distribution-<CM_VERSION>-kettle.zip) from ConSol CM Support. It contains the required plug-ins, samples, and additional libraries.

  4. Unpack it to the data-integration directory of PDI (called <PDI_HOME> in this manual). Overwrite existing files.

    warning

    When updating ConSol CM to a newer version, you need to update the ETL package as well. Overwrite the existing files and check the libext and plugins directories to ensure that there are no duplicate libraries. At a minimum, you need to remove the ETL-specific JAR files of the previous ConSol CM version.

  5. Configure Spoon, the user interface for creating transformations and jobs. This is done in the spoon.bat file on Windows and in the spoon.sh file on Unix. In the CM_INIT variable, provide the URL of the ETL service of the ConSol CM server and the name and password of the administrator user, and add CM_INIT to the runtime options (OPT). Example for spoon.bat:

    REM ******************************************************************
    REM ** Set java runtime options **
    REM ** Change 2048m to higher values in case you run out of memory **
    REM ** or set the PENTAHO_DI_JAVA_OPTIONS environment variable **
    REM ******************************************************************

    REM ConSol CM connection (ETL service URL, administrator name and password) and further options for the ConSol CM plug-ins
    set CM_INIT=-Durl=http://localhost:8888/etl-service -DcmUser=admin -DcmPassword=consol -DbatchSize=100 -DinfoSize=100 -DcountRemote=10 -DexportSize=1000

    if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx2048m"

    REM %CM_INIT% is appended at the end of the runtime options
    set OPT=%OPT% %PENTAHO_DI_JAVA_OPTIONS% "-Djava.library.path=%LIBSPATH%;%HADOOP_HOME%/bin" %JAVA_ENDORSED_DIRS% %JAVA_LOCALE_COMPAT% "-DKETTLE_HOME=%KETTLE_HOME%" "-DKETTLE_REPOSITORY=%KETTLE_REPOSITORY%" "-DKETTLE_USER=%KETTLE_USER%" "-DKETTLE_PASSWORD=%KETTLE_PASSWORD%" "-DKETTLE_PLUGIN_PACKAGES=%KETTLE_PLUGIN_PACKAGES%" "-DKETTLE_LOG_SIZE_LIMIT=%KETTLE_LOG_SIZE_LIMIT%" "-DKETTLE_JNDI_ROOT=%KETTLE_JNDI_ROOT%" %CM_INIT%

  6. Start Spoon by executing the spoon.bat / spoon.sh file.

    info

    You can start creating your own transformations and jobs right away, or first have a look at the ConSol CM sample transformations located in <PDI_HOME>/samples/consol. To run the samples, you need a ConSol CM system with the Test and demo scene installed, because the sample transformations are based on actual configurations and require certain data objects and data fields to be present.
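The duplicate check recommended in the warning in step 4 can be partly scripted. The following POSIX shell sketch lists library names that occur more than once in <PDI_HOME>/libext and <PDI_HOME>/plugins. It assumes that versioned JAR files are named <name>-<version>.jar with a version that starts with a digit, and the function name find_duplicate_jars is hypothetical. It deletes nothing: the same library can legitimately appear in more than one plug-in directory, so review each hit before removing the JAR files of the older version.

```shell
#!/bin/sh
# Hypothetical helper: prints library names that occur more than once in the
# libext and plugins directories below the PDI home directory given as $1.
# Assumption: versioned JAR files are named <name>-<version>.jar, where the
# version starts with a digit. Nothing is deleted; review the output manually.
find_duplicate_jars() {
  # List all JAR files (errors for missing directories are suppressed),
  # reduce each path to the file name without ".jar" and without the version
  # suffix, then print every name that occurs more than once.
  find "$1/libext" "$1/plugins" -type f -name '*.jar' 2>/dev/null |
    sed -e 's|.*/||' -e 's/\.jar$//' -e 's/-[0-9].*$//' |
    sort |
    uniq -d
}
```

Run it with the path of your data-integration directory after unpacking a new ETL package. For each reported name, the matching files can then be listed with find, for example find <PDI_HOME>/libext <PDI_HOME>/plugins -name '<name>-*.jar'.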

Installing ETL Runner

ETL Runner is a ConSol CM component which is needed to run transformations and jobs.

There are two options for installing ETL Runner:

  • Standalone mode: Execute ETL Runner as a standalone Java application on the ConSol CM server machine or another server machine.
  • Overlay mode: Deploy ETL Runner in the same application server as ConSol CM.

To install ETL Runner in standalone mode:

  1. Save the cm-etl-runner-standalone-<CM_VERSION>.jar file and the etlRunnerApplication.properties file in the directory that should be used as ETL home.

  2. Provide the path to the ETL workspace, the URL of the ETL service of the ConSol CM server, the application secret, and the name and the password of the administrator user in the etlRunnerApplication.properties file. Example:

    # Indent all JSON output to make debugging easier
    application.indent.json.output=true

    # Workspace directory functionality is optional and described later.
    # Use forward slashes (or double backslashes) in Windows paths, because a single
    # backslash is an escape character in .properties files.
    application.workspace.directory=D:/ETL/workspace

    # Workspace library functionality is optional and described later.
    # Caution: this line repeats the key application.workspace.directory, so as written
    # its value overrides the workspace directory set above.
    application.workspace.directory=D:/ETL/workspace/drivers

    # Secret used to sign JSON Web Tokens (JWT) for authentication within ETL Runner (minimum 32 characters)
    application.secret=secret.secret.secret.secret.secret

    # Temporary directory for uploaded files (if not set, the servlet container's or the JVM's temporary directory is used)
    application.upload.temp.directory=/path

    # Same property names as in the CM Kettle plug-ins (connection to the CM instance)
    url=http://localhost:8888/etl-service
    cmUser=admin
    cmPassword=consol

    If HTTPS is used, some additional settings are required:

    server.port=9443
    server.ssl.key-store=/pathToYourP12/yourP12Name.p12
    server.ssl.key-store-password=yourP12Password
    server.ssl.keyStoreType=PKCS12
    server.ssl.keyAlias=p12Alias

    If a proxy is used, the following additional settings are required:

    server.port=8080
    server.address=127.0.0.1
    server.use-forward-headers=true

  3. Start ETL Runner by executing the following command:

    java -jar cm-etl-runner-standalone-<CM_VERSION>.jar
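Before starting ETL Runner, a quick sanity check of etlRunnerApplication.properties can catch problems that are easy to miss: a missing connection value, an application.secret shorter than the required 32 characters, or a key that occurs twice, such as application.workspace.directory in the example above (as with java.util.Properties, a later entry typically overrides an earlier one). The following POSIX shell and awk sketch is hypothetical and handles only simple key=value lines like those in the example. It ignores line continuations, ":" separators, and escape sequences.

```shell
#!/bin/sh
# Hypothetical pre-start check for etlRunnerApplication.properties (not part of ConSol CM).
# Handles simple "key=value" lines only; continuation lines, ":" separators and
# escape sequences are not interpreted. Prints findings and returns 1 if any are found.
check_etl_runner_config() {
  awk -F= '
    BEGIN { status = 0 }
    { sub(/\r$/, "") }                        # tolerate Windows line endings
    /^[ \t]*$/ || /^[ \t]*[#!]/ { next }      # skip blank lines and comments
    index($0, "=") == 0 { next }              # skip lines without a key=value pair
    {
      key = $1
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      value = substr($0, index($0, "=") + 1)
      sub(/^[ \t]+/, "", value)
      if (key in seen) { print "Duplicate key: " key; status = 1 }
      seen[key] = value
    }
    END {
      # Connection values and secret that the example above sets
      n = split("url cmUser cmPassword application.secret", required, " ")
      for (i = 1; i <= n; i++)
        if (!(required[i] in seen) || seen[required[i]] == "") {
          print "Missing value: " required[i]; status = 1
        }
      if (("application.secret" in seen) && length(seen["application.secret"]) < 32) {
        print "application.secret is shorter than 32 characters"; status = 1
      }
      exit status
    }
  ' "$1"
}
```

Combined with the start command, for example check_etl_runner_config etlRunnerApplication.properties && java -jar cm-etl-runner-standalone-<CM_VERSION>.jar, ETL Runner is only started if no findings are reported.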