Installing ETL components

Installing Pentaho Data Integration

The third-party client application Pentaho Data Integration (PDI) is needed to create transformations and jobs.

Install Pentaho Data Integration - Community Edition, version 9.3, and the ConSol CM ETL package. The version of the ETL package must match the version of the ConSol CM server.

  1. Download the PDI installation package from https://www.hitachivantara.com/en-us/products/pentaho-platform/data-integration-analytics/pentaho-community-edition.html

  2. Unpack it to the desired location on your local machine.

  3. Obtain the ZIP file with the ConSol CM ETL package (etl-package-distribution-<CM_VERSION>-kettle.zip) from ConSol CM support. It contains the required plug-ins, samples, and additional libraries.

  4. Unpack it to the data-integration directory of PDI (called <PDI_HOME> in this manual). Overwrite existing files.

    When you update ConSol CM to a newer version, you need to update the ETL package as well. Overwrite the existing files and check the libext and plugins directories to ensure that there are no duplicate libraries. At a minimum, remove the ETL-specific JAR files of the older ConSol CM version.

  5. Configure Spoon, the user interface for creating transformations and jobs, in the spoon.bat file (Windows) or the spoon.sh file (Unix). Define the CM_INIT variable with the URL of the ETL service of the ConSol CM server and the name and password of the administrator user, and append it to the runtime options (OPT):

    Example for Windows (the ConSol CM-specific changes are the CM_INIT line and the %CM_INIT% entry appended to OPT):

    REM ******************************************************************
    REM ** Set java runtime options **
    REM ** Change 2048m to higher values in case you run out of memory **
    REM ** or set the PENTAHO_DI_JAVA_OPTIONS environment variable **
    REM ******************************************************************

    set CM_INIT=-Durl=http://localhost:8888/etl-service -DcmUser=admin -DcmPassword=consol -DbatchSize=100 -DinfoSize=100 -DcountRemote=10 -DexportSize=1000

    if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx2048m"

    set OPT=%OPT% %PENTAHO_DI_JAVA_OPTIONS% "-Djava.library.path=%LIBSPATH%;%HADOOP_HOME%/bin" %JAVA_ENDORSED_DIRS% %JAVA_LOCALE_COMPAT% "-DKETTLE_HOME=%KETTLE_HOME%" "-DKETTLE_REPOSITORY=%KETTLE_REPOSITORY%" "-DKETTLE_USER=%KETTLE_USER%" "-DKETTLE_PASSWORD=%KETTLE_PASSWORD%" "-DKETTLE_PLUGIN_PACKAGES=%KETTLE_PLUGIN_PACKAGES%" "-DKETTLE_LOG_SIZE_LIMIT=%KETTLE_LOG_SIZE_LIMIT%" "-DKETTLE_JNDI_ROOT=%KETTLE_JNDI_ROOT%" %CM_INIT%

    Example for Unix (the ConSol CM-specific changes are the CM_INIT line and the $CM_INIT entry appended to OPT):

    # ******************************************************************
    # ** Set java runtime options **
    # ** Change 2048m to higher values in case you run out of memory **
    # ** or set the PENTAHO_DI_JAVA_OPTIONS environment variable **
    # ******************************************************************

    CM_INIT="-Durl=http://localhost:8888/etl-service -DcmUser=admin -DcmPassword=consol -DbatchSize=100 -DinfoSize=100 -DcountRemote=10 -DexportSize=1000"

    OPT="$OPT $PENTAHO_DI_JAVA_OPTIONS -Djava.library.path=$LIBPATH $JAVA_ENDORSED_DIRS $JAVA_LOCALE_COMPAT -DKETTLE_HOME=$KETTLE_HOME -DKETTLE_REPOSITORY=$KETTLE_REPOSITORY -DKETTLE_USER=$KETTLE_USER -DKETTLE_PASSWORD=$KETTLE_PASSWORD -DKETTLE_PLUGIN_PACKAGES=$KETTLE_PLUGIN_PACKAGES -DKETTLE_LOG_SIZE_LIMIT=$KETTLE_LOG_SIZE_LIMIT -DKETTLE_JNDI_ROOT=$KETTLE_JNDI_ROOT $CM_INIT"

  6. Start Spoon by executing the spoon.bat / spoon.sh file.

    You can start creating your own transformations and jobs right away, or look at the ConSol CM sample transformations, which are located in <PDI_HOME>/samples/consol. To run the samples, you need a ConSol CM system where the Test and demo scene is installed, because the sample transformations are based on actual configurations and require certain data objects and data fields to be present.
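The update note in step 4 asks you to check the libext and plugins directories for duplicate libraries. The following is a minimal sketch of how such a check could be scripted on Unix. The function name find_duplicate_jars is hypothetical and not part of PDI or ConSol CM, and the script assumes that JAR files follow the common <name>-<version>.jar naming pattern, so treat its output as a list of candidates to review manually rather than a definitive result.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PDI or ConSol CM):
# print every JAR base name that occurs more than once below a directory.
# Assumption: JAR files are named <name>-<version>.jar.
find_duplicate_jars() {
    # $1: directory to scan recursively, e.g. the plugins directory of PDI
    find "$1" -type f -name '*.jar' |
        # drop the directory part, then the version suffix, then a bare .jar
        sed -e 's|.*/||' \
            -e 's/-[0-9][0-9A-Za-z._-]*\.jar$//' \
            -e 's/\.jar$//' |
        sort |
        # uniq -d prints each name once if it appeared at least twice
        uniq -d
}
```

Assuming a shell variable PDI_HOME that points to the data-integration directory, you could call find_duplicate_jars "$PDI_HOME/libext" and find_duplicate_jars "$PDI_HOME/plugins". An empty result means that no base name was found twice; it does not prove that the installation is free of outdated files.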

Installing ETL Runner

ETL Runner is a ConSol CM component which is needed to run transformations and jobs.

There are two options for installing ETL Runner:

Standalone mode

  1. Save the etl-runner-<CM_VERSION>.jar file and the etlRunnerApplication.properties file in the directory that will serve as the ETL home.

  2. Provide the path to the ETL workspace, the URL of the ETL service of the ConSol CM server, the application secret, and the name and the password of the administrator user in the etlRunnerApplication.properties file.

    Example:

    # indent all json to help debugging
    application.indent.json.output=true

    # Workspace directory functionality is optional and described later
    application.workspace.directory=D:\ETL\workspace

    # Secret used to sign JSON Web Token (JWT) to authenticate within etl-runner (minimum 32 characters)
    application.secret=secret.secret.secret.secret.secret

    # property names from CM kettle plugins (connection to CM instance)
    url=http://localhost:8888/etl-service
    cmUser=admin
    cmPassword=consol

    If HTTPS is used, some additional settings are required:

    server.port=9443
    server.ssl.key-store=/pathToYourP12/yourP12Name.p12
    server.ssl.key-store-password=yourP12Password
    server.ssl.keyStoreType=PKCS12
    server.ssl.keyAlias=p12Alias

    If a proxy is used, the following additional settings are required:

    server.port=8080
    server.address=127.0.0.1
    server.use-forward-headers=true

  3. Start ETL Runner with java -jar. The path in the following command is an example; point it to the JAR file saved in step 1:

    java -jar modules/application/package/app/target/etl-runner-${version}.jar
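Before starting ETL Runner, it can help to verify the properties file from step 2. The following is a minimal sketch of such a pre-flight check for Unix. The functions get_prop and check_config are hypothetical and not part of ETL Runner. The sketch only covers what the example above states: the keys application.secret, url, cmUser, and cmPassword must have values, and the secret needs at least 32 characters. It reads simple key=value lines and does not handle line continuations or escape sequences.

```shell
#!/bin/sh
# Hypothetical pre-flight check for etlRunnerApplication.properties.
# Not part of ETL Runner; it only inspects simple key=value lines.

# Print the value of key $2 in properties file $1 (last definition wins).
get_prop() {
    awk -v key="$2" '
        /^[ \t]*[#!]/ { next }               # skip comment lines
        {
            pos = index($0, "=")
            if (pos == 0) next               # not a key=value line
            name = substr($0, 1, pos - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == key) {
                value = substr($0, pos + 1)
                sub(/^[ \t]+/, "", value)
            }
        }
        END { print value }
    ' "$1"
}

# Report missing keys and a too-short secret; return non-zero on any problem.
check_config() {
    file=$1
    status=0
    for key in application.secret url cmUser cmPassword; do
        if [ -z "$(get_prop "$file" "$key")" ]; then
            echo "missing or empty: $key"
            status=1
        fi
    done
    secret=$(get_prop "$file" application.secret)
    if [ -n "$secret" ] && [ "${#secret}" -lt 32 ]; then
        echo "application.secret is shorter than 32 characters"
        status=1
    fi
    return "$status"
}
```

A possible use is check_config etlRunnerApplication.properties && echo "configuration looks complete", run from the ETL home directory. The workspace directory is not checked because the example describes it as optional.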

Overlay mode

  1. Save the etlRunnerApplication.properties file to <JBOSS_HOME>/bin.

  2. Provide the path to the ETL workspace, the URL of the ETL service of the ConSol CM server, the application secret, and the name and the password of the administrator user in the etlRunnerApplication.properties file.

    Example:

    # indent all json to help debugging
    application.indent.json.output=true

    # Workspace directory functionality is optional and described later
    application.workspace.directory=D:\ETL\workspace

    # Secret used to sign JSON Web Token (JWT) to authenticate within etl-runner (minimum 32 characters)
    application.secret=secret.secret.secret.secret.secret

    # property names from CM kettle plugins (connection to CM instance)
    url=http://localhost:8888/etl-service
    cmUser=admin
    cmPassword=consol

    If HTTPS is used, some additional settings are required:

    server.port=9443
    server.ssl.key-store=/pathToYourP12/yourP12Name.p12
    server.ssl.key-store-password=yourP12Password
    server.ssl.keyStoreType=PKCS12
    server.ssl.keyAlias=p12Alias

    If a proxy is used, the following additional settings are required:

    server.port=8080
    server.address=127.0.0.1
    server.use-forward-headers=true

  3. Save the etl-runner-<CM_VERSION>.war to <JBOSS_HOME>/standalone/deployments.
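After copying the WAR file in step 3, you may want to confirm that the application server picked it up. The following is a minimal sketch for Unix, assuming that the JBoss deployment scanner is enabled for standalone/deployments and writes its usual marker files (such as .deployed and .failed) next to the WAR file. The function deployment_state is hypothetical and not part of ConSol CM or JBoss.

```shell
#!/bin/sh
# Hypothetical helper: derive the deployment state of a WAR file from the
# marker files that the JBoss deployment scanner creates next to it.
# Assumption: the deployment scanner is enabled for the deployments directory.
deployment_state() {
    # $1: full path to the WAR file in <JBOSS_HOME>/standalone/deployments
    if [ -e "$1.deployed" ]; then
        echo deployed
    elif [ -e "$1.failed" ]; then
        echo failed
    elif [ -e "$1.isdeploying" ] || [ -e "$1.dodeploy" ]; then
        echo pending
    else
        echo unknown
    fi
}
```

Assuming a shell variable JBOSS_HOME, a call could look like deployment_state "$JBOSS_HOME/standalone/deployments/etl-runner-<CM_VERSION>.war", with <CM_VERSION> replaced by your version. If the result is failed, the server log is the place to look for the cause.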