Installing ETL components
Installing Pentaho Data Integration
The third-party client application Pentaho Data Integration (PDI) is needed to create transformations and jobs.
Install Pentaho Data Integration Community Edition, version 9.3, together with the ConSol CM ETL package. The version of the ETL package must match the version of the ConSol CM server.
-
Download the PDI installation package from https://www.hitachivantara.com/en-us/products/pentaho-platform/data-integration-analytics/pentaho-community-edition.html
-
Unpack it to the desired location on your local machine.
-
Obtain the ZIP file with the ConSol CM ETL package (etl-package-distribution-<CM_VERSION>-kettle.zip) from ConSol CM support. It contains the required plug-ins, samples, and additional libraries.
-
Unpack it to the data-integration directory of PDI (called <PDI_HOME> in this manual). Overwrite existing files.
When you update ConSol CM to a newer version, update the ETL package as well. Overwrite the existing files, then check the libext and plugins directories for duplicate libraries. At a minimum, remove the ETL-specific JAR files of the older ConSol CM version.
-
Configure Spoon, the user interface for creating transformations and jobs. This is done in the spoon.bat file on Windows and in the spoon.sh file on Unix. Define a CM_INIT variable that contains the URL of the ETL service of the ConSol CM server and the name and password of the administrator user, and append it to the runtime options (OPT):
Example for Windows (the ConSol CM-specific changes are the line that sets CM_INIT and the %CM_INIT% entry at the end of OPT):
REM ******************************************************************
REM ** Set java runtime options **
REM ** Change 2048m to higher values in case you run out of memory **
REM ** or set the PENTAHO_DI_JAVA_OPTIONS environment variable **
REM ******************************************************************
set CM_INIT=-Durl=http://localhost:8888/etl-service -DcmUser=admin -DcmPassword=consol -DbatchSize=100 -DinfoSize=100 -DcountRemote=10 -DexportSize=1000
if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx2048m"
set OPT=%OPT% %PENTAHO_DI_JAVA_OPTIONS% "-Djava.library.path=%LIBSPATH%;%HADOOP_HOME%/bin" %JAVA_ENDORSED_DIRS% %JAVA_LOCALE_COMPAT% "-DKETTLE_HOME=%KETTLE_HOME%" "-DKETTLE_REPOSITORY=%KETTLE_REPOSITORY%" "-DKETTLE_USER=%KETTLE_USER%" "-DKETTLE_PASSWORD=%KETTLE_PASSWORD%" "-DKETTLE_PLUGIN_PACKAGES=%KETTLE_PLUGIN_PACKAGES%" "-DKETTLE_LOG_SIZE_LIMIT=%KETTLE_LOG_SIZE_LIMIT%" "-DKETTLE_JNDI_ROOT=%KETTLE_JNDI_ROOT%" %CM_INIT%
Example for Unix (the ConSol CM-specific changes are the line that sets CM_INIT and the $CM_INIT entry at the end of OPT):
# ******************************************************************
# ** Set java runtime options **
# ** Change 2048m to higher values in case you run out of memory **
# ** or set the PENTAHO_DI_JAVA_OPTIONS environment variable **
# ******************************************************************
CM_INIT="-Durl=http://localhost:8888/etl-service -DcmUser=admin -DcmPassword=consol -DbatchSize=100 -DinfoSize=100 -DcountRemote=10 -DexportSize=1000"
OPT="$OPT $PENTAHO_DI_JAVA_OPTIONS -Djava.library.path=$LIBPATH $JAVA_ENDORSED_DIRS $JAVA_LOCALE_COMPAT -DKETTLE_HOME=$KETTLE_HOME -DKETTLE_REPOSITORY=$KETTLE_REPOSITORY -DKETTLE_USER=$KETTLE_USER -DKETTLE_PASSWORD=$KETTLE_PASSWORD -DKETTLE_PLUGIN_PACKAGES=$KETTLE_PLUGIN_PACKAGES -DKETTLE_LOG_SIZE_LIMIT=$KETTLE_LOG_SIZE_LIMIT -DKETTLE_JNDI_ROOT=$KETTLE_JNDI_ROOT $CM_INIT"
-
Start Spoon by executing the spoon.bat / spoon.sh file.
You can start creating your own transformations and jobs right away, or look at the ConSol CM sample transformations, which are located in <PDI_HOME>/samples/consol. To run the samples, you need a ConSol CM system where the Test and demo scene is installed, because the sample transformations are based on actual configurations and require certain data objects and data fields to be present.
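When the ETL package is unpacked over an existing installation, older copies of a library can remain next to the new ones. The following is a minimal sketch of a duplicate check for the libext and plugins directories. It assumes a Unix-like shell and JAR names of the form <name>-<version>.jar with a version that starts with a digit. The function name find_duplicate_jars and the default value of PDI_HOME are illustrative and not part of the ETL package.

```shell
#!/bin/sh
# Sketch: print library names that occur more than once below a directory.
# Assumption: JAR files are named <name>-<version>.jar and the version
# starts with a digit, for example etl-core-6.14.0.jar.
find_duplicate_jars() {
    find "$1" -type f -name '*.jar' 2>/dev/null |
        sed -e 's|.*/||' -e 's|-[0-9].*\.jar$||' -e 's|\.jar$||' |
        sort | uniq -d
}

# PDI_HOME stands for the data-integration directory of PDI (placeholder default).
PDI_HOME="${PDI_HOME:-./data-integration}"
for dir in "$PDI_HOME/libext" "$PDI_HOME/plugins"; do
    if [ -d "$dir" ]; then
        find_duplicate_jars "$dir"
    fi
done
```

Each name that is printed occurs in more than one file within the checked directory. Decide manually which file belongs to the older ConSol CM version before deleting anything.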
Installing ETL Runner
ETL Runner is a ConSol CM component which is needed to run transformations and jobs.
There are two options for installing ETL Runner:
-
Standalone mode: Execute ETL Runner as a standalone Java application on the ConSol CM server machine or another server machine.
-
Overlay mode: Deploy ETL Runner in the same application server as ConSol CM.
Standalone mode
-
Save the etl-runner-<CM_VERSION>.jar file and the etlRunnerApplication.properties file in the directory that will serve as ETL home.
-
Provide the path to the ETL workspace, the URL of the ETL service of the ConSol CM server, the application secret, and the name and the password of the administrator user in the etlRunnerApplication.properties file.
Example:
# indent all json to help debugging
application.indent.json.output=true
# Workspace directory functionality is optional and described later
application.workspace.directory=D:\ETL\workspace
# Secret used to sign JSON Web Token (JWT) to authenticate within etl-runner (minimum 32 characters)
application.secret=secret.secret.secret.secret.secret
# property names from CM kettle plugins (connection to CM instance)
url=http://localhost:8888/etl-service
cmUser=admin
cmPassword=consol
If HTTPS is used, some additional settings are required:
server.port=9443
server.ssl.key-store=/pathToYourP12/yourP12Name.p12
server.ssl.key-store-password=yourP12Password
server.ssl.keyStoreType=PKCS12
server.ssl.keyAlias=p12Alias
If a proxy is used, the following additional settings are required:
server.port=8080
server.address=127.0.0.1
server.use-forward-headers=true
-
Start ETL Runner by executing the following command in the ETL home directory:
java -jar etl-runner-<CM_VERSION>.jar
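The application.secret value shown in the example is a placeholder. The following is a minimal sketch for generating a random value that meets the 32-character minimum. It assumes a Unix-like system that provides /dev/urandom and od; the function name generate_secret is illustrative.

```shell
#!/bin/sh
# Sketch: generate a random value for application.secret.
# Reads 32 random bytes and prints them as 64 hexadecimal characters,
# which is longer than the 32-character minimum.
generate_secret() {
    od -An -N32 -tx1 /dev/urandom | tr -dc '0-9a-f'
}

# Print a line that can be copied into etlRunnerApplication.properties.
echo "application.secret=$(generate_secret)"
```

The printed line can replace the placeholder entry in etlRunnerApplication.properties.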
Overlay mode
-
Save the etlRunnerApplication.properties file to <JBOSS_HOME>/bin.
-
Provide the path to the ETL workspace, the URL of the ETL service of the ConSol CM server, the application secret, and the name and the password of the administrator user in the etlRunnerApplication.properties file.
Example:
# indent all json to help debugging
application.indent.json.output=true
# Workspace directory functionality is optional and described later
application.workspace.directory=D:\ETL\workspace
# Secret used to sign JSON Web Token (JWT) to authenticate within etl-runner (minimum 32 characters)
application.secret=secret.secret.secret.secret.secret
# property names from CM kettle plugins (connection to CM instance)
url=http://localhost:8888/etl-service
cmUser=admin
cmPassword=consol
If HTTPS is used, some additional settings are required:
server.port=9443
server.ssl.key-store=/pathToYourP12/yourP12Name.p12
server.ssl.key-store-password=yourP12Password
server.ssl.keyStoreType=PKCS12
server.ssl.keyAlias=p12Alias
If a proxy is used, the following additional settings are required:
server.port=8080
server.address=127.0.0.1
server.use-forward-headers=true
-
Save the etl-runner-<CM_VERSION>.war to <JBOSS_HOME>/standalone/deployments.
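A quick check that etlRunnerApplication.properties contains the settings named above can catch typos before ETL Runner is started. The following sketch assumes a Unix-like shell and entries written as key=value without spaces around the equals sign, as in the example. The function name check_etl_properties is illustrative and not part of ETL Runner.

```shell
#!/bin/sh
# Sketch: verify that the connection settings and the application secret
# are present, and that the secret has at least 32 characters.
# Assumption: entries use the form key=value, as in the example file.
check_etl_properties() {
    file=$1
    status=0
    for key in application.secret url cmUser cmPassword; do
        if ! grep -q "^${key}=" "$file"; then
            echo "missing: $key"
            status=1
        fi
    done
    # Take the value of the first application.secret entry, if any.
    secret=$(sed -n 's/^application\.secret=//p' "$file" | head -n 1)
    if [ "${#secret}" -lt 32 ]; then
        echo "application.secret is shorter than 32 characters"
        status=1
    fi
    return $status
}
```

For example, check_etl_properties "$JBOSS_HOME/bin/etlRunnerApplication.properties" prints one line per problem and returns a non-zero status if any is found.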