ETL
Installing Pentaho Data Integration
The third-party client application Pentaho Data Integration (PDI) is needed to create transformations and jobs.
You need to install the client application Pentaho Data Integration - Community Edition, version 9.4, and the ConSol CM ETL package. The version of the ETL package must match the version of the ConSol CM server.
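The version requirement above can be checked before anything is unpacked. The following is a minimal sketch. It assumes that the ZIP file keeps the name etl-package-distribution-<CM_VERSION>-kettle.zip and that you supply the ConSol CM server version yourself. The functions etl_package_version and etl_package_matches are hypothetical helpers and are not part of PDI or ConSol CM.

```shell
# Hypothetical helpers (not part of PDI or ConSol CM).
# Assumption: the package file is named etl-package-distribution-<CM_VERSION>-kettle.zip.

# Print the <CM_VERSION> part of an ETL package file name, or fail on an unexpected name.
etl_package_version() {
  local name
  name=${1##*/}   # strip any leading directories
  case "$name" in
    etl-package-distribution-*-kettle.zip) ;;
    *) echo "unexpected file name: $name" >&2; return 1 ;;
  esac
  name=${name#etl-package-distribution-}
  printf '%s\n' "${name%-kettle.zip}"
}

# Succeed only if the package version equals the given ConSol CM server version.
etl_package_matches() {
  local zip=$1 cm_version=$2 pkg_version
  pkg_version=$(etl_package_version "$zip") || return 1
  if [ "$pkg_version" = "$cm_version" ]; then
    return 0
  fi
  echo "ETL package $pkg_version does not match ConSol CM $cm_version" >&2
  return 1
}
```

Call etl_package_matches with the path of the downloaded ZIP file and the version string of your ConSol CM server. It returns a non-zero status if the versions differ.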
- Download the PDI installation package from https://www.hitachivantara.com/en-us/products/pentaho-platform/data-integration-analytics/pentaho-community-edition.html
- Unpack it to the desired location on your local machine.
- Obtain the ZIP file with the ConSol CM ETL package (etl-package-distribution-<CM_VERSION>-kettle.zip) from ConSol CM Support. It contains the required plug-ins, samples, and additional libraries.
- Unpack it to the data-integration directory of PDI (called <PDI_HOME> in this manual). Overwrite existing files.

Warning: When updating ConSol CM to a newer version, you need to update the ETL package as well. Overwrite the existing files and check the libext and plugins directories to ensure that there are no duplicate libraries. At a minimum, you need to remove the ETL-specific JAR files of the previous ConSol CM version.

- Configure Spoon, the user interface for creating transformations and jobs. This is done in the spoon.bat file on Windows and in the spoon.sh file on Unix. Provide the URL of the ETL service of the ConSol CM server and the name and password of the administrator user in the CM_INIT variable, which is added to the runtime options (OPT):

Windows (spoon.bat):

REM ******************************************************************
REM ** Set java runtime options **
REM ** Change 2048m to higher values in case you run out of memory **
REM ** or set the PENTAHO_DI_JAVA_OPTIONS environment variable **
REM ******************************************************************
set CM_INIT=-Durl=http://localhost:8888/etl-service -DcmUser=admin -DcmPassword=consol -DbatchSize=100 -DinfoSize=100 -DcountRemote=10 -DexportSize=1000
if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx2048m"
set OPT=%OPT% %PENTAHO_DI_JAVA_OPTIONS% "-Djava.library.path=%LIBSPATH%;%HADOOP_HOME%/bin" %JAVA_ENDORSED_DIRS% %JAVA_LOCALE_COMPAT% "-DKETTLE_HOME=%KETTLE_HOME%" "-DKETTLE_REPOSITORY=%KETTLE_REPOSITORY%" "-DKETTLE_USER=%KETTLE_USER%" "-DKETTLE_PASSWORD=%KETTLE_PASSWORD%" "-DKETTLE_PLUGIN_PACKAGES=%KETTLE_PLUGIN_PACKAGES%" "-DKETTLE_LOG_SIZE_LIMIT=%KETTLE_LOG_SIZE_LIMIT%" "-DKETTLE_JNDI_ROOT=%KETTLE_JNDI_ROOT%" %CM_INIT%

Linux (spoon.sh):

# ******************************************************************
# ** Set java runtime options **
# ** Change 2048m to higher values in case you run out of memory **
# ** or set the PENTAHO_DI_JAVA_OPTIONS environment variable **
# ******************************************************************
CM_INIT="-Durl=http://localhost:8888/etl-service -DcmUser=admin -DcmPassword=consol -DbatchSize=100 -DinfoSize=100 -DcountRemote=10 -DexportSize=1000"
OPT="$OPT $PENTAHO_DI_JAVA_OPTIONS -Djava.library.path=$LIBPATH $JAVA_ENDORSED_DIRS $JAVA_LOCALE_COMPAT -DKETTLE_HOME=$KETTLE_HOME -DKETTLE_REPOSITORY=$KETTLE_REPOSITORY -DKETTLE_USER=$KETTLE_USER -DKETTLE_PASSWORD=$KETTLE_PASSWORD -DKETTLE_PLUGIN_PACKAGES=$KETTLE_PLUGIN_PACKAGES -DKETTLE_LOG_SIZE_LIMIT=$KETTLE_LOG_SIZE_LIMIT -DKETTLE_JNDI_ROOT=$KETTLE_JNDI_ROOT $CM_INIT"

- Start Spoon by executing the spoon.bat or spoon.sh file.

Info: You can start creating your own transformations and jobs right away, or have a look at the ConSol CM sample transformations, which are located in <PDI_HOME>/samples/consol. If you want to run the samples, you need a ConSol CM system where the Test and demo scene is installed, because the sample transformations are based on actual configurations and require certain data objects and data fields to be present.
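The duplicate-library check from the update warning can be partly automated. The following heuristic sketch flags JAR files in the same directory whose names differ only in a version suffix. This is what typically remains when a newer package is unpacked over an older one. The function find_duplicate_jars is hypothetical, and the sketch assumes the common <name>-<version>.jar naming. It cannot tell which JARs are ETL-specific, so review its output before deleting anything.

```shell
# Hypothetical heuristic helper, not an official ConSol CM tool.
# Assumption: library versions follow the "<name>-<digits...>.jar" naming pattern.
# Arguments: directories to scan, typically <PDI_HOME>/libext and <PDI_HOME>/plugins.
# Prints each group of suspicious files; returns 1 if any group was found, else 0.
find_duplicate_jars() {
  find "$@" -type f -name '*.jar' 2>/dev/null | sort | awk '
    {
      n = split($0, parts, "/")
      file = parts[n]
      dir = substr($0, 1, length($0) - length(file))
      # Strip ".jar" and everything from the first "-<digit>" to get the library name.
      name = file
      sub(/\.jar$/, "", name)
      sub(/-[0-9].*$/, "", name)
      key = dir name            # only compare files within the same directory
      count[key]++
      files[key] = files[key] "\n  " $0
    }
    END {
      found = 0
      for (key in count) {
        if (count[key] > 1) {
          printf "possible duplicate:%s\n", files[key]
          found = 1
        }
      }
      exit found
    }'
}
```

For example, run find_duplicate_jars "$PDI_HOME/libext" "$PDI_HOME/plugins", assuming PDI_HOME is a shell variable that points to <PDI_HOME>. Restricting the comparison to a single directory avoids false positives from PDI plug-ins that each bundle their own copy of a common library.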
Installing ETL Runner
ETL Runner is a ConSol CM component which is needed to run transformations and jobs.
There are two options for installing ETL Runner:
- Standalone mode: Execute ETL Runner as a standalone Java application on the ConSol CM server machine or another server machine.
- Overlay mode: Deploy ETL Runner in the same application server as ConSol CM.
Standalone mode:

- Save the cm-etl-runner-standalone-<CM_VERSION>.jar file and the etlRunnerApplication.properties file in the directory that should be used as the ETL home.
- Provide the path to the ETL workspace, the URL of the ETL service of the ConSol CM server, the application secret, and the name and password of the administrator user in the etlRunnerApplication.properties file. Example:

# Indent all JSON output to help with debugging
application.indent.json.output=true
# Use forward slashes (or double backslashes) in Windows paths; a single backslash is an escape character in .properties files
# Workspace directory functionality is optional and described later
application.workspace.directory=D:/ETL/workspace
# Workspace library functionality is optional and described later
application.workspace.directory=D:/ETL/workspace/drivers
# Secret used to sign the JSON Web Tokens (JWT) for authentication within ETL Runner (minimum 32 characters)
application.secret=secret.secret.secret.secret.secret
# Temporary directory where uploaded files are stored (the servlet container's or JVM's temporary directory is used if not set)
application.upload.temp.directory=/path
# Property names from the CM Kettle plug-ins (connection to the CM instance)
url=http://localhost:8888/etl-service
cmUser=admin
cmPassword=consol

If HTTPS is used, some additional settings are required:

server.port=9443
server.ssl.key-store=/pathToYourP12/yourP12Name.p12
server.ssl.key-store-password=yourP12Password
server.ssl.keyStoreType=PKCS12
server.ssl.keyAlias=p12Alias

If a proxy is used, the following additional settings are required:

server.port=8080
server.address=127.0.0.1
server.use-forward-headers=true

- Start ETL Runner by executing the following command:

java -jar cm-etl-runner-standalone-<CM_VERSION>.jar
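Before you start ETL Runner, a quick sanity check of etlRunnerApplication.properties can catch missing connection settings or a secret that is shorter than the documented 32 characters. This is a sketch with hypothetical helper names (get_prop, check_runner_properties). Its parser only understands simple key=value and key: value lines and ignores line continuations and escape sequences. It therefore does not replace the parser that ETL Runner itself uses.

```shell
# Hypothetical helpers, not part of ConSol CM.
# Print the value of KEY from a simple .properties FILE (the last occurrence wins).
# Handles "key=value" and "key: value" lines and skips # or ! comments;
# does NOT handle line continuations or backslash escape sequences.
get_prop() {
  local file=$1 key=$2
  awk -v key="$key" '
    /^[ \t]*[#!]/ { next }
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (match(line, /[ \t]*[=:][ \t]*/) == 0) next
      k = substr(line, 1, RSTART - 1)
      if (k == key) { value = substr(line, RSTART + RLENGTH); found = 1 }
    }
    END { if (found) print value; else exit 1 }' "$file"
}

# Check the settings named in this section; returns non-zero if any check fails.
check_runner_properties() {
  local file=$1 key secret rc=0
  for key in url cmUser cmPassword application.secret; do
    if ! get_prop "$file" "$key" >/dev/null; then
      echo "missing property: $key" >&2
      rc=1
    fi
  done
  secret=$(get_prop "$file" application.secret) || secret=
  if [ "${#secret}" -lt 32 ]; then   # documented minimum length of the secret
    echo "application.secret must have at least 32 characters" >&2
    rc=1
  fi
  return "$rc"
}
```

Run check_runner_properties etlRunnerApplication.properties in the ETL home directory. A non-zero exit status means that at least one check failed.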
Overlay mode:

- Save the etlRunnerApplication.properties file to <JBOSS_HOME>/bin.
- Provide the path to the ETL workspace, the URL of the ETL service of the ConSol CM server, and the name and password of the administrator user in the etlRunnerApplication.properties file. Example:

# Indent all JSON output to help with debugging
application.indent.json.output=true
# Use forward slashes (or double backslashes) in Windows paths; a single backslash is an escape character in .properties files
# Workspace directory functionality is optional and described later
application.workspace.directory=D:/ETL/workspace
# Property names from the CM Kettle plug-ins (connection to the CM instance)
url=http://localhost:8888/etl-service
cmUser=admin
cmPassword=consol

If HTTPS is used, some additional settings are required:

server.port=9443
server.ssl.key-store=/pathToYourP12/yourP12Name.p12
server.ssl.key-store-password=yourP12Password
server.ssl.keyStoreType=PKCS12
server.ssl.keyAlias=p12Alias

If a proxy is used, the following additional settings are required:

server.port=8080
server.address=127.0.0.1
server.use-forward-headers=true

- Save the cm-etl-runner-<CM_VERSION>.war file to <JBOSS_HOME>/standalone/deployments.
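After you copy the WAR file, you can watch the marker files that the WildFly/JBoss deployment scanner writes next to it. It writes <name>.war.deployed on success and <name>.war.failed on failure, and the .failed file contains information about the cause. The sketch below assumes the default file-system deployment scanner with automatic deployment enabled. wait_for_deployment is a hypothetical helper, and its 120-second default timeout is an arbitrary choice.

```shell
# Hypothetical helper, not part of ConSol CM or WildFly.
# Assumption: the default deployment scanner with auto-deploy is active in
# <JBOSS_HOME>/standalone/deployments.
# Usage: wait_for_deployment PATH_TO_WAR [TIMEOUT_SECONDS]
# Returns 0 when <war>.deployed appears, 1 when <war>.failed appears, 2 on timeout.
wait_for_deployment() {
  local war_path=$1 timeout=${2:-120} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if [ -e "$war_path.deployed" ]; then
      echo "deployed: $war_path"
      return 0
    fi
    if [ -e "$war_path.failed" ]; then
      echo "deployment failed, see $war_path.failed and the server log" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "timed out after ${timeout}s waiting for $war_path" >&2
  return 2
}
```

For example, after saving the WAR file, call wait_for_deployment with the full path of cm-etl-runner-<CM_VERSION>.war in <JBOSS_HOME>/standalone/deployments. Polling the marker files avoids depending on any HTTP endpoint of ETL Runner, whose URLs are not covered here.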