ETL

Introduction to ETL in ConSol CM

ETL is used for processing large amounts of runtime data. It allows to import cases, contacts, resources or users from external data sources to ConSol CM, and to export cases, contacts, resources or users from ConSol CM to external data sources. This is usually required for integrating ConSol CM into the IT infrastructure of the customer and exchanging data with other IT systems.

Concepts, terms and definitions

Concept

Other terms

Definition

ETL Extract Transform Load Process to extract data from a data source, transform it and import it to another data source
transformation   Chain of processing steps which define a data flow, usually consists of one or several input steps, several transformation steps and several output steps; file extension ktr
job   Workflow for executing one or several transformations, usually handles file availability, database connections and error notifications; file extension kjb

plug-in

 

Processing step with settings used in transformations or jobs

task

 

Execution configuration of a transformation or job in ConSol CM

Purpose and usage

ETL allows you to process large amounts of runtime data:

Importing and exporting configuration data is done on the Staging export, Custom export and Import pages of the Web Admin Suite.

The required processing steps are defined in transformations and jobs which are referenced in a task. This task is executed according to a defined schedule.

Scope of the applications

Working with ETL in ConSol CM involves several applications. The following graphic illustrates their scope:

Available plug-ins

The ConSol CM ETL plug-ins can be used in transformations together with the plug-ins provided directly by PDI. They are grouped according to their purpose in the input, output and lookup folders.

By default, commonly used PDI plugins are available in ETL Runner, see Adding additional PDI plugins for adding additional plugins.

Input steps

Input steps export data from the ConSol CM database

Output steps

Output steps import data to the ConSol CM database.

Lookup steps

Lookup steps check data in the ConSol CM database.

Task overview

You need to perform the following tasks to use ETL in ConSol CM:

  1. Install PDI on the local development machine, see Installing Pentaho Data Integration.

  2. Install ETL Runner on the server machine, see Installing ETL Runner.

  3. Create transformations and jobs in PDI. See the official PDI documentation for the general hints and information about the default plug-ins. The ConSol CM plug-ins are described in Available plug-ins.

  4. Upload the transformations and jobs to the workspace on the ETL Runner machine, see Uploading transformations and jobs.

  5. Create tasks to run the transformations and jobs, see Creating tasks.

  6. Upload the files which are needed to run the transformations or jobs, see Managing files.

  7. Track the task execution, see Tracking task execution.

If your transformations or jobs involve connections to external databases, you need to sure that the connections are defined and the drivers accessible to ETL Runner, see Connecting to external databases.

Basic tasks

Configuring the connection to ETL Runner

In order to use the Web Admin Suite for working with ETL, you need to configure the connection to the ETL Runner instance which should execute the tasks.

  1. Go to the Connection tab.

  2. Enter the URL of ETL Runner. The URL depends on the deployment mode. Examples:

    • Overlay deployment: http://localhost:8888/etl-runner

    • Standalone deployment: http://localhost:8080

  3. Enter the secret of ETL Runner as defined in the setting application.secret in the etlRunnerApplication.properties file.

  4. You can click the Check connection button to check if the URL is correct.

  5. Click the Update connection button to save the connection.

The Connection tab also shows the path to the ETL workspace directory, which is defined during the installation of ETL Runner, see Installing ETL Runner.

Uploading transformations and jobs

You can use the Web Admin Suite to upload the transformations and jobs which you created in PDI to the ETL workspace. If your transformations or jobs require other files, as for example input files which contain data to be imported, you can upload them to the workspace as well.

  1. Go to the Files tab.

  2. Navigate to the desired location. You can double-click an existing directory to open it. In addition, you can create a new folder by clicking the New directory button.

    The directory structure can be defined as needed. You can for example create one folder for each transformation which in turn contains sub-folders for input, output and log files.

  3. Click the Upload file button. If you upload a ZIP file, it is automatically unpacked in the target location.

Alternatively, you can use other mechanisms to copy the files to the desired location, e.g. the file explorer or FTP.

It is possible to reference files which are saved outside of the workspace in your transformations. Nevertheless, these files cannot be managed on the Files tab.

Creating tasks

A task is a JSON file which includes the necessary settings for executing a transformation or job. The following settings are available:

Managing files

The Files tab shows the content of the workspace on the ETL Runner machine. You can place files needed in your transformations or jobs in the workspace, e.g. input files which are imported. The following actions are available:

Tracking task execution

The Tasks table shows a list of your tasks with their current status and next execution. You can check the Execution details tab in the task details to obtain additional information:

All the data is shown in its raw JSON or text format.

Advanced tasks

Connecting to external databases

You can connect to external databases in your transformations and jobs. Please proceed as follows to enable such connections:

Entries for the drivers of the databases supported by ConSol CM are present in the default configuration.

ETL Runner must be able to access the database URL at runtime.

Adding additional PDI plugins

If your transformations and jobs include PDI plugins which are not part of the set of frequently used default plugins, you need to add these plugins manually in order to execute these transformations and jobs. Please proceed as follows to enable additional plugins:

Some plugins might require additional dependencies to be copied into the /etl-runner/plugins/${plugin}/lib directory.