ETL

Introduction to ETL in ConSol CM

ETL is used for processing large amounts of runtime data. It allows you to import cases, contacts, resources or users from external data sources into ConSol CM, and to export cases, contacts, resources or users from ConSol CM to external data sources. This is usually required for integrating ConSol CM into the customer's IT infrastructure and exchanging data with other IT systems.

  • ETL (Extract Transform Load): Process to extract data from a data source, transform it and load it into another data source
  • Transformation: Chain of processing steps which defines a data flow; usually consists of one or several input steps, several transformation steps and several output steps; file extension .ktr
  • Job: Workflow for executing one or several transformations; usually handles file availability, database connections and error notifications; file extension .kjb
  • Plug-in: Processing step with settings used in transformations or jobs
  • Task: Execution configuration of a transformation or job in ConSol CM

Purpose and usage

ETL allows you to process large amounts of runtime data:

  • Import case, contact, resource or user data from an external data source to ConSol CM.
  • Export case, contact, resource or user data from ConSol CM into external data sources.
  • Migrate cases, contacts, resources or users from a ConSol CM system to another ConSol CM system.
info

Importing and exporting configuration data is done on the Staging export, Custom export and Import pages of the Web Admin Suite.

The required processing steps are defined in transformations and jobs which are referenced in a task. This task is executed according to a defined schedule.

Scope of the applications

Working with ETL in ConSol CM involves several applications. The following graphic illustrates their scope:

[Figure: ETL_EN.png]

  • Pentaho Data Integration: Third-party desktop application used to create transformations and jobs. ConSol CM provides a number of its own plug-ins which are combined with the default plug-ins provided by PDI to create transformations.
  • ETL Runner: Console application used to execute tasks. Can be deployed in the application server of ConSol CM or can be executed as a standalone Java application.
  • Web Admin Suite: Web application used to create tasks based on transformations and jobs, schedule tasks for execution, track task execution and manage files used in transformations.

Available plug-ins

The ConSol CM ETL plug-ins can be used in transformations together with the plug-ins provided directly by PDI. They are grouped according to their purpose in the input, output and lookup folders.

By default, the commonly used PDI plug-ins are available in ETL Runner. See Adding additional PDI plugins for instructions on making further plug-ins available.

Input steps

Input steps export data from the ConSol CM database.

  • Case input: Export cases from ConSol CM
  • Contact input: Export contacts from ConSol CM
  • Resource input: Export resources from ConSol CM
  • User input: Export users from ConSol CM
  • Role input: Export roles from ConSol CM

Output steps

Output steps import data to the ConSol CM database.

  • Case output: Import cases to ConSol CM
  • Case container field output: Import case fields of the types List or Columns to ConSol CM
  • Case relation output: Import relations between cases to ConSol CM
  • Case-contact relation output: Import relations between cases and contacts to ConSol CM
  • Case content output: Import comments and attachments for cases to ConSol CM
  • Contact output: Import contacts to ConSol CM
  • Contact container field output: Import contact fields of the types List or Columns to ConSol CM
  • Contact content output: Import comments and attachments for contacts to ConSol CM
  • Resource output: Import resources to ConSol CM
  • Resource container field output: Import resource fields of the types List or Columns to ConSol CM
  • Resource content output: Import comments and attachments for resources to ConSol CM
  • Resource relation output: Import relations between resources to ConSol CM
  • User output: Import users to ConSol CM

Lookup steps

Lookup steps check data in the ConSol CM database.

  • Contact ID lookup: Find contacts in the ConSol CM database
  • Resource ID lookup: Find resources in the ConSol CM database

Task overview

You need to perform the following tasks to use ETL in ConSol CM:

  1. Install PDI on the local development machine, see Installing Pentaho Data Integration.
  2. Install ETL Runner on the server machine, see Installing ETL Runner.
  3. Create transformations and jobs in PDI. See the official PDI documentation for the general hints and information about the default plug-ins. The ConSol CM plug-ins are described in Available plug-ins.
  4. Upload the transformations and jobs to the workspace on the ETL Runner machine, see Uploading transformations and jobs.
  5. Create tasks to run the transformations and jobs, see Creating tasks.
  6. Upload the files which are needed to run the transformations or jobs, see Managing files.
  7. Track the task execution, see Tracking task execution.

If your transformations or jobs involve connections to external databases, you need to make sure that the connections are defined and that the drivers are accessible to ETL Runner, see Connecting to external databases.

Basic tasks

Configuring the connection to ETL Runner

In order to use the Web Admin Suite for working with ETL, you need to configure the connection to the ETL Runner instance which should execute the tasks.

  1. Go to the Connection tab.

  2. Enter the URL of ETL Runner. The URL depends on the deployment mode (standalone or deployed in the application server of ConSol CM).

  3. Enter the secret of ETL Runner as defined in the setting application.secret in the etlRunnerApplication.properties file.

  4. You can click the Check connection button to check if the URL is correct.

  5. Click the Update connection button to save the connection.

The Connection tab also shows the path to the ETL workspace directory, which is defined during the installation of ETL Runner, see Installing ETL Runner.
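The secret entered in step 3 must match the value configured on the ETL Runner side. A minimal sketch of the relevant entry in the etlRunnerApplication.properties file (the value shown is a placeholder, not a real key):

```properties
# Shared secret used by the Web Admin Suite to authenticate against ETL Runner
application.secret=<your-secret-value>
```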

Uploading transformations and jobs

You can use the Web Admin Suite to upload the transformations and jobs which you created in PDI to the ETL workspace. If your transformations or jobs require other files, as for example input files which contain data to be imported, you can upload them to the workspace as well.

  1. Go to the Files tab.
  2. Navigate to the desired location. You can double-click an existing directory to open it. In addition, you can create a new folder by clicking the New directory button. The directory structure can be defined as needed. You can for example create one folder for each transformation which in turn contains sub-folders for input, output and log files.
  3. Click the Upload file button. If you upload a ZIP file, it is automatically unpacked in the target location.

Alternatively, you can use other mechanisms to copy the files to the desired location, e.g. the file explorer or FTP.

info

It is possible to reference files which are saved outside of the workspace in your transformations. However, these files cannot be managed on the Files tab.

Creating tasks

A task is a JSON file which includes the necessary settings for executing a transformation or job. The following settings are available:

  • Task name: Enter a name for the task. It will be used in the name of the JSON file.
  • Path to the transformation or job file: Either select a .ktr or .kjb file which is saved in the workspace (such files are suggested automatically), or enter the path to the file manually. This can be a relative path in the workspace, or an absolute path on the machine where ETL Runner is executed. You can upload a new file or check whether the file exists using the icons on the left.
  • Schedule: Provide the execution schedule for the task. There are three options:
    • Manual: The task can be started manually by clicking the start icon in the task details.
    • Once: The task will run once at the provided execution date and time.
    • Periodical: The task will run periodically according to the provided cron expression. You can either enter a complete expression or edit the parts of the expression separately. Click Show examples to display examples, which you can select for further editing. Click Next executions to see when the task will be executed according to the defined schedule.
  • Log level: Select how much logging of the task execution is needed.
  • Path to the log files: Enter the path where the log file should be saved. This can be a relative path in the workspace, or an absolute path on the machine where ETL Runner is executed.
  • Path to the JNDI properties file: Enter the path to the JNDI properties file, which is needed if your transformation or job connects to external databases. This can be a relative path in the workspace, or an absolute path on the machine where ETL Runner is executed.
  • Safe mode: Select this option to ensure that the layout of all rows is identical. If a row does not have the same layout as the first row, an error is generated and reported.
  • Gather metrics: Select this option to gather general metrics to monitor the performance of the execution.
  • Gather step performance: Select this option to gather metrics about the step performance to monitor the performance of the execution.
  • Parameters: You can create parameters with the desired values to store them as local variables for the transformation. This is useful, for example, if your transformation writes data into output files, and you do not want to hard-code the file path inside the transformation.
  • Variables: You can create variables with the desired values to store them as user-defined and environment variables. The values of the variables Workspace.Directory, cmUser and cmPassword are read-only because they are defined in the etlRunnerApplication.properties file, see Installing ETL Runner.
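Purely for orientation, the settings above could be reflected in a task file along the following lines. This is a hypothetical sketch only: all field names and values are invented and do not represent the actual JSON schema used by ConSol CM.

```json
{
  "name": "contact-import",
  "file": "transformations/contact_import.ktr",
  "schedule": { "type": "periodical", "cron": "30 23 * * *" },
  "logLevel": "BASIC",
  "safeMode": false,
  "gatherMetrics": true,
  "parameters": { "OUTPUT_DIR": "output/contact_import" }
}
```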

Using cron expressions

Task scheduling is done using cron expressions. A cron expression is a string which consists of six fields:

  • Second (not supported)
  • Minute: 0 - 59
  • Hour: 0 - 23
  • Day of the month: 1 - 31, or L (last day of the month); appending W to a number or to L means the nearest weekday
  • Month: 1 - 12 or JAN - DEC
  • Day of the week: 0 - 7 or MON - SUN (both 0 and 7 mean Sunday)

The following operators can be used in the expression:

  • * (asterisk): every, i.e. first to last, e.g. * in the day of month field means every day
  • - (hyphen): inclusive range, e.g. MON-FRI means Monday to Friday
  • , (comma): and, e.g. SAT,SUN means Saturday and Sunday
  • / (slash): interval, e.g. */4 in the hour field means every four hours

The following examples show valid cron expressions:

  • Daily execution at 23:30: 30 23 * * *
  • Execution at 23:00 on weekdays: 0 23 * * MON-FRI
  • Execution every four hours: 0 */4 * * *
  • Execution every hour from 08:00 to 17:00: 0 8-17 * * *
  • Weekly execution at 01:10 on Sundays: 10 1 * * SUN
  • Monthly execution at 01:00 on the first day of the month: 0 1 1 * *
  • Monthly execution at 23:30 on the last day of the month: 30 23 L * *
  • Execution at 23:30 on the last weekday of the quarter: 30 23 LW MAR,JUN,SEP,DEC *

See https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/scheduling/support/CronExpression.html for the official documentation.
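To illustrate the field semantics described above, the following is a minimal sketch of a cron matcher. It is not the scheduler used by ConSol CM: it supports only the *, -, , and / operators plus month and weekday names, it does not implement the L and W modifiers, and it combines day-of-month and day-of-week with a simple AND.

```python
from datetime import datetime

# Map weekday and month names to their numeric values (SUN = 0)
NAMES = {n: i for i, n in enumerate(
    ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"])}
NAMES.update({n: i + 1 for i, n in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"])})

def _to_int(token: str) -> int:
    return NAMES[token.upper()] if token.isalpha() else int(token)

def field_matches(field: str, value: int, lo: int, hi: int) -> bool:
    # A field is a comma-separated list of ranges, each with an optional /step
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = _to_int(a), _to_int(b)
        else:
            start = end = _to_int(rng)
        if value in range(start, end + 1, step):
            return True
    return False

def cron_matches(expr: str, dt: datetime) -> bool:
    # Five fields: minute, hour, day of month, month, day of week
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, dt.minute, 0, 59)
            and field_matches(hour, dt.hour, 0, 23)
            and field_matches(dom, dt.day, 1, 31)
            and field_matches(month, dt.month, 1, 12)
            and field_matches(dow, dt.isoweekday() % 7, 0, 7))
```

For example, `cron_matches("0 23 * * MON-FRI", some_datetime)` returns True only when the timestamp falls on a weekday at exactly 23:00.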

Managing files

The Files tab shows the content of the workspace on the ETL Runner machine. You can place files needed in your transformations or jobs in the workspace, e.g. input files which are imported. The following actions are available:

  • Find files or folders: Navigate through the directory by clicking the folders in the table or clicking the links in the Current directory path shown above the table. Type the name of a file or folder in the search field to filter the table accordingly. If matching files or folders are found in other subdirectories, links to open them are shown above the table.
  • Edit files: Edit text files (extensions .csv, .json, .kjb, .ktr, .log, .properties, .tsv, .txt and .xml) directly by double-clicking their name. This opens a modal window with a text editor. Only files which are smaller than 64 KB can be edited.
  • Rename files or folders: Click the Rename icon in a row to rename the respective file or folder.
  • Download files or folders: Download specific files or folders by clicking the Download icon in their row. The Download all button allows you to download everything, i.e. the whole workspace.
  • Upload files: Upload a file to the current location by clicking the Upload file button. If you upload a zip file, it is automatically unpacked in the target location.
  • Create a file: Click the New file button and enter a name with extension to create a new file in the current location.
  • Create a directory: Click the New directory button and enter a name to create a new folder in the current location.
  • Delete files or folders: Click the Delete icon in a row to remove the respective file or folder from the workspace.
note

Use the Refresh icon next to the path to refresh the workspace, e.g. if files are changed directly on the file system.

Tracking task execution

The Tasks table shows a list of your tasks with their current status and next execution. You can check the Execution details tab in the task details to obtain additional information:

  • Schedule: Shows the configuration of the next planned execution.
  • Configuration: Shows the configuration of the current / last execution.
  • Metrics: Shows the metrics of the current / last execution. Only if Gather metrics is selected in the task configuration.
  • Performance: Shows the performance of the current / last execution. Only if Gather step performance is selected in the task configuration.
  • Results: Shows the results of the last execution.
  • Logs: Shows the log messages of the current / last execution according to the log level which is selected in the task configuration.

All the data is shown in its raw JSON or text format.

Advanced tasks

Connecting to external databases

You can connect to external databases in your transformations and jobs. Please proceed as follows to enable such connections:

  1. Define the database connections in Pentaho Data Integration.
  2. Save the jdbc.properties file from the simple-jndi folder of Pentaho Data Integration to the ETL workspace.
  3. Save the required drivers in the path configured in application.libs.directory of the etlRunnerApplication.properties file.
  4. Add entries with the drivers' names and checksums to the etlRunnerApplication.properties file. Syntax: application.libs.checksums.{file_name}={sha1_checksum}.
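The SHA-1 checksum needed in step 4 can be obtained with a command-line tool such as sha1sum, or with a short script like this sketch:

```python
import hashlib

def sha1_of_file(path: str) -> str:
    # Stream the file in chunks so large driver JARs are not loaded into memory
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```

The returned hex digest is the value to use as {sha1_checksum} in the corresponding application.libs.checksums entry.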
info

Entries for the drivers of the databases supported by ConSol CM are present in the default configuration. ETL Runner must be able to access the database URL at runtime.
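The jdbc.properties file follows the simple-jndi format used by PDI. A hedged sketch of one connection entry (the connection name, driver, URL and credentials are placeholders to adapt to your environment):

```properties
# Hypothetical connection named "crm_db"
crm_db/type=javax.sql.DataSource
crm_db/driver=org.postgresql.Driver
crm_db/url=jdbc:postgresql://dbhost:5432/crm
crm_db/user=etl_user
crm_db/password=secret
```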

Adding additional PDI plugins

If your transformations and jobs include PDI plugins which are not part of the set of frequently used default plugins, you need to add these plugins manually in order to execute these transformations and jobs. Please proceed as follows to enable additional plugins:

  1. Create a plugin directory and configure its path in the etlRunnerApplication.properties file by adding the following line:
application.plugins.directory=/etl-runner/plugins
  2. If ETL Runner is deployed on the application server, add the following line to the application server's start command:
--add-opens java.base/sun.net.www.protocol.jar=ALL-UNNAMED
  3. Copy the desired plugin from the Pentaho Data Integration distribution (e.g., ${pentaho}/data-integration/plugins/${plugin}) into the /etl-runner/plugins/${plugin} directory.
  4. Restart ETL Runner to apply the changes.
info

Some plugins might require additional dependencies to be copied into the /etl-runner/plugins/${plugin}/lib directory.