Transformation via the command line

This help topic describes how to transform source data using the command line interface provided by hale»studio. It allows you to execute transformations based on mappings defined in hale»studio without having to use hale»studio as a desktop application, for instance to run the transformation automatically on a regular basis or to integrate it with existing infrastructure.

The following are the basic things you need to use the command line interface:

An existing hale project file, e.g. a project archive (.halez).
The location of the source data to transform, either a URL or a file name. The data has to match the source schema of the project.
Configuration on how and where to write the transformed data.

General usage

You can run hale»studio on the command line either using the hale»studio executable (i.e. HALE.exe) or using Java directly. The behavior may differ depending on the operating system, but in general it is better to use Java directly (either a locally installed, compatible version of Java or the version shipped with hale»studio), especially if you include the task in an automated process. The advantage of using the hale»studio executable is that it automatically uses the Java version shipped with hale»studio and sets some important system properties for you. The following commands show the usage information of the command line interface, with the executable or via Java, assuming your working directory is the hale»studio installation folder:

Running the hale»studio executable:

> HALE -nosplash -application hale.transform

Running Java:

> java -Xmx1024m -Dcache.level1.enabled=false -Dcache.level1.size=0 -Dcache.level2.enabled=false -Dcache.level2.size=0 -jar plugins\org.eclipse.equinox.launcher_1.3.0.v20140415-2008.jar -application hale.transform

Running the command above will show the usage information of the transformation application. Please note that for the Java call, the version of the launcher JAR file may change with new hale»studio versions and you should use a path delimiter appropriate for your system.
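Because the launcher JAR version changes between releases, an automated job may prefer to look the file up instead of hard-coding its name. The following is a minimal Python sketch of that idea, not part of hale»studio. It assumes that the installation folder contains a plugins directory with a file matching org.eclipse.equinox.launcher_*.jar, as in the call above. The function names find_launcher and java_command are made up for this illustration.

```python
from pathlib import Path

# VM arguments copied from the Java call shown above.
VM_ARGS = [
    "-Xmx1024m",
    "-Dcache.level1.enabled=false", "-Dcache.level1.size=0",
    "-Dcache.level2.enabled=false", "-Dcache.level2.size=0",
]


def find_launcher(install_dir):
    """Return the launcher JAR below <install_dir>/plugins, whatever its version."""
    plugins = Path(install_dir, "plugins")
    jars = sorted(plugins.glob("org.eclipse.equinox.launcher_*.jar"))
    if not jars:
        raise FileNotFoundError(f"no launcher JAR found in {plugins}")
    # Assumption: if several JARs match, the last one in sort order is used.
    return jars[-1]


def java_command(install_dir, java="java"):
    """Build the Java call that shows the usage information."""
    launcher = find_launcher(install_dir)
    # Path objects use the path delimiter appropriate for the current system.
    return [java, *VM_ARGS, "-jar", str(launcher),
            "-application", "hale.transform"]
```

The resulting list can be passed to a process launcher as is, so no shell quoting is involved.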

Additionally, the Java call specifies settings and system properties for the Java VM. You cannot provide these settings to a call to the hale»studio executable - in that case you have to adapt the file HALE.ini. Following is a short description of the most important VM arguments:

-Xmx1024m

This setting specifies the maximum heap memory provided as working memory for hale»studio. If you are working with very large individual instances you may need to increase the value. The example specifies a maximum heap space of 1024 MB.

-Dcache.level1.enabled=false -Dcache.level1.size=0 -Dcache.level2.enabled=false -Dcache.level2.size=0

These settings are important for the temporary database used within hale»studio. If you do not use them you may run into memory issues.

-Dlog.hale.level=INFO -Dlog.root.level=WARN

These settings allow you to configure the log level for log events emitted by hale»studio. These log events may overlap with messages from the reports, but are generally independent. log.hale.level controls the log level for components of hale»studio, while log.root.level controls the log level for other code, such as third party libraries. See the logback documentation for more information on logging levels and other details about the logging configuration.

If you need hale»studio to connect to the internet via a proxy server, you need to provide that information as system properties as well. The command line interface will not read the proxy settings you configured in the hale»studio user interface.

The following system properties can be provided to configure the proxy:
- http.proxyHost - the proxy host name or IP address
- http.proxyPort - the proxy port number
- http.nonProxyHosts - hosts for which the proxy should not be used, separated by | (optional)
- http.proxyUser - user name for authentication with the proxy (optional)
- http.proxyPassword - password for authentication with the proxy (optional)

Specifying the system properties to configure the proxy could look like this:

-Dhttp.proxyHost=webcache.example.com -Dhttp.proxyPort=8080 -Dhttp.nonProxyHosts="localhost|host.example.com"

More information on proxy configuration in Java can be found in the Java documentation on networking properties.
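If the proxy settings come from a configuration source, the -D arguments can be generated rather than typed by hand. The following Python sketch only uses the property names listed above. The helper name proxy_properties is an assumption made for this illustration. As system properties are VM arguments, the result belongs before -jar in the Java call.

```python
def proxy_properties(host, port, non_proxy_hosts=(), user=None, password=None):
    """Return the -D arguments for the proxy system properties listed above."""
    props = [f"-Dhttp.proxyHost={host}", f"-Dhttp.proxyPort={port}"]
    if non_proxy_hosts:
        # Hosts are separated by "|". When the arguments are passed as a list
        # (not through a shell), the value needs no extra quoting.
        props.append("-Dhttp.nonProxyHosts=" + "|".join(non_proxy_hosts))
    if user is not None:
        props.append(f"-Dhttp.proxyUser={user}")
    if password is not None:
        props.append(f"-Dhttp.proxyPassword={password}")
    return props


# Reproduces the example above.
example = proxy_properties(
    "webcache.example.com", 8080,
    non_proxy_hosts=["localhost", "host.example.com"],
)
```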

For simplicity, the following examples use the HALE executable. You can substitute HALE with the call to Java as shown above.

Please note that every argument after the executable or JAR file and before -application is a launcher argument, while every argument after hale.transform is an application argument.
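The position of the two argument groups can be made explicit when a call is assembled programmatically. The following Python sketch illustrates the ordering only. The function name hale_command and the JAR path in the comment are hypothetical.

```python
def hale_command(executable, launcher_args, app_args):
    """Assemble a call: launcher arguments first, application arguments last.

    executable is a list, e.g. ["HALE"] or
    ["java", "-Xmx1024m", "-jar", "plugins/<launcher>.jar"].
    """
    return [
        *executable,
        *launcher_args,                    # e.g. -nosplash
        "-application", "hale.transform",
        *app_args,                         # e.g. -project, -source, -target
    ]


command = hale_command(
    ["HALE"],
    ["-nosplash"],
    ["-project", "myProject.halez"],
)
```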

Note: As an alternative to using the hale»studio application to launch the CLI, you can use the dedicated hale-cli. With hale-cli a transformation is called like this:
hale transform <arguments...>
hale-cli also offers other kinds of commands and can be extended with custom functionality via an extension point.

Note: If using the hale»studio executable on Windows you will probably want to add the -console launcher argument as well: HALE -nosplash -console -application hale.transform
Otherwise you may get no feedback from the application (if you still get no feedback, launch via Java).

Note: If using a version of hale»studio installed with the Windows installer (or generally a version of hale»studio that has no write access to its directory), you need to specify a data location with the additional launcher argument -data, for instance like this: HALE -nosplash -console -data "%APPDATA%\dhpanel\HALE" -application hale.transform

Configuration parameters

The following is the usage information provided by the transformation application:

HALE -nosplash -application hale.transform
     [-argsFile <file-with-arguments>]
     -project <file-or-URI-to-HALE-project>
     -source <file-or-URI-to-source-data>
         [-include <file-pattern>]
         [-exclude <file-pattern>]
         [-providerId <ID-of-source-reader>]
         [<setting>...]
     [-filter <filter-expression>]
     [-filterOn <type> <filter-expression>]
     [-excludeType <type>]
     [-exclude <filter-expression>]
     -target <target-file-or-URI>
         [-preset <name-of-export-preset>]
         [-providerId <ID-of-target-writer>]
         [<setting>...]
     [-validate <ID-of-target-validator> [<setting>...]]
     [options...]

  where setting is
     -S<setting-name> <value>
     -X<setting-name> <path-to-XML-file>

  and options are
     -reportsOut <reports-file>
     -stacktrace
     -trustGroovy
     -overallFilterContext
     -statisticsOut <statistics-file>
     -successEvaluation <file-or-URI-to-script>

hale project file

-project <file-or-URI-to-hale-project>

A hale project contains all the necessary information to perform the transformation from one data model into another. It references the source and target schemas and describes the transformation rules in the alignment.
Use the -project parameter to provide the location of your project file, as a relative or absolute path, or as a URI.

If you want to share your project, the best option is to save it as a project archive. In the save wizard you can choose to include online resources to make the project loadable offline. You can also exclude any source data from the project, as it is ignored for command line transformation anyway.

Here are some examples on how you can provide paths to a project file on Windows:

-project C:\Hale-Project\myProject.halez (absolute path)
-project myProject.halez (relative path)
-project "C:\Hale-Project\my Project.halez" (quoted absolute path with spaces)
-project http://hale.igd.fraunhofer.de/templates/resources/files/example-inheritance-hydro/project.halex (URL)

Source data

-source <file-or-URI-to-source-data> [-providerId <ID-of-source-reader>] [<setting>...]

You can provide the source data as a path to a file or as a URI as well. For instance, you can provide a URI to a Web Feature Service GetFeature request. Specifying a source data location is mandatory. Any source data configured in the hale project is ignored for the transformation.

If the source is a directory, you can specify multiple -include and -exclude parameters to control which files to load. If you do not specify -include, it defaults to "**", i.e. all files are included, even if they are in sub-directories. Patterns use the glob pattern syntax as defined in Java and should be quoted so that they are not interpreted by the shell.

You can transform data from multiple sources if you provide a -source argument for each.

hale»studio will try to guess the file format and how to read it, so in most cases it will be enough to specify the location of the source data. But you also have the possibility to control in detail which hale data reader to use and how to configure it.

Please take a look at the InstanceReader reference to see what kind of providers are available for you and what kind of configuration options they offer.
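The arguments belonging to one source can be grouped as in the following Python sketch. It follows the order given in the usage information (-source, then -include, -exclude, -providerId and settings). The helper name source_args, the directory names and the patterns are assumptions for this illustration. Which settings exist depends on the reader.

```python
def source_args(location, includes=(), excludes=(), provider_id=None, settings=None):
    """Return the application arguments describing a single source."""
    args = ["-source", location]
    for pattern in includes:
        args += ["-include", pattern]
    for pattern in excludes:
        args += ["-exclude", pattern]
    if provider_id:
        args += ["-providerId", provider_id]
    for name, value in (settings or {}).items():
        # Settings use the form -S<setting-name> <value>.
        args += [f"-S{name}", str(value)]
    return args


# Two sources: a directory restricted by patterns, and a single file.
args = (
    source_args("data", includes=["**/*.shp"], excludes=["backup/**"])
    + source_args("extra.shp", settings={"charset": "UTF-8"})
)
```

Passing the patterns as list elements avoids the shell interpreting them, which is the reason for the quoting advice above.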

Filtering source data

By default hale»studio uses all data passed in as sources for the transformation. The filter options allow you to filter the source data before it is passed to the transformation. This can be helpful for selecting only objects actually needed for the transformation (e.g. to reduce processing time and temporary storage used), or to exclude objects that would falsify the result.

With -filter you can specify a filter expression that is checked against all objects read from the source. The filter language can be specified at the beginning of the filter expression, followed by a colon. If no language is provided explicitly, the expression is assumed to be CQL. The following simple example filter only accepts instances with the value 'Berlin' for the property name:

-filter "CQL:name = 'Berlin'"

To apply a filter only to objects of a certain type, use -filterOn. The first argument to -filterOn is the type. You can specify its name with or without namespace. If you want to specify the namespace, wrap it in curly braces and prepend it to the type name (for example: {namespace}name). The second argument is the filter expression that is to be applied to that type.
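As a small illustration of the {namespace}name notation and the language prefix, the following Python sketch builds the two arguments of -filterOn. The helper names, the type name River and the namespace are made up for this example.

```python
def qualified_type(name, namespace=None):
    """Return the type name, prefixed with {namespace} if one is given."""
    return f"{{{namespace}}}{name}" if namespace else name


def filter_on_args(type_name, expression, namespace=None, language="CQL"):
    """Return the arguments for a filter restricted to a single type."""
    return [
        "-filterOn",
        qualified_type(type_name, namespace),
        f"{language}:{expression}",   # language, a colon, then the expression
    ]


args = filter_on_args("River", "name = 'Berlin'",
                      namespace="http://example.com/hydro")
```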

If filters are defined, an object generally needs to be accepted by at least one of the filters defined with -filter or -filterOn. If there are only filters for specific types (-filterOn) and no general filters, objects of other types pass without a check.
The only exception are the exclusion filters: -excludeType and -exclude prevent an instance from being passed to the transformation even if it was accepted by a different filter.

-excludeType will prevent any instance of a specific type from being passed to the transformation. -exclude on the other hand allows specifying a filter. Only instances that don't match the filter pass on to the transformation.
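How the different filter options combine can be pictured with a small model. The following Python sketch encodes one reading of the rules described above. It is not the implementation used by hale»studio, and filters are represented as plain Python predicates instead of CQL expressions.

```python
def passes(obj, obj_type, general=(), per_type=None, exclude_types=(), exclude=()):
    """Decide whether an object would be passed on to the transformation.

    general       -- predicates corresponding to -filter
    per_type      -- dict mapping a type name to predicates (-filterOn)
    exclude_types -- type names corresponding to -excludeType
    exclude       -- predicates corresponding to -exclude
    """
    per_type = per_type or {}
    # Exclusions take precedence over any accepting filter.
    if obj_type in exclude_types or any(f(obj) for f in exclude):
        return False
    # No general filter and no filter for this type: pass without a check.
    if not general and obj_type not in per_type:
        return True
    # Otherwise at least one applicable filter has to accept the object.
    applicable = list(general) + list(per_type.get(obj_type, ()))
    return any(f(obj) for f in applicable)


def is_berlin(obj):
    return obj.get("name") == "Berlin"
```

For example, passes({"name": "Bonn"}, "River", per_type={"City": [is_berlin]}) is true in this model, because only a type-specific filter for another type is defined.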

The argument -overallFilterContext is another filter-related option. If you pass this flag to the call, any context-aware filters share a context across loading all of the defined sources. Currently, context-aware filters can only be supplied in Groovy.

Transformation result

-target <target-file-or-URI> [-preset <name-of-export-preset>] [-providerId <ID-of-target-writer>] [<setting>...]

You also need to specify where to write the transformation result. Usually this is a file.
In addition you need to provide either an export preset or a hale data writer ID and configuration.

The recommended approach is to use an export preset. You can easily define it in hale»studio with support through the UI, and save it as part of the project. An export preset essentially stores the configuration information on how to save the data. Create it in hale»studio via File→Export→Create custom data export... in the main menu. Configure the export and specify a name for the preset - this name is what you specify to use the preset on the command line.

For example, if you created a preset named GML you can use it for the transformation like this:

-target output.gml -preset GML

Even when using a preset, you can still provide setting parameters to override specific behavior.
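The target arguments can be assembled in the same way as the rest of the call. The following Python sketch checks that either a preset or a writer ID is given and appends optional settings in the -S<setting-name> <value> form from the usage information. The helper name target_args and the setting name some.setting are placeholders. Real setting names depend on the writer.

```python
def target_args(location, preset=None, provider_id=None, settings=None):
    """Return the application arguments describing the transformation target."""
    if not preset and not provider_id:
        raise ValueError("either an export preset or a writer ID is required")
    args = ["-target", location]
    if preset:
        args += ["-preset", preset]
    if provider_id:
        args += ["-providerId", provider_id]
    for name, value in (settings or {}).items():
        # With a preset, settings given here override the stored configuration.
        args += [f"-S{name}", str(value)]
    return args


preset_only = target_args("output.gml", preset="GML")
with_override = target_args("output.gml", preset="GML",
                            settings={"some.setting": "value"})
```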

Please take a look at the InstanceWriter reference to see what kind of providers are available for you and what kind of configuration options they offer.

Validation

-validate <ID-of-target-validator> [<setting>...]

The transformation result can optionally also be validated. To do so, specify the validator to use by its ID in hale. For example, to validate a created XML/GML file against its XML Schema Definition use:

-validate eu.esdihumboldt.hale.io.xml.validator

Please take a look at the Instance validators reference to see what kind of validators are available for you and what kind of configuration options they offer. The validator will by default be configured with the content type of the transformation result writer.

Other options

-reportsOut <reports-file>

Specifies the location of a report file to be written. The report file will hold information on the different tasks executed during the transformation process, e.g. the data import, the transformation, the data export or the validation. Use this option if you want to know in detail what kind of warnings or errors may have occurred during the execution of these tasks. If you find the file format hard to read, import the report log into the Report List view in hale»studio.

-stacktrace

Provide this flag to enable printing the stacktrace if launching the transformation process fails due to an internal error or misconfiguration. Please note that this does not encompass errors that occur as part of the transformation tasks as stated above - use the report file for information on those.

-trustGroovy

Provide this flag if you trust any Groovy transformation functions used in the provided project. This will lift restrictions on Groovy calls and speed up the execution.

-statisticsOut <statistics-file>

Specifies the location of a file to write the transformation statistics to. The transformation statistics are saved as JSON. They contain aggregated information on the different tasks that were run as part of the transformation.

-successEvaluation <file-or-URI-to-script>

This option allows you to customize how the transformation command decides whether a run is considered successful. The script you specify here is a Groovy script that is evaluated against the transformation statistics. The script is expected to be encoded in UTF-8. You can use the -statisticsOut parameter to get an idea of the contents of the statistics.
A success evaluation script could for example look like this:

// the transformation must have been completed
assert aggregated['eu.esdihumboldt.hale.transform'].report.completed == true
// without errors
assert aggregated['eu.esdihumboldt.hale.transform'].report.errors == 0

// there must have been objects created as part of the transformation
assert aggregated['eu.esdihumboldt.hale.transform'].createdPerType.any { name, count -> count > 0 }
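If the statistics file written with -statisticsOut is processed later in an automated workflow, the same three checks could be repeated outside of hale. The following Python sketch mirrors the Groovy script above. It is an illustration only and rests on an unverified assumption: that the JSON file contains a top-level aggregated object with the same structure the script accesses. Inspect an actual statistics file before relying on it.

```python
import json

TRANSFORM = "eu.esdihumboldt.hale.transform"


def evaluate(aggregated):
    """Return a list of problems; an empty list means all checks passed."""
    task = aggregated.get(TRANSFORM, {})
    report = task.get("report", {})
    problems = []
    # the transformation must have been completed
    if report.get("completed") is not True:
        problems.append("transformation was not completed")
    # without errors
    if report.get("errors") != 0:
        problems.append("transformation reported errors")
    # there must have been objects created as part of the transformation
    if not any(count > 0 for count in task.get("createdPerType", {}).values()):
        problems.append("no objects were created")
    return problems


def evaluate_file(path):
    """Load a statistics file and evaluate it."""
    with open(path, encoding="utf-8") as f:
        stats = json.load(f)
    # Assumption: the statistics are stored under the key "aggregated".
    return evaluate(stats["aggregated"])
```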

Examples

Here is an example of how to use the command line interface. There is a hale project named toInspire.halez which contains a source schema and a mapping to one of the INSPIRE application schemas. The source data is contained in the geographicData.shp file, which is encoded in UTF-8. The transformed data should be stored in inspireData.gml as GML with an INSPIRE SpatialDataSet as container.

Using a previously defined preset SpatialDataSet

> HALE -nosplash -application hale.transform -project toInspire.halez -source geographicData.shp -Scharset UTF-8 -target inspireData.gml -preset SpatialDataSet

Specifying writer and settings explicitly

> HALE -nosplash -application hale.transform -project toInspire.halez -source geographicData.shp -Scharset UTF-8 -target inspireData.gml -providerId eu.esdihumboldt.hale.io.inspiregml.writer -Sinspire.sds.namespace http://gdi-de.org/oid/de.beispiel.namespace -Sinspire.sds.localId 10
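For a scheduled or scripted run, the preset-based call above could be started from a small wrapper. The following Python sketch reproduces that call as an argument list and optionally adds -reportsOut. The function names and the report file name are assumptions for this illustration. The sketch makes no assumption about the exit code of the application, so inspect the returned result and the report file yourself.

```python
import subprocess


def example_command(executable="HALE", reports_file=None):
    """Build the preset-based example call as an argument list."""
    command = [
        executable, "-nosplash", "-application", "hale.transform",
        "-project", "toInspire.halez",
        "-source", "geographicData.shp", "-Scharset", "UTF-8",
        "-target", "inspireData.gml", "-preset", "SpatialDataSet",
    ]
    if reports_file:
        command += ["-reportsOut", reports_file]
    return command


def run(command):
    """Start the transformation and return the completed process."""
    # Output is captured so that a calling job can log or inspect it.
    return subprocess.run(command, capture_output=True, text=True)
```

A call such as run(example_command(reports_file="reports.log")) would then start the transformation, provided that HALE can be found on the system path or the working directory is the installation folder.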

Full example with project and data

Based on a public example project we created an example you can directly try, complete with project data and script file to launch the transformation. You can download the example here. Please take a look at the README provided as part of the example, on how to set it up.