Groovy Join function

Joins multiple instances of different source types into one instance of the target type, based on one or more matching properties.

Category: Groovy

Parameter:

The join transformation can be used to map multiple source types to one target type. It allows you to combine information from instances/objects of different types. Using the Groovy Join function you can make use of a Groovy script to control the creation of a target instance for a given source instance and the instances it is linked to through the Join. This can be combined with using regular mapping functions to transform the properties.

Join order

The first step in creating a join transformation is to select the order of the source types. For each instance of the topmost type, a target instance is created. The subsequent types are then joined, in the given order, to fill the result with additional information. One instance of the first type may be matched by several instances of the second type. Instances of the third type can in turn match one of those instances of the second type (or an instance of the first type, depending on the conditions), and so on.

Conditions

Each type after the first needs at least one join condition on the previous types. It can have conditions on only one of the previous types, but it may also have conditions on several of them. Conditions are simple equalities: a property of the joined type is compared to the selected property of a base type. An instance of the joined type is used only if all of its conditions are fulfilled. A particular instance may match more than one instance of the previous types and is then used each time.

To create a condition, select a property on the left and on the right side, then click on the button in the middle.
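The matching rule described above can be illustrated outside hale studio with plain Groovy collections. The following is only a sketch of the semantics, not hale studio's implementation. All data and property names (parcels, owners, parcelId, district) are made up for the illustration.

```groovy
// Plain-Groovy illustration of join conditions; not hale studio code.
// All data and property names here are hypothetical.
def parcels = [            // instances of the first (topmost) type
  [id: 'P1', district: 'A'],
  [id: 'P2', district: 'B']
]
def owners = [             // instances of the joined type
  [parcelId: 'P1', district: 'A', name: 'Smith'],
  [parcelId: 'P1', district: 'A', name: 'Jones'],
  [parcelId: 'P2', district: 'A', name: 'Miller']   // district differs from P2
]

// Each condition compares a property of the joined type
// with a property of a base type.
def conditions = [
  [joined: 'parcelId', base: 'id'],
  [joined: 'district', base: 'district']
]

// A joined instance is linked only if ALL conditions are fulfilled.
def linked = parcels.collectEntries { parcel ->
  def matches = owners.findAll { owner ->
    conditions.every { c -> owner[c.joined] == parcel[c.base] }
  }
  [(parcel.id): matches*.name]
}

println linked   // [P1:[Smith, Jones], P2:[]]
```

P1 is matched by two owners, while P2 has no match because only one of its two conditions is fulfilled.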

Access source properties

With a properties accessor you can navigate through the properties of an instance, their sub-properties, and so on to any depth. Once you have reached the property you are interested in, you have to decide how to retrieve it. There are several options:

Retrieve the first property value with .value()
Retrieve a list of all property values with .values()
Retrieve the first property instance or value with .first()
Retrieve a list of all property instances or values with .list()
Iterate over all property instances or values with .each{}

Whether you get instances or values with first(), list() or each{} depends on whether the corresponding property has sub-properties of its own. If it has sub-properties according to the schema, instances are returned; otherwise the values are returned directly.

Let's take a look at an example instance that we assume is stored in the variable instance. It has an id property and a name property that can occur multiple times and carries a language sub-property.

To retrieve the value of the id property we can use the property accessor like this:

instance.p.id.value()

You can store a value in a variable for later use:

def id = instance.p.id.value()

All names of the instance can be retrieved like this:

def names = instance.p.name.values()

In this case only the direct values associated with the name property are returned.
To access both name and language for all names we can use each:

instance.p.name.each { nameInstance ->
  // retrieve name
  def name = nameInstance.value
  // retrieve language
  def language = nameInstance.p.language.value()
}

As you can see above, the value of an instance is available through .value.

UI assistance

The script editing page offers the possibility to open a tray showing the structure of the source variables. You can use it to browse the properties and sub-properties. If you select an element, sample code for accessing the property is shown in the text field below.

Troubleshooting

There may be multiple properties with the same name. To explicitly reference a specific property, you can provide the namespace of the property like this:

instance.p.name('http://my.namespace.com').value()

Access joined/linked instances

With _source you get access to the primary source instance. This is an instance of the type that comes first in the Join order. Other instances may be linked to the primary source instance because they were matched to it by the Join conditions you specified. Similar to a properties accessor, you can create an accessor that allows you to explore these links (e.g. _source.links). It allows you to navigate the linked instances (and their respective linked instances) by type. Once you have reached the instances you are interested in, you can access them and process them either as a whole or individually.

Retrieve the first instance with .first()
Retrieve a list of all instances with .list()
Iterate over all instances with .each{}

The following screenshot shows an example script for a Groovy Join, with the source structure on the right. Note that in addition to the primary source properties, there is also a type (AX_Flurstueck) at the beginning of the list. This is a type that is linked to the primary source type through the Join.

We can retrieve property values from the primary source as usual, for instance as shown in the example in the source structure tray:

_source.p.anlass.values()

To retrieve property values from linked AX_Flurstueck instances (assuming they have a property id), we can do something like this:

_source.links.AX_Flurstueck.p.id.values()

Note that this accesses all id property values from all linked AX_Flurstueck instances. To process only a single AX_Flurstueck instance at a time, you can use each:

_source.links.AX_Flurstueck.each { flurstueck ->
  // retrieve id
  def id = flurstueck.p.id.value()
}

Build the target instance

To create an instance as the result of the script, you have to use the so-called builder API. You define a closure that describes how the instance is structured and which properties should have which values, and add it using the _target variable.
The simplest structure, an empty instance, can be created like this:

_target {

}

The builder by default creates the instance based on the structure defined in the schema. Thus using properties that do not exist in the type definition will fail. To get into more detail on how the builder API works, let's assume the following structure as the schema of our target instance to be created:

The structure is quite complex, but let's start with something simple: There is a type property which can have a string value - we can add a type property with the value test to our instance like this:

_target {
  type('test')
}

The type property may occur multiple times, we can easily add the property more than once:

_target {
  type 'test1'
  type 'test2'
  type 'test3'
}

This creates three type properties in the instance, each with a different value. In contrast to the previous example, the calls here are written without parentheses.

The builder calls can be mixed with programming constructs. For instance, we could achieve the same as above using a simple loop:

_target {
  for (i in 1..3) {
    type('test' + i)
  }
}
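Why calls such as type('test') and type 'test' can appear inside a closure at all can be illustrated with plain Groovy. The sketch below is not hale studio's builder. RecordingBuilder and build are hypothetical names, and the sketch only records the calls instead of creating an instance. It assumes the common Groovy technique of a closure delegate combined with methodMissing.

```groovy
// Plain-Groovy sketch of a closure-based builder; NOT hale studio's builder API.
// RecordingBuilder and build() are hypothetical names used for illustration only.
class RecordingBuilder {
  List calls = []

  // Invoked for every method that is not defined on this class,
  // e.g. type('test') or type 'test' inside the closure.
  def methodMissing(String name, args) {
    calls << [name, args[0]]
    null
  }
}

def build(Closure spec) {
  def builder = new RecordingBuilder()
  spec.delegate = builder                         // unknown calls go to the builder
  spec.resolveStrategy = Closure.DELEGATE_FIRST
  spec()
  builder.calls
}

def result = build {
  type('test1')          // with parentheses
  type 'test2'           // without parentheses
  for (i in 3..4) {
    type('test' + i)     // mixed with ordinary control flow
  }
}

println result   // [[type, test1], [type, test2], [type, test3], [type, test4]]
```

Because the closure body is ordinary Groovy code, loops, conditions and variables can be combined freely with the property calls.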

Creating complex structures

The type structure also contains a complex name property with several sub-properties on multiple levels. Such a nested structure can for example be created like this:

_target {
  name {
    GeographicalName {
      language 'en'
      spelling {
        SpellingOfName {
          text 'some name'
        }
      }
    }
  }
}

UI assistance

The script editing page offers the possibility to open a tray showing the target instance structure. You can use it to browse the properties and sub-properties. If you select an element, sample code for creating an instance with that property is shown in the text field below. Select all the properties you want to populate to generate a template for the instance creation. To use it just copy the sample code to the editor.

In addition, content assistance is available when building an instance. It can be triggered with Ctrl+Space in the Groovy editor and lets you select, from a list, the properties that can be built at the current position.

Troubleshooting

In case there are multiple properties with the same name you have to reference a specific property explicitly by specifying its namespace. This is done through a named parameter namespace like in the example below:

_target {
  type('test', namespace: 'http://my.namespace.com')
}

Another problem that may arise is that property names may be conflicting with variables, reserved keywords or other identifiers. You can solve this by explicitly calling the builder, which is available as the variable _b in the script:

def type = 'test'
_target {
  _b.type type
}

If you need to use a property name that is a reserved keyword in Groovy, e.g. class, then you need to quote it. For example:

_target {
  'class'('test')
}

Multiple results

You can create multiple result instances by simply calling _target multiple times. For each call a result instance is created, and you can even integrate this with programming logic like loops. For example:

for (num in 1..3) {
  _target {
    id ( "Feature_$num" )
  }
}

Skip instance creation

If you want to skip creating a result instance for certain reasons, you can do so by simply not calling _target in that case.

Alternatively you can also throw a NoResultException, for example:

if (condition) {
  throw new NoResultException('reason')
}

_target {
  ...
}

Putting both together

Now that we know how accessing properties and building instances works, here is a small example related to the above structures. It uses both to create a target structure populated with values from a source instance:

_target {
  instance.p.name.each { name ->
    def lang = name.p.language.value()
    GeographicalName {
      if (lang) {
        language lang
      }
      spelling {
        SpellingOfName {
          text name.value
        }
      }
    }
  }
}

Helper Functions

hale studio can be extended with helper functions that can be conveniently called from Groovy scripts. An overview of the available functions can be found in the functions tray (see below). Select an individual function to get detailed information on:

What the function does and how it behaves (description)
How it should be called (description of each parameter and if applicable its default value)
What it returns

The functions are accessible through the _ binding in the script, and are organized in categories/packages.

Generally, if a function supports multiple parameters, you have to use the named parameters notation of Groovy. For example:

_.geom.buffer(geometry: g, distance: 10)

In the call above, the function buffer in the package geom is called with two parameters: the variable g as the geometry and 10 as the distance.
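The named parameters notation itself is a general Groovy feature: the name/value pairs of a call are collected into a single Map argument. The following plain-Groovy sketch shows this mechanism. The buffer method here is a hypothetical stand-in that only formats its arguments. It is not the hale studio helper, and the default distance of 1 is an assumption made for the illustration.

```groovy
// Plain-Groovy illustration of named parameters; this buffer() is a hypothetical
// stand-in and not the hale studio helper function.
def buffer(Map args) {
  // fall back to an assumed default of 1 if no distance is given
  def distance = args.containsKey('distance') ? args.distance : 1
  "buffer(${args.geometry}, ${distance})".toString()
}

def g = 'POINT (0 0)'
println buffer(geometry: g, distance: 10)   // buffer(POINT (0 0), 10)
println buffer(geometry: g)                 // buffer(POINT (0 0), 1)
```

The order of the named parameters does not matter, and parameters with a default value can be left out.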

Auto-completion is available for helper functions as well and can be triggered with Ctrl+Space. Make sure to start with _. and note that you may have to type a first character after it, so that the script is valid and the completion processor can work.

Additional binding content

We already know that the binding allows accessing _target, _b and, depending on which Groovy function you are using, either _index, _source or the source properties. But there are further variables you can access.

_sourceTypes (not available in GroovyCreate) contains the source types in the case of a type function. In the case of a property function, it contains the source types of the type function in which this property function is executed. It is a List of eu.esdihumboldt.hale.common.align.model.impl.TypeEntityDefinition.
_targetType contains the target type in the case of a type function. In the case of a property function, it contains the target type of the type function in which this property function is executed. It is a eu.esdihumboldt.hale.common.align.model.impl.TypeEntityDefinition.
_cell contains the cell of this function. It is a eu.esdihumboldt.hale.common.align.model.Cell.

_log enables the script to log infos/warnings/errors during execution. Each method accepts a string and (optionally) a throwable. Examples:

_log.info("Executing function!")
if (badCondition)
  _log.warn("Bad condition occurred!")
try {
  executeSomething()
} catch (SomeException e) {
  _log.error("Exception occurred!", e)
}

_project provides access to project information and variables. The following information is available:
- The project name: _project.name
- The project author: _project.author
- The project description: _project.description
- Project variables serve to customize a project's behavior. They can be accessed in a number of different ways:
  - _project.vars.NAME gets the value of the project variable with the name NAME and reports a warning if the variable does not exist.
  - _project.vars['NAME'] gets the value of the project variable with the name NAME and reports a warning if the variable does not exist.
  - _project.vars.get('NAME', 'default') gets the value of the project variable with the name NAME if it exists, otherwise yields the default value provided as second argument.
  - _project.vars.getOrFail('NAME') gets the value of the project variable with the name NAME if it exists, otherwise fails with an exception.
Using _snippets you can access Groovy snippets that were imported into the project. This allows you to keep extensive logic in external files and to reuse it easily across different transformation scripts.

You can reference a specific Groovy snippet by its identifier that you set when importing the snippet. A list of all snippets and their identifiers is available in the Project view which also allows removing imported snippets.

A snippet has the same binding available as the transformation script you include it in. You can pass additional variables to the script that will be added to the binding.

There are two recommended ways to call a snippet:
1. Run the snippet script, or
2. Run a closure on the snippet script.
In both cases the return value of the snippet or closure is usually used in the transformation script. Here are some examples calling a snippet with the identifier util:
```
// run the snippet
def res1 = _snippets.util()
// run the snippet passing binding variables
def res2 = _snippets.util(limit: 10, verbose: true)
// run a closure
// assuming the snippet defines the method "format"
def res3 = _snippets.util {
  format(source_field)
}
```
There are some restrictions associated with the use of snippets:
- Snippets are not supported when used in base alignments, as they rely on the snippets being imported into the project.
- If Groovy restrictions have been lifted, they are also not applied to external snippets, even if the snippet has changed since the original import.
Tip: If you also use the snippet script outside of hale, you can detect within the script whether it is run in hale with the following check:
```
if (binding.hasVariable('runs_in_hale')) {
  // only do this when run in hale
  ...
}
```
withCellContext provides access to a map unique to each cell. For synchronization you should only access the map inside the closure, as shown in this example:
```
withCellContext {
  def count = it.count
  if (count == null)
    count = 0
  it.count = ++count
  _log.info("count is " + it.count)
}
```
withFunctionContext provides access to a map unique to each function (shared by all cells of this function). For synchronization you should only access the map inside the closure, as shown in this example:
```
withFunctionContext {
  def count = it.count
  if (count == null)
    count = 0
  it.count = ++count
  _log.info("count is " + it.count)
}
```
withTransformationContext provides access to a map unique to the whole transformation. For synchronization you should only access the map inside the closure, as shown in this example:
```
withTransformationContext {
  def count = it.count
  if (count == null)
    count = 0
  it.count = ++count
  _log.info("count is " + it.count)
}
```

Note: When using one of the transformation contexts that allow you to share data between script executions in different places, keep in mind that usually no order in which instances are transformed can be guaranteed. The only way to influence transformation order is setting cell priorities on type relations.
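The advice to touch a context map only inside the closure can be illustrated with plain Groovy. How hale studio synchronizes access internally is not described here. The sketch below merely assumes that the closure runs while a lock on the shared map is held. withContext is a hypothetical stand-in for the with...Context functions and is not hale studio code.

```groovy
// Plain-Groovy sketch; withContext is a hypothetical stand-in, not hale studio code.
def context = [:]                       // shared map, playing the role of a context map

// Runs the given closure while holding a lock on the shared map.
def withContext = { Closure body ->
  synchronized (context) {
    body(context)
  }
}

// Four threads increment the same counter 250 times each.
def threads = (1..4).collect {
  Thread.start {
    250.times {
      withContext { ctx ->
        ctx.count = (ctx.count ?: 0) + 1
      }
    }
  }
}
threads*.join()

println context.count   // 1000
```

Because every read and write of the counter happens inside the locked closure, no increment is lost. Keeping a reference to the map and modifying it outside the closure would give up that guarantee.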

Collectors

A collector is a helper object that allows you to easily collect information.

To create a new collector instantiate one like this:

def c = new Collector()

A collector is often useful for collecting (shared) information in a transformation run. For this purpose the helper functions include a method that retrieves or creates a collector associated with a context map. For example:

withTransformationContext {
  def c = _.context.collector(it)
}

Store information

In a collector, information is stored based on keys. Most often a key is a string, but you can also use other objects as keys.

The following statement adds a value to the key named identifiers:

c.identifiers << 'ID1'

Keys can be used with an arbitrary number of levels:

c.hydro.rivers.identifiers << 'ID1'

Non-string keys (for example numbers or lists) or variables can be used as keys by using the square bracket notation:

def key = ['foo', 12]
c[key] << 'bar'
c.hydro.rivers.source[12] << 'ID1'

When you know that you are dealing with a single value instead of accumulating values, you can use the assignment operator:

def key = 'identifier'
c[key] = 'ID1'

There is no need to create keys explicitly. The corresponding child collectors are created automatically when a key is accessed.

Retrieve information

To retrieve information from a collector, you also use the respective keys. By just specifying the keys you get the respective child collector. To retrieve values from a collector you can call the following methods:

c.value() retrieves the first value associated with the collector, or null
c.values() provides you with the list of objects associated with the collector, in the order they were added

Both of the above mentioned methods ignore any child collectors and only return the values of the addressed level.

Additionally a collector provides methods to iterate over its values and child collectors. To iterate over a collector's values use each or consume with one argument:

c.identifiers.each { value ->
  ...
}

The difference between each and consume is that when using consume, the corresponding list of values is reset.

When using variable keys, you may want to iterate over all keys (or child collectors) in a collector. The eachCollector method can be used for this. If the closure takes only one argument, the child collector is passed in. If it takes two arguments, the key and the respective child collector are passed in:

c.eachCollector { key, child ->
  ...
}

If you are interested in the keys and corresponding child values of a collector, you can use each or consume with two arguments to iterate over all present keys and the respective value lists. For example:

c.each { key, values ->
  values.each { value ->
    ...
  }
}
