Multiple Identity Data Sources

Last modified 14 Sep 2022 13:27 +02:00
EXPERIMENTAL
This feature is experimental. It means that it is not intended for production use. The feature is not finished. It is not stable. The implementation may contain bugs, the configuration may change at any moment without any warning and it may not work at all. Use at your own risk. This feature is not covered by midPoint support. In case that you are interested in supporting development of this feature, please consider purchasing midPoint Platform subscription.
Since 4.6
This functionality is available since version 4.6.

The advanced correlation needs often go hand in hand with the situations when there are multiple sources of the identity data.

An Example

A university may have the following systems:

Table 1. Source systems in a sample university installation
System Description

Student Information System (SIS)

Provides data on students and faculty.

Human Resources (HR)

Keeps records of all staff - faculty and others.

External persons (EXT)

Maintains data about visitors and other persons related to the university in a way other than being a student or employee.

While the data about a given person are usually consistent, there may be situations when they differ. For example, the given name may be recorded differently in SIS and HR systems. Or the title may be forgotten to be updated in HR. An old record in the "external persons" system may be out-of-date altogether.

Requirements

Potential data inconsistencies lead to two kinds of requirements:

  1. When processing data from these systems, midPoint has to somehow decide which ones are "authoritative", that is, which ones to propagate to the "official" user data stored in the repository and provisioned to target systems afterwards.

  2. When correlating, we may want to match data from all systems for the candidate owners. (Not only the "official" user data.)

Starting with 4.6, midPoint supports both of them.

Selecting the Authoritative Data

Before 4.6, the first requirement was resolvable only using a custom code. A typical solution was that inbound mappings put the resource-specific values (like given name, family name, and so on) into temporary properties, present often in the user extension. This was followed by an object template mapping that selected the right values and putting them into givenName, familyName and other properties of the user.

Actually, the current solution in 4.6 is an adaptation of this approach. This time, however, it is implemented right inside midPoint.

Configuration

When selecting the most appropriate data, there is no magic available. It is the engineer who must provide an algorithm for this. MidPoint provides two levels at which this can be done:

  1. the whole object (e.g. user),

  2. an individual item (e.g. givenName).

Working at the level of the whole object is probably easier, and maybe sufficient for the majority of installations.

Declaring Multi-Source Data

Usually, not all object properties are to be handled in this "multi-source" style. The selection of the ones that should be treated so is done in the object template.

An example:

Listing 1. Setting up four multi-source properties
<objectTemplate>
    ...
    <item>
        <ref>givenName</ref>
        <multiSource/>
    </item>
    <item>
        <ref>familyName</ref>
        <multiSource/>
    </item>
    <item>
        <ref>honorificPrefix</ref>
        <multiSource/>
    </item>
    <item>
        <ref>honorificSuffix</ref>
        <multiSource/>
    </item>
    ...
</objectTemplate>

This declaration "redirects" inbound mappings for the properties mentioned. It means that instead of the traditional flow of data show in figure 1, they will flow like indicated in figure 2.

Mapping redirection 1
Figure 1. Traditional inbound mappings
Mapping redirection 2
Figure 2. Redirected inbound mappings for multi-source items

And here comes the question: How should the data (and which ones) flow from the identities container to their destination places?

Although we said it is often easier to decide this question at once for the whole object, let us explain that the way around: from individual items up.

Selecting Authoritative Values for an Individual Item

It is no surprise that midPoint selects the authoritative data for the individual item (like givenName) using ordinary object template mapping. Although the mapping could be written just like any other, there is a nicer, shortened form available.

An example:

Listing 2. Selecting authoritative value for the given name
<objectTemplate>
    ...
    <item>
        <ref>givenName</ref>
        <multiSource>
            <selection>
                <expression>
                    <script>
                        <code>
                            import com.evolveum.midpoint.schema.util.FocusIdentitySourceTypeUtil
                            import com.evolveum.midpoint.xml.ns._public.common.common_3.UserType

                            midpoint.selectIdentityItemValues(
                                    identity,
                                    FocusIdentitySourceTypeUtil.defaultAccount('157796ed-d4f2-429d-84f3-00ce4164263b'),
                                    UserType.F_GIVEN_NAME)
                        </code>
                    </script>
                </expression>
            </selection>
        </multiSource>
    </item>
    ...
</objectTemplate>

The content of the selection element is a mapping.

It has two default sources, and other ones can be added by the engineer.

Table 2. Default sources for the item-selection mapping
Source Type Description

identity

a collection of FocusIdentityType

The content of identities/identity multivalued container. It contains all "incoming" identity data from the inbound mappings. It is the primary source of information from which we have to select the one we need.

defaultAuthoritativeSource

FocusIdentitySourceType

The source for identity data that was determined to be the default one for the user. (See the next section.)

The mapping is executed just like any other object template mapping. The value(s) it produces are put right into the target property - in this case it is givenName.

In our example above we used midpoint.selectIdentityItemValues method. It is a utility method to aid in selecting values from the multiple sources. It has three parameters:

Table 3. Parameters of midpoint.selectIdentityItemValues method
Parameter Type Description

identity

Collection<FocusIdentityType>

A collection of identities where we search for the data. In particular, we look for the values with sub-path of itemPath (see the third parameter) in the identity[x]/data container.

source

FocusIdentitySourceType

Specification of the source we are looking for. The source is currently matched using resource OID, kind, intent, and tag. The null value of source means "take values from all sources".

itemPath

ItemPath

Item that should be provided.

If no item selection mapping is provided (which is the usual case), the following one is used automatically by midPoint. This is why we do not need to specify these mappings explicitly.

Listing 3. Automatically-applied item selection mapping
<objectTemplate>
    ...
    <item>
        <ref>(somePath)</ref>
        ...
        <multiSource>
            <selection>
                <expression>
                    <script>
                        <code>midpoint.selectIdentityItemValues(identity, defaultAuthoritativeSource, (somePath))</code>
                    </script>
                </expression>
            </selection>
        </multiSource>
    </item>
    ...
</objectTemplate>

This leads us to the question of how the defaultAuthoritativeSource value is computed.

Selecting Authoritative Values for the Whole Object

As probably expected, there is again a template mapping for this. In a way similar to the previous mappings, it could be specified as a regular mapping, but it is better to use the special configuration option for it.

An example:

Listing 4. Selecting the default authoritative data source
<objectTemplate>
    ...
    <multiSource>
        <defaultAuthoritativeSource>
            <expression>
                <script>
                    <code>
                        import com.evolveum.midpoint.util.MiscUtil

                        def RESOURCE_SIS_OID = '...'
                        def RESOURCE_HR_OID = '...'
                        def RESOURCE_EXT_OID = '...'

                        // The order of authoritativeness is: SIS, HR, external

                        if (identity == null) {
                            return null
                        }

                        def sources = identity
                                .collect { it.source }
                                .findAll { it != null }

                        def sis = sources.find { it.resourceRef?.oid == RESOURCE_SIS_OID }
                        def hr = sources.find { it.resourceRef?.oid == RESOURCE_HR_OID }
                        def external = sources.find { it.resourceRef?.oid == RESOURCE_EXT_OID }

                        MiscUtil.getFirstNonNull(sis, hr, external)
                    </code>
                </script>
            </expression>
        </defaultAuthoritativeSource>
    </multiSource>
</objectTemplate>

Here we have three source resources, SIS, HR, and EXT. We would like to implement a rule of "data from SIS are more authoritative than data from HR, and these are more authoritative than data from EXT". (In the real world the rules may be more complex.)

Although we could use FocusIdentitySourceTypeUtil.defaultAccount and resourceObject methods to construct the sources explicitly, it is also possible to re-use source values already present in the identity collection. That’s exactly what is done in the example above.

The mapping has only single default source:

Table 4. Default source for the source-selection mapping
Source Type Description

identity

a collection of FocusIdentityType

The content of identities/identity multivalued container. It contains all "incoming" identity data from the inbound mappings.

It returns the FocusIdentitySourceType object that may be used for selecting the values of individual items, as we have seen in the previous section.

If such a mapping does not exist or if it returns no value, the default behavior is to select all values from all sources (for a given item); as we have seen as well. [1]

Limitations

  1. This feature is supported only for standard focus properties present at the root level (like givenName, familyName) and all extension properties at root level only. (I.e., not in containers in the extension).

  2. It must be used along with archetypes, i.e., the object template must be declared in an archetype.

Future Work

This mechanism is intentionally consistent with the work that was done during the Phase 1 of the midPrivacy project: Data provenance prototype. The implementation is different, because the value metadata implementation is not in production state yet. But we eventually may unify the two.


1. Note that this is also the behavior of regular inbound mappings. There is a small difference, though. In regular mappings, we do not always take inbounds from all projections - only from currently loaded ones plus the ones that are loaded on demand. In these value-selection mappings, we always consider all projections that had their inbounds evaluated: now or in the past. Outputs of inbound mappings for those projections that are currently not loaded are stored in the identities/identity containers.
Was this page helpful?
YES NO
Thanks for your feedback