<attribute>
<ref>icfs:name</ref>
<displayName>Group name</displayName>
<correlation/> (1)
<outbound>
<source>
<path>name</path>
</source>
</outbound>
<inbound>
<strength>weak</strength>
<target>
<path>name</path>
</target>
</inbound>
</attribute>
Smart Correlation Configuration
Correlators
The basic element of the configuration is the correlation rule (or correlator) configuration.
The correlation rule is more user-oriented term, while correlator is more implementation-oriented one. See also the overview document. Risking a bit of confusion, let us stick with the term correlator in this document, at least for now. |
Correlators are hierarchical, with a specified default algorithm for combining their results.
Correlators are usually defined in the object template. However, there may be other places; please see the TODO Attribute-Level Configuration and Item-Level Configuration sections at the end of the document.
Correlation Algorithm Outline
Individual correlators are evaluated in a defined order. Each one produces zero, one, or more candidate matches. Each candidate match has a confidence value assigned: either "fully certain" flag, meaning that the correlator is certain that this is the identity searched for, or a decimal number denoting the level of confidence. The scale for this number is specific for each correlator.
Combining the Results
The default algorithm for combining the results is the following:
-
Each correlator provides a set of candidate matches, each with a confidence value.
-
A union of these sets is constructed, and for each candidate, the confidence values coming from individual correlators are summed. The "fully certain" flag is treated as "positive infinity" value here. Results of correlators denoted by
ignoreIfMatchedBy
are ignored, as described below.
"Ignore if Matched by" Flag
When two rules (let us mark them 1 and 2) are concerned, it may happen that a match by rule 1
automatically implies the match by rule 2. As an example, consider the situation when rule 1
matches by both givenName
and familyName
(at the same time) and rule 2 matches
by familyName
only, without regarding any other items.
Let us now assume that rule 1 brings a confidence value increment of 0.5, while rule 2 brings an increment of 0.3. This setup means, though, that in practice any candidate matching rule 1 would match rule 2 as well, providing an increase of 0.5 + 0.3 = 0.8. While not necessarily incorrect, such treatment is counter-intuitive.
Therefore, we have introduced a mechanism to avoid such duplicate confidence incrementing
by marking rule 2 as being ignored for those candidates that are matched by rule 1
as well.[1] This is done by setting ignoreIfMatchedBy
for rule 2
to rule 1
.
Using the Resulting Confidence Values
In midPoint 4.6, the resulting aggregated confidence values for individual candidates are compared with two threshold values:
-
Automatic match threshold (
AM
): if a confidence value is equal or greater than this one, the candidate is considered to automatically match the identity data. (If, for some reason, multiple candidates do this, then the situation is reported as a potential problem, and human decision is requested.) -
No-match threshold (
NM
): if a confidence value is below this one, the candidate is not considered to be matching at all - not even for human decision.
Said in other words:
-
If there is a single candidate with confidence value ≥
AM
then it is automatically matched. -
Otherwise, all candidates with confidence value ≥
NM
are taken for human resolution. (If there are multiple candidates with confidence value ≥AM
among them, then the situation is reported as suspicious.) -
If there are none, "no match" situation is assumed.
Correlator Types and Common Options
A correlator can be of multiple types:
Type | Meaning |
---|---|
|
Smart item-based correlator. The suggested one. |
|
Legacy (pre-4.5) filter-based correlator. |
|
Experimental correlator, based on an evaluation of a custom expression. Since 4.5. |
|
Correlator that uses an external ID Match API. Since 4.5. |
|
A correlator that composes other correlators. |
Each correlator has the following optional parameters:
Parameter | Meaning |
---|---|
|
Order in which this correlator is to be evaluated. (Related to other correlators with the same authority level.) |
|
How is the result of the correlator interpreted. (See the table below.) |
|
Defines how this correlator - if matching - increases the overall confidence of its results. It may be a constant value, or an expression - for example, deriving the confidence value from Levenshtein edit distance of selected item values. |
|
If any of these match the candidate that this particular correlator matches as well, the confidence increase stemming from being matched by this correlator is not applied. |
The following setting is a questionable one. It was created in midPoint 4.5 as an experimental feature, so it is kept here just to see if it will be of any use.
Value | Meaning |
---|---|
|
If the correlator finds a single owner with certainty, the answer is considered final, without asking other correlators. Otherwise, results from these correlators are combined in the standard way. |
|
If all the authoritative correlators provide (the same) single owner, or no owner (with certainty), then this single owner is considered as final, without asking other, non-authoritative correlators. Otherwise, results from these correlators are combined in the standard way. |
|
Results of these correlators are combined in the standard way. |
The default authority was authoritative
for 4.5. This is to be reconsidered in 4.6 and beyond.
(Maybe the default should be nonAuthoritative
if the confidence is specified as something below
sure
?)
Item-Based Correlator Configuration
The items
correlator has the following configuration:
Option | Meaning |
---|---|
|
A definition of (or a reference to) a correlation item that has to match when checked by this correlator. |
Option | Meaning | Example |
---|---|---|
|
Where (in the focus object) is this correlation item stored. |
|
|
Matcher to be used for matching this item. |
|
Matchers
A matcher (or item value matcher) is the basic building block of item-based correlation. It compares two item values to see if they "match" sufficiently so that the item is considered to be "matching" for the purpose of particular correlation rule.
Any matcher consists of the following two parts:
-
Normalizer: It creates a normalized form of the stored value that can be targeted by the database query.
-
Comparator: It compares two normalized values to determine if they match.
In the future, we anticipate the use of matchers outside the correlation. For example, they may be used for regular user search within the GUI. |
This concept is somewhat similar to the one of "matching rule". However, the implementation is much different. |
Normalizers
There are the following types of normalizers:
Normalizer type | Description |
---|---|
|
Default or specified PolyString normalizer. May be configured. (In the future.) |
|
The original value. |
|
Retains only specified number of characters from the start. |
|
Evaluated custom expression to get the normalized form. |
Is "normalizer" a good name? What about "indexer"? |
Comparators
They describe how should the system compare values, which have been already normalized. A comparator is just another representation of the prism filter. (Sometimes combined with the matching rule.)
Comparator type | Parameters | Description |
---|---|---|
|
none |
It compares values in the exact way. |
|
none |
It compares string values in a case-insensitive way. |
|
|
Matches if the values have Levenshtein edit distance at most |
|
|
Matches if the values have trigram similarity at most |
The PolyString normalization does not require PolyString -typed values.
It is applicable to any String or PolyString values.
|
In the future: Levenshtein edit distances of individual items should be usable in the confidence expressions.
They should be referencable using variable names like distanceX where X is the correlation item name;
and additionally distance if there is a single correlation item configured to be compared using this metric.
|
Item Definition vs Item Reference
A correlation item can be specific to a single correlator, or can be shared among multiple
correlators. In the latter case it can be defined at an upper level, that is, in an
embedding correlator, or a correlator referenced by the extending
parameter.
TODO describe this in more detail
Attribute-Level Configuration
To make correlation configuration more user-friendly, it is possible to specify correlation also at the level of attributes.
1 | Specifies the correlation |
The correlation
item can have the following properties:
Property | Meaning | Default |
---|---|---|
|
An authority of the correlator created for this attribute. |
Depending on confidence? |
|
A confidence of the correlator created for this attribute. |
|
|
A focus item this attribute should be correlated to. |
Derived from the inbound mapping, if possible. |
|
How this item should be matched. |
"PolyString norm" |
|
Correlators (rules) this attribute should be added to. |
None |
If present, this configuration item will turn on "before correlation" evaluation of inbound mappings for this attribute.
Perhaps we should do the same for explicit (standalone) definition of a correlation item. But we would need to scan for all inbound mappings that refer to that item. |
Item-Level Configuration (?)
Maybe we could allow specifying the correlation right on the focus item, e.g. in the object template. This would be common to all resources referring to the particular focus or focus archetype.
Maybe we will have to do this, just to ensure the "focus" variant will be updated when changes unrelated to a synchronization are applied to the user object.