Synchronization

It is a capital mistake to theorize before one has data.
— Sherlock Holmes
The Adventures of Sherlock Holmes by Arthur Conan Doyle

Data are the lifeblood of any software system. Ensuring proper management of the data is one of the primary responsibilities of all IT professionals. However, data management can be very tricky. One of the important principles of software architecture is often formulated as "do not repeat yourself". This applies to code as it applies to data: thou shalt not repeat the data. There is one original, authoritative value, and there should not be any copies of that value. Ever. There is just one universal source of truth. If there are no copies, then the data are always consistent. No copies mean no contradictions. Just one truth, precise and crystal-clear. Keep data in one place, and one place only.

That is the theory.

However, practice has a different opinion to offer. There are many incompatible technologies in practical IT systems. Applications built on relational databases cannot directly use data from directory services. Even relational databases do not fit together easily. Each application is designed with a different data model in mind. Each cloud application has a different interface (API). There are data translation and bridging technologies that work as adapters to resolve compatibility issues. You make a query, the query is intercepted by an adapter, the adapter translates the query, executes it in a remote database, gets the results, translates them, and provides them to you. All of that in real time. Those are elegant solutions. Yet, there is a cost to pay. The adapters add latencies, and they almost always have a negative impact on performance. Even worse, transaction handling and data consistency are very problematic. Such adapters are additional components on a critical path, and their failures are very painful. The resulting system is often operationally fragile: failure of even a minor component means a failure of the entire system - not to mention the enormous complexity and cost of the solution.

On the other hand, copying all the data into my application database is so very convenient. The application can access the data easily, using just one homogeneous mechanism. Failures of other components do not affect the critical path. It is all so much better for performance. Copying the data solves almost all the troublesome issues. Except for one small detail: the problem of keeping the data up to date. That is where the synchronization mechanisms come in.

However hard you may try, it is almost impossible to avoid making copies of the data. Identity data are no exception. In fact, identity data are often the most heavily affected. That makes a lot of sense. Applications are built for users to use them. Therefore, almost every application keeps some kind of data about users. In addition to that, such data are usually very sensitive from a security and privacy point of view.

If we cannot avoid copying the data, the best thing that we can do is to keep the copies managed and synchronized.

Some applications have built-in support for LDAP or directory synchronization. However, those mechanisms are usually quite weak and fragile. For example, many applications provide a capability for on-demand synchronization with a directory service at login time. It usually works like this:

  1. The user enters a username and password into the application login dialog.

  2. The application connects to the directory service to validate the password.

  3. If the password is correct, the application retrieves user data from the directory.

  4. The application stores a copy of the user data locally.

  5. Business as usual. The local copy of the data is used from then on.

New cloud applications provide similar support for single sign-on (usually OpenID Connect or SAML), which is basically the same:

  1. No local session exists, therefore the user is redirected to the identity provider.

  2. The user enters a username and password into the login dialog at the identity provider site.

  3. If the password is correct, the identity provider redirects the user back to the application, packing user data into the response.

  4. The application stores a copy of the user data locally.

  5. Business as usual. The local copy of the data is used from then on.

This approach works quite well at the beginning. The data can be updated every time the user logs in - which may or may not be enough. Yet, after a while, the data begin to stink. Users are renamed, but the local copies are not updated. Users are deleted, but the local copies stay around forever. There are local accounts and privileges that are not reflected back to the directory service or identity provider, and therefore remain undetected for years. Which means that we have a serious security and data protection problem here. Even worse, we do not even know that the problem is there.

Some applications have more advanced synchronization processes that can do better than this. However, an application that does synchronization well is still an extremely rare sight. There is a good reason for this. Synchronization is much harder than it seems. There may be data inconsistencies on both sides. There may be network communication errors and configuration errors. Data models are evolving over time. Policies are changing. It is no easy task to reliably synchronize the data in such an environment. Therefore, there is a special breed of systems that specialize in synchronization of identity data: identity management systems.

Synchronization in MidPoint

Synchronization is one of the fundamental mechanisms of midPoint. Synchronization mechanisms have been an integral part of midPoint design from its very beginning. Many of the things that midPoint normally does are in fact just different flavors of synchronization. There are obvious cases, such as the reconciliation process, which synchronizes account attributes with data in the midPoint repository. However, there are also less obvious cases, such as an ordinary provisioning operation when midPoint needs to create a new account for a user. Even that case is in fact a synchronization: midPoint user properties are synchronized with a new empty account on the resource. The majority of midPoint operations are directly or indirectly using the synchronization principles.

Reuse
Reuse of the mechanisms is one of the fundamental principles of midPoint design. When we designed midPoint, we did not invent a separate mechanism for every midPoint feature. We rather designed a few very generic principles that are re-used in many places in midPoint. Synchronization is one of these principles. There is one piece of code that implements the core of the synchronization logic. That code is used whenever we need to "align" objects that relate to each other. The same code is used for user-account reconciliation, ordinary provisioning, role-based provisioning, live synchronization, opportunistic data consistency … almost everywhere.

MidPoint synchronization provides a continuous functionality spectrum that can be tweaked and tuned to match specific needs. Yet, the synchronization mechanisms can be divided into several broad and slightly overlapping categories:

  • Live synchronization is an almost real-time synchronization mechanism. MidPoint continually scans the resource for changes. When changes are detected, they are immediately processed by midPoint. The actual latencies depend on the capabilities of the resource, but the usual numbers range from a few seconds to a few minutes. Only recent changes are processed by live synchronization. It is a very efficient mechanism, which usually has fast responses even in large-scale deployments. It usually runs all the time.

  • Reconciliation is a process that compares the data and corrects the differences. When an account is reconciled, midPoint computes the attribute values that the account should have. The computed values are compared to the real values that the account has. Any differences are corrected. Reconciliation is a quite heavyweight mechanism, comparing all the accounts one-by-one. It is also a very reliable mechanism. It can correct mistakes that were missed by live synchronization, and it can correct data after major failures, corruptions, and so on. Reconciliation is usually executed at regular intervals. However, due to its heavyweight nature, it is usually executed during off-peak times (nights and weekends).

  • Import is usually a one-time process to get data from the resource to midPoint. Import is used to populate midPoint with initial data, or it may be used to connect a new resource to midPoint. Import is almost the same as reconciliation, with only a few minor differences. However, the purposes of import and reconciliation are different, and therefore there may be a slightly different configuration of import policies (mappings). Import is usually not scheduled, it is manually triggered when needed.

  • Opportunistic synchronization is a very special kind of animal which is quite unique to midPoint. Opportunistic synchronization is triggered automatically when midPoint discovers that something is not in order. For example, midPoint may try to modify an account, only to discover that the account is not there. The synchronization mechanism is triggered at that point, just for that single account. This usually means that the account is re-created. Opportunistic synchronization is also triggered when midPoint tries to create a new account, but the account is already there. This approach makes midPoint a self-healing system. If midPoint runs into a problem, it can often correct the problem just by itself.

Individual mechanisms differ in the way data inconsistency is discovered: live synchronization actively looks for new changes, reconciliation compares the data one-by-one, and opportunistic synchronization discovers inconsistency by chance. Yet, all the mechanisms react to inconsistency in the same way. There is only one policy that specifies how to fix the data. Of course, there may be slight deviations in the behavior. For example, we usually want import to behave in a slightly different way than reconciliation. MidPoint allows that. Yet, there is just one common synchronization mechanism. There is a very good reason for this. It does not really matter how the problem was discovered. What really matters is that the problem gets fixed. We do not want to maintain four separate configurations for every delicate variation of the functionality. Having one policy is much better. MidPoint knows which part of the configuration needs to be applied in each specific situation, and it does it automatically. This unifying approach significantly simplifies the configuration of midPoint synchronization mechanisms. That is also the reason why the boundaries of individual synchronization mechanisms are quite fuzzy. In fact, it is just one big mechanism with several facets.

Source Systems, Target Systems And Other Creatures

The tale of an idealistic identity management deployment starts with a human resources (HR) system. The HR system is supposed to have records for all the identities, therefore it is an authoritative source system. The identity management system pulls in all the data from the HR database, recomputes them, and creates accounts on target systems. And they lived happily ever after.

Now, let's get back to reality. The HR database is indeed an authoritative source of data in many real-world cases. However, it is a limited source. It contains data about employees only, and it has only partial information about them. For example, there is no username in the HR record. The username has to be generated by IDM logic. There is no initial password. Organizational structure assignment is often incomplete or unreliable. Therefore, the HR database is only a partially-authoritative source. There may be additional authoritative sources for contractors, partners, suppliers, support engineers and other identities that need to access our systems. These are additional source systems. Then there is a directory system, which is often Microsoft Active Directory. It should be a target resource, in theory. Yet, there may be pieces of authoritative information in there. For example, an algorithm to generate a username may be based on the usernames that are already used in the Active Directory. The data in Active Directory may also be needed to create a unique e-mail address. Directory systems are also used as semi-authoritative sources for telephone numbers, office numbers and similar identity attributes. Therefore, such systems are both target and source systems. Then there are "pure" target systems. These are not supposed to be authoritative in any way. The identity management system will only write to such systems. Or … will it? What happens when a conflicting account already exists on such a system, and therefore we cannot create a new account for a new employee? How do we check that there are no accounts that are not supposed to be there? It turns out that even the "pure" target systems contain valuable sources of information after all.

The reality brings a wild mix of source, target, semi-source, target/source and quasi-target systems that are almost impossible to put into pre-defined boxes. Therefore, midPoint does not bother to define a concept of "source" or "target" resource. Any resource can be both source and target, and the authoritativeness of each attribute can be controlled at a very fine level. Almost every real-world situation can easily fit into this model.

We are still using the terms source system and target system. However, these terms refer to the business purpose of such systems in the identity management architecture. The terms describe how we think about the systems, what is the usual direction of data flow. However, midPoint will read data from target system when needed, and it will be happy to write data to source systems if necessary.

Inbound and Outbound Mappings

MidPoint is firmly based on the principle of reuse. The previous chapter explained that the behavior of attributes during provisioning is controlled by mappings. Therefore, it is perhaps no big surprise that the behavior of attributes during synchronization is also controlled by mappings. In fact, provisioning is just a special case of synchronization. The following picture illustrates the combined mechanism.

Synchronization

There are two types of mappings:

  • Inbound mappings map data flowing into midPoint. These mappings take the data from the source resources, transform them and apply the result to the user object.

  • Outbound mappings map data flowing out of midPoint. These mappings take user properties, transform them and apply the result to account attributes in target systems.

The mappings themselves are almost the same regardless of whether they are inbound or outbound. They have sources, targets, expressions, conditions, etc. Just the sources and targets are reversed:

                 Inbound mapping                  Outbound mapping
Direction        resource → midPoint              midPoint → resource
Mapping source   resource object (e.g. account)   focal object (e.g. user)
Mapping target   focal object (e.g. user)         resource object (e.g. account)

That is it. Think about the mappings that were used in the previous chapter, just flip the direction. Now the mapping takes data from the account, and the results are applied to the user object. Like this:

<attribute>
    <ref>lastname</ref>
    <inbound>
        <target>
            <path>$focus/familyName</path>
        </target>
    </inbound>
</attribute>

This mapping takes the value of the lastname attribute from the resource and stores it in the familyName property of the midPoint user.

The rest works the same way as for outbound mappings. All the expressions and evaluators can be used for inbound mappings in the same way as for outbound mappings. For example, a Groovy expression can be used to sanitize the value before it is stored in midPoint:

<attribute>
    <ref>lastname</ref>
    <inbound>
        <expression>
            <script>
                <code>lastname?.trim()</code>
            </script>
        </expression>
        <target>
            <path>$focus/familyName</path>
        </target>
    </inbound>
</attribute>

The same approach can also be taken for activation, and even for password mappings. However, there is one difference for password mappings. Password is usually a write-only value. When the password is written, it is usually hashed, and the original value cannot be retrieved any longer. Then there are resources, such as HR systems, that do not store employee passwords at all, because those are not really accounts that we are reading. Those are just regular database entries that the connector presents as accounts. Inbound password synchronization is almost never easy, and it often requires a lot of planning and ingenuity. However, there is one method that is used quite often. The initial user passwords are usually randomly generated. As this is a very common case, midPoint can do this easily:

<credentials>
    <password>
        <inbound>
            <strength>weak</strength>
            <expression>
                <generate/>
            </expression>
        </inbound>
    </password>
</credentials>

This mapping generates a random password for a user. Both the mapping and the generate expression evaluator are quite smart. The mapping knows that the target is the user password, without any need to explicitly specify that. In addition to that, the generate expression evaluator takes the password policy into consideration. It does not make sense to generate just any random password. If we did not consider the password policy, we could generate a password that is too short, too long, too weak or too strong to be useful in any way. Therefore, the generate expression looks for the password policy, and generates a random password that matches the requirements for password length and complexity.

There are more important details to see here. The inbound password mapping is weak. There is a good reason for this. We do not want an existing midPoint password to be replaced by a randomly generated password. We only want to set a random password in case it is an initial password, the first password ever. That is exactly what a weak mapping does: it sets a new value only if the target does not have any existing value. Therefore, this mapping will not overwrite passwords that are already set.

There is no direct account-account synchronization in midPoint. As explained before, midPoint follows a star topology (a.k.a. "hub and spoke"). Therefore, the synchronization is either from account to user (inbound) or from user to account (outbound). The effect of account-account synchronization is achieved by combining inbound and outbound synchronization mechanisms.
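As an illustration of this hub-and-spoke principle, the two fragments below sketch how a surname could flow from a source resource to a target resource through the midPoint user. The attribute names (lastname on the source, sn on the target) are just examples; the inbound mapping follows the form shown above, and the outbound mapping follows the form used later in this chapter.

<!-- Source resource: inbound mapping feeds the focal property -->
<attribute>
    <ref>lastname</ref>
    <inbound>
        <target>
            <path>$focus/familyName</path>
        </target>
    </inbound>
</attribute>

<!-- Target resource: outbound mapping propagates the same focal property -->
<attribute>
    <ref>sn</ref>
    <outbound>
        <source>
            <path>$focus/familyName</path>
        </source>
    </outbound>
</attribute>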

Correlation

It is quite easy to import all HR records into an empty midPoint: we have to set up inbound mappings, start an import task, wait a bit, and all is done. However, practical situations are much more complex. Synchronization usually does not run on a green field. Live synchronization and reconciliation are supposed to work with pre-existing midPoint users. Import may not be entirely trivial either, for example we may import data from an additional data source into a running midPoint deployment. Some users in the imported data set are new, but there may be accounts describing existing users. We need to tell the difference between a brand-new account and an account that belongs to an existing user. We need to handle them in different ways. Of course, midPoint has a solution for this: the correlation mechanism.

Correlation is a method of connecting newly-discovered accounts to existing users. Whenever midPoint discovers a new account, it tries to link that account to an existing user. MidPoint looks for an appropriate user to represent the newly-discovered account. It does so by matching selected account attributes to appropriate user properties. Two things are needed for this to work: a correlation rule and a corresponding inbound mapping.

The correlation rule, better known as a correlator, specifies which properties are used for correlation, and how they are used. In its simplest form, the correlator points to a single user property:

<correlation>
    <correlators>
        <items>
            <item>
                <ref>personalNumber</ref>
            </item>
        </items>
    </correlators>
</correlation>

The correlator, as shown above, specifies that the personalNumber property should be used to correlate accounts and users. If a newly-discovered account has a value that matches the personalNumber of a user, it is assumed that such a user is the owner of the account. Such account and user are correlated, and they will be linked.

Now we know that the personalNumber property of midPoint users should be used for correlation. However, which account attribute should we use to match values of the personalNumber property? That is where the mapping comes in. MidPoint looks for an inbound mapping that maps a value of an attribute to the personalNumber property, such as this one:

<attribute>
    <ref>empno</ref>
    <inbound>
        <target>
            <path>$focus/personalNumber</path>
        </target>
    </inbound>
</attribute>

MidPoint knows that it needs to take the value of the account attribute empno to produce the value of the user property personalNumber. This is what midPoint normally does when it synchronizes account data to user data. The same mapping is used for correlation. MidPoint takes the value of the empno attribute, passes it through the mapping, and compares the result with the personalNumber property of all users. The user that has a matching value is correlated to the account.

The correlator and the mapping always work together. Mapping transforms the value, correlator matches the value.

We have seen only a very simple correlator so far. However, correlators can be quite complex. Correlation may be set up to match several items, correlation may be based on a custom search filter or expression, and several rules may be used in composition, using specified weights to tune the process. Correlation can use approximate (fuzzy) matching with human assistance, submitting probable matches for approval by an administrator.
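For illustration, a correlator matching on more than one item could look like the sketch below. The properties used here (familyName and emailAddress) are hypothetical examples: each of them would need a corresponding inbound mapping, and an account is correlated to a user only when all the listed items match.

<correlation>
    <correlators>
        <items>
            <!-- all items listed here have to match -->
            <item>
                <ref>familyName</ref>
            </item>
            <item>
                <ref>emailAddress</ref>
            </item>
        </items>
    </correlators>
</correlation>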

Even though the correlation mechanism is very rich, the humble item-based correlator remains the most popular one. Therefore, there is a simplified method to configure it. The definition of a correlation item can be placed directly in the account attribute definition:

<attribute>
    <ref>empno</ref>
    <correlator/>
    <inbound>
        <target>
            <path>$focus/personalNumber</path>
        </target>
    </inbound>
</attribute>

The correlator clause in the attribute definition marks the empno attribute for correlation. It will be matched with the mapping target, which is the personalNumber user property. This notation is all that is needed to set up a simple correlation scheme.

Correlation in samples
The simple method of correlator definition is not used in the samples. Not yet. We do not have our ExAmPLE configuration done yet, therefore we had to make a few compromises. One of the compromises is the username: we do not have a proper configuration for that yet. We are abusing the employee number to stand in for a real username. That means we need two inbound mappings for the empno attribute. If we marked the empno attribute for correlation, midPoint would not know which of the two target properties is the right one for matching. Therefore, we need to go the long way around in the samples, and specify the correlator using the full notation. At least for now.

We have dealt with correlation for source resources so far, such as the HR systems in our ExAmPLE scenario. However, there is a slightly surprising fact: correlation is usually not necessary for deployments with a single authoritative source. In such cases, the users are created from the accounts originating in the source system. The users are automatically linked to the originating accounts at the moment of creation, so no correlation is necessary there. Correlation is necessary in case there is another source of data which overlaps with the primary source, such as a second source system or manual entry of user data using the user interface. Even though the correlation mechanism may not be necessary in some cases, it is a best practice to set it up for production deployments. Having it set up from the beginning lowers the probability that it will be missing when a new data source is added, or when a system administrator manually creates a user during an HR system outage. It makes the deployment more resilient.

Apart from source resources, correlation is absolutely essential for handling target resources. It is a very rare occasion to connect an empty target system to midPoint. Almost all the target systems that are connected to midPoint contain existing data, existing accounts that belong to users in midPoint. When such a target system is connected to midPoint, we have to correlate the accounts to midPoint users to correctly set up the links. The correlation mechanism is the same regardless of whether it is used for source or target systems - except for one little difference. While we would have a suitable inbound mapping for source systems, we may not have one for target systems. Target systems often use outbound mappings only. Therefore, we need to add the missing inbound mapping to be used for correlation on target systems:

<attribute>
    <ref>employeeNumber</ref>
    <correlator/>
    <outbound>
        <strength>strong</strength>
        <source>
            <path>$focus/personalNumber</path>
        </source>
    </outbound>
    <inbound>
        <target>
            <path>$focus/personalNumber</path>
        </target>
        <evaluationPhases>
            <include>beforeCorrelation</include>
            <exclude>clockwork</exclude>
        </evaluationPhases>
    </inbound>
</attribute>

In this case, there are two mappings for the employeeNumber attribute. The usual outbound mapping is used for ordinary provisioning, it populates the target attribute employeeNumber with the value taken from the personalNumber property of the midPoint user. The other, inbound mapping is used for correlation. It takes the employeeNumber account attribute and correlates it with the personalNumber user property. However, we do not want that inbound mapping to be used to copy attribute values, as that could overwrite the authoritative information in the personalNumber user property with incorrect data. Therefore, we are setting up this mapping to be correlation-only, using the evaluationPhases clause. The clause specifies that the mapping should be used to provide data for correlation (including the beforeCorrelation phase), however it should not be used for ordinary synchronization processing (excluding the clockwork phase).

A clever reader certainly looks thoughtful now. Why do we need both outbound and inbound mappings here? If we had the outbound mapping only, we could still clearly see how the employeeNumber attribute and the personalNumber property relate to each other. As it is an asIs mapping, it could work both ways: we could reverse it, and use it in the inbound direction as well. Well, in theory, the clever reader would be right. It could be possible. However, it is not implemented yet. MidPoint is smart, but not yet that smart. For the time being, both mappings are necessary.

It is important to realize that correlation is always a database search. We have the value of an account attribute on one hand, and the entire user database on the other hand. We have to find one specific user in the entire database, based on the value of the correlation attribute. Looking at each individual user one-by-one is not a scalable method, we have to utilize database search capabilities to make it efficient. Therefore, when choosing a correlation property, make sure it is a searchable (indexed) property.

Correlation before midPoint 4.6
MidPoint versions before 4.6 used correlation expressions, which were basically search filters parameterized by expressions. This was a simple and elegant mechanism for many years, but it was also quite limited. MidPoint 4.6 introduced smart correlation mechanisms, and the old correlation expressions were deprecated.

Correlators are an efficient method to automatically link large numbers of accounts and users. However, correlators rely on reliable and clean data. If employee numbers are consistently recorded for every account, correlators can do all the work. If usernames are applied consistently across all systems, correlators are going to be a great tool. However, we all know that there is an exception to every rule. Practical data are not perfect, there are omissions, typos and all kinds of errors. Therefore, in practical scenarios, there are always some accounts that cannot be correlated automatically. Manual action is necessary to correlate them. The usual approach is to use the Change owner button in the account list to manually set up the owner of an account.

Synchronization Situations and Reactions

The correlation mechanism can be used to find an owner for a new account. That is a part of the solution, but it is not the whole solution yet. If the owner is found, then the action is quite obvious: link the account to the user and proceed as usual. However, what to do if the owner is not found? This resource may be an authoritative resource, and therefore we want to create a new user based on the account. On the other hand, this may be a reconciliation with a target resource, and we have just found an illegal (orphaned) account. We probably want to deactivate such an account in this case. Moreover, what to do if more than one owner is found? There are many scenarios, and the situation can become quite complicated. Therefore, midPoint has a concept of synchronization situations to make all the possible cases understandable and manageable.

Whenever midPoint deals with a change on an account, the situation of that account is determined. The situation reflects whether the account is already linked to the user, whether we know the candidate owner, whether we cannot determine the owner at all, and so on. The individual situations are explained in the following list.

  • linked: The account is properly linked to the owner. This is the normal situation.

  • unlinked: The account is not linked to the owner, but we know who the owner should be. The correlation mechanism told us who the owner is. MidPoint thinks that the link should exist, but the account is not linked yet.

  • unmatched: The account is not linked, and we do not even know who the owner should be. The correlation mechanism has not returned any candidates.

  • disputed: The account is not linked, and there are several potential owners. The correlation mechanism returned more than one candidate.

  • collision: The account is linked to more than one owner. This should not happen under normal circumstances. It is usually caused by faulty customizations or software bugs.

  • deleted: There was an account, but it was deleted on the resource.

After the synchronization situation is determined, midPoint continues by figuring out what a proper reaction is. The reaction may be quite clear for some situations (e.g. unlinked), but there is a lot of variability for other situations (e.g. unmatched and deleted). This variability is the reason that midPoint allows a reaction to be set for each situation individually. There are several pre-defined reactions:

  • Synchronize (synchronize): Synchronize all the data, using the mappings. This does the "normal thing", it applies the usual processing. No changes in links are made, no users or accounts are created or deleted at this point. Usual situation: linked.

  • Add focus (addFocus): A new midPoint user will be created, and it will be linked to the account. This is usually a reaction configured for authoritative resources, used when a new account is discovered. Usual situation: unmatched.

  • Delete focus (deleteFocus): The midPoint user that owns the account will be deleted. This is usually a reaction configured for authoritative resources, used when midPoint detects that an account was deleted. Usual situation: deleted.

  • Inactivate focus (inactivateFocus): The midPoint user that owns the account will be disabled. This is also used for authoritative resources. It is a milder reaction than deleting the user. Usual situation: deleted.

  • Delete resource object (deleteResourceObject): The account will be deleted. This is the usual reaction when an illegal account is detected on a non-authoritative resource. Usual situation: unmatched.

  • Inactivate resource object (inactivateResourceObject): The account will be disabled. Usually a milder reaction to an illegal account. Usual situation: unmatched.

If no reaction is explicitly configured for a situation, then midPoint does nothing. The situation is recorded in midPoint repository, but no other action is taken. This is part of midPoint philosophy: not to change the data unless an action was explicitly configured.

The reactions can be defined in the synchronization section of object type definition in resource configuration:

<resource>
    ...
    <schemaHandling>
        <objectType>
            <kind>account</kind>
            ...

            <synchronization>
                <reaction>
                    <situation>linked</situation>
                    <actions>
                        <synchronize/>
                    </actions>
                </reaction>
                <reaction>
                    <situation>deleted</situation>
                    <actions>
                        <deleteFocus/>
                    </actions>
                </reaction>
                <reaction>
                    <situation>unlinked</situation>
                    <actions>
                        <link/>
                    </actions>
                </reaction>
                <reaction>
                    <situation>unmatched</situation>
                    <actions>
                        <addFocus/>
                    </actions>
                </reaction>
            </synchronization>

        </objectType>
    </schemaHandling>
</resource>

This is a typical configuration for an authoritative resource. Most of the configuration is perhaps self-explanatory:

  • When all is fine (situation: linked) then do the usual thing (action: synchronize).

  • When the existing account is deleted from the resource (situation: deleted), then delete the user as well (action: deleteFocus).

  • When we find an account which should be linked but is not (situation: unlinked), then link it (action: link).

  • When a new account on the resource is found, and it does not have an owner (situation: unmatched), then create a new user (action: addFocus).

Perhaps the only thing that deserves special explanation is the synchronize reaction. In the linked situation, there is not much to do. Everything seems to be in order. However, there may still be attributes that are not set correctly. We may need to fully apply all the inbound and outbound mappings, to make sure that all the data are correct. This is what the synchronize reaction does, among many other things. In fact, we want to do the "synchronize" thing as part of all the other reactions too. MidPoint does that implicitly, there is no need to add synchronize to every other reaction. However, we need to explicitly add synchronize to the linked situation, as it is the only action there. If we left the linked situation empty, midPoint would do nothing. Putting the synchronize action there tells midPoint to do the usual thing.

Unlink action
There is a pre-defined unlink action, in addition to other actions specified in the table above. However, the unlink action is not needed, as midPoint unlinks the account automatically when it is gone.

Synchronization reactions are used to configure resources for a variety of scenarios. The addFocus reaction is used to configure authoritative resources; other reactions may be used to configure target resources, semi-authoritative resources and so on. Synchronization reactions provide control at the account level, while mappings provide control at the attribute level. Together they can be used to implement a variety of synchronization scenarios.
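For example, a plausible synchronization section for a non-authoritative target resource could combine the reactions described above as follows. This is just an illustrative sketch; whether illegal accounts should be disabled or deleted depends on the policy of the particular deployment.

<synchronization>
    <reaction>
        <situation>linked</situation>
        <actions>
            <synchronize/>
        </actions>
    </reaction>
    <reaction>
        <situation>unlinked</situation>
        <actions>
            <link/>
        </actions>
    </reaction>
    <reaction>
        <situation>unmatched</situation>
        <actions>
            <!-- illegal account: disable it; <deleteResourceObject/> would be a stricter alternative -->
            <inactivateResourceObject/>
        </actions>
    </reaction>
</synchronization>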

Synchronization Flow

Correlation and synchronization actions are just two pieces of a much bigger synchronization puzzle. The connector provides account data, correlation looks for an owner, synchronization situations and reactions decide what to do with the account, and so on. Every mechanism has its place in a comprehensive synchronization flow.

Synchronization flow

The figure above illustrates the usual sequence of events during inbound synchronization:

  1. An account is created in the resource database.

  2. An appropriate identity connector is used to read the account data.

  3. An account shadow is created in midPoint.

  4. The correlation mechanism is applied to determine account ownership (unless the account is already linked to a user).

  5. The synchronization situation is determined based on account ownership and the state of the account.

  6. An appropriate reaction to the situation is determined based on the resource configuration.

  7. Inbound mappings are evaluated to map account values to the user.

The description of the synchronization process is slightly simplified for clarity. There are also some deviations from this process. E.g. some inbound mappings are evaluated before correlation to provide data for correlation, inbound mappings are skipped in case the user is about to be deleted, the mappings are also skipped if there is no synchronization reaction, and so on. However, generally speaking, the sequence above is what usually happens during inbound synchronization.

When the data are reflected to the user, the usual midPoint recompute process starts. This also includes evaluation of object templates, roles, policies and all the other things that will be covered in the following chapters. This is the main part of midPoint processing, internally referred to as "clockwork". It is essentially the same processing as if the user was modified by an administrator using the midPoint user interface.

Synchronization before midPoint 4.6
When midPoint was originally created, synchronization configuration was a completely separate top-level part of the resource definition. It made quite a lot of sense more than a decade ago, when midPoint was much simpler. However, as midPoint evolved, it became obvious that this is not the best way. It would make much more sense if the synchronization configuration was coupled with the resource object type configuration in the schema handling part. This was finally corrected in midPoint 4.6, which gives us the form of configuration that we have today.

Synchronization Tasks

Now we know how the inbound synchronization works: midPoint reads the account, then correlation is applied, the situation is determined and the reaction is executed. However, we have not yet discussed the details of the very first step: how does midPoint actually get account data? Nothing happens without a reason, therefore there must be some active component in midPoint that looks for new, changed and deleted accounts. That component is a synchronization task.

A midPoint task is an active process that runs inside the midPoint server. This is the first time that we encounter the concept of a task, but it is definitely not the last. Tasks are used for numerous purposes in midPoint. They are used to track long-running operations and actions that work on large sets of objects (bulk actions). There are tasks that execute cleanup jobs, compile reports and provide a variety of other functions. The concept of a task is a very powerful and flexible one. Tasks can be used to track execution of short one-off operations. Tasks can be used to execute scheduled actions at regular intervals. Tasks can be used to track long-running processes. We will be using tasks in almost every chapter of this book.

Tasks are used as an active component to "run" almost all synchronization mechanisms:

  • A reconciliation task lists all the accounts on a specific resource. The task executes the reconciliation process for every account that is found. This essentially means that midPoint computes what that particular account should look like, and then the computed values are compared with the real account attributes. As reconciliation processes all accounts one-by-one and recomputes all the data, it is quite a heavyweight task. This task is usually scheduled for regular execution using quite a long execution interval (days or weeks).

  • A live synchronization task looks (polls) for changes on a specific resource. The task looks for accounts that were created, modified or deleted recently. The task gets a description of the change (delta), and passes it to the midPoint synchronization mechanisms. Live synchronization is designed to be fast and efficient, to provide almost real-time reaction to changes. This task is almost always scheduled for regular execution at very short intervals (minutes or seconds).

  • An import from resource task lists all the accounts on a specific resource. The task pretends that the accounts were just created. This usually motivates midPoint to create users based on those accounts, or to link these accounts to existing users. This task is usually not scheduled, it is almost always executed manually as needed.

Each type of synchronization task detects changes using a different mechanism. However, once the task detects the change or reads the account, the processing is the same for all tasks. All the tasks lead to the same algorithms based on the same configuration and policies. Therefore, it does not matter whether it all started in a reconciliation or a live synchronization task; it will all end up in the same correlation-situation-reaction-mapping flow.

The tasks are necessary to initiate the synchronization. They are the active part, the spark that starts the synchronization process. Without the tasks the synchronization does not really work. There are ways synchronization can "happen" even without a task, e.g. as a reaction to a user interface operation, or when a new account is discovered during an unrelated operation. Despite that, practical deployments need at least one synchronization task to work properly. Such a task takes care of the vast majority of synchronization cases.
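To give an early idea of what such a task looks like, the following is a minimal, illustrative sketch of a live synchronization task. The full task format is explained with the import task example later in this chapter; the liveSynchronization activity element, the polling interval (in seconds) and the placeholder resource OID are assumptions made for the sake of this sketch.

<task>
    <name>Live Synchronization: My Resource</name>
    <ownerRef oid="00000000-0000-0000-0000-000000000002"/>
    <executionState>runnable</executionState>
    <schedule>
        <recurrence>recurring</recurrence>
        <!-- poll for changes every 10 seconds -->
        <interval>10</interval>
    </schedule>
    <activity>
        <work>
            <liveSynchronization>
                <resourceObjects>
                    <!-- OID of the resource to synchronize -->
                    <resourceRef oid="..."/>
                    <kind>account</kind>
                </resourceObjects>
            </liveSynchronization>
        </work>
    </activity>
</task>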

Strictly speaking, a task is quite a strange kind of animal. Tasks have their data and configuration as most other midPoint objects do. Unlike other objects, tasks are active, they run. Therefore, there are CPU threads associated with the tasks when the tasks are running. There are mechanisms to monitor task progress. The tasks need to be cluster-aware, they have to fail over to a different midPoint node if one node fails. The tasks are quite complex, and they may be a bit tricky to handle. Fortunately, midPoint makes task handling reasonably simple. Tasks are represented as ordinary midPoint objects. Therefore, tasks can be imported to midPoint in XML/JSON/YAML form like any other object. Tasks can be easily edited in their XML/JSON/YAML form to change the scheduling, modify the parameters, and so on. Of course, there are some special functions that only the tasks have (such as suspend and resume). Such functions cannot be directly controlled using the XML/JSON/YAML format. However, the vast majority of task management can be done using the very same methods that are used to control other midPoint objects.

Tasks can be created by taking the XML/JSON/YAML file and importing it to midPoint. That is the way synchronization tasks are often managed. When an XML-formatted resource definition is created, there is often an associated synchronization task. This means that both the resource and all the necessary synchronization tasks can be imported together. Synchronization tasks can also be created from the midPoint user interface. They are usually created by using the special-purpose Defined tasks menu item on the resource detail pages.

Synchronization tasks

Once the synchronization tasks are created, they can be managed in the same way as other tasks are managed: in the Server tasks part of the midPoint user interface.

Synchronization Example: HR Feed

This section describes a complete working example that feeds HR data into midPoint. The HR system used by the ExAmPLE company is an old and complex thing. Therefore, the easiest integration method is to use structured text exports. The system can export employee data in the form of a comma-separated values (CSV) text file. The HR system is set up to export fresh data every night. MidPoint takes this export file, crunches the content, and updates the data about users.
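For illustration, a few lines of such an export file might look like this. The column names and values are made up for this example; the actual columns are described by the connector configuration shown below.

empno,firstname,lastname
001001,Alice,Anderson
001002,Bob,Brown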

This configuration is done in three steps. First, we create a simple setup to import the data into midPoint. This is an operation that is executed only once. Then the configuration is updated to run a scheduled reconciliation task. Reconciliation compares all the data records every time, and it makes any necessary updates. Even though this method would be perfectly acceptable for a company of this size, we are still going to set up a live synchronization task, to make synchronization lighter and faster.

The core of the configuration is contained in a single resource definition file. The following paragraphs explain individual parts of the file. There are a few additional configuration files for the reconciliation and live synchronization tasks. Simplified XML notation is used for clarity. The complete file in a form that is directly usable in midPoint can be found at the same place as all the other samples in this book (see the Additional Information chapter for details).

The resource representing the HR system is configured as a data source. It is used to "pull" the data into midPoint. However, as we have described previously, there is no fundamental difference between source and target resources in midPoint. Therefore, this HR resource starts in an entirely ordinary way. There is a reference to the CSV connector and the connector configuration:

resource-csv-hr.xml
<resource oid="03c3ceea-78e2-11e6-954d-dfdfa9ace0cf">
    <name>HR System</name>
    <connectorRef>...</connectorRef>
    <connectorConfiguration>
        <configurationProperties>
            <filePath>/var/opt/midpoint/resources/hr.csv</filePath>
            <encoding>utf-8</encoding>
            <fieldDelimiter>,</fieldDelimiter>
            <multivalueDelimiter>;</multivalueDelimiter>
            <uniqueAttribute>empno</uniqueAttribute>
            <passwordAttribute>password</passwordAttribute>
        </configurationProperties>
    </connectorConfiguration>
    ...

The next section is the schema handling configuration. That is where it becomes slightly more interesting. The schema handling section contains inbound mappings for HR account attributes:

resource-csv-hr.xml
    ...
    <schemaHandling>
        <objectType>
            <displayName>Default Account</displayName>
            <default>true</default>
            <delineation>
                <objectClass>AccountObjectClass</objectClass>
            </delineation>
            <focus>
                <type>UserType</type>
                <archetypeRef oid="00000000-0000-0000-0000-000000000702"/>
            </focus>
            ...

The schema handling section starts with the usual definitions of the account object type. This is a definition of the Default account, using the only object class that our CSV-based HR connector provides. We have seen that before, in the previous chapter. However, there is a new part, the definition of focus. The delineation and most other parts of the schema handling definition refer to accounts or other resource objects. In midPoint parlance, those are seen as projections, the "spoke" part of the hub-and-spoke synchronization topology. However, the focus part is different. It refers to objects in the midPoint repository, objects that are at the center of the synchronization topology: focal objects. In our case, the user is our focal object, as we are going to synchronize HR accounts with midPoint users. The other part of the focus definition specifies an archetype, which defines finer characteristics of the focal object. In this case it points to the Person archetype. User objects that are going to be created during synchronization will have this archetype automatically assigned. We will learn more about archetypes later, in the Archetypes chapter.

The next part of the definition specifies properties and mappings of account attributes:

resource-csv-hr.xml
            ...
            <attribute>
                <ref>empno</ref>
                <displayName>Employee number</displayName>
                <inbound>
                    <target>
                        <path>$focus/name</path>
                    </target>
                </inbound>
                <inbound>
                    <target>
                        <path>$focus/personalNumber</path>
                    </target>
                </inbound>
            </attribute>
            <attribute>
                <ref>firstname</ref>
                <displayName>First name</displayName>
                <inbound>
                    <target>
                        <path>$focus/givenName</path>
                    </target>
                </inbound>
            </attribute>
            <attribute>
                <ref>lastname</ref>
                <displayName>Last name</displayName>
                <inbound>
                    <target>
                        <path>$focus/familyName</path>
                    </target>
                </inbound>
            </attribute>
            ...

The account attribute empno is mapped to midPoint user properties name and personalNumber. Account attributes firstname and lastname are mapped to givenName and familyName properties respectively. This is perhaps self-explanatory.

The next part of the configuration specifies mappings for activation and credentials:

resource-csv-hr.xml
            ...
            <activation>
                <administrativeStatus>
                    <inbound/>
                </administrativeStatus>
            </activation>

            <credentials>
                <password>
                    <inbound>
                        <strength>weak</strength>
                        <expression>
                            <generate/>
                        </expression>
                    </inbound>
                </password>
            </credentials>
            ...

Activation is a very specific concept in midPoint, controlling whether an account is active (enabled or disabled), among other things. MidPoint knows the activation attributes and their meaning, they are pre-defined in the midPoint schema. Therefore, there is no need to specify a lot of details. That makes the activation mapping a very simple thing. The configuration specifies that the administrative status should be mapped in the inbound direction. That is it.

However, the mapping for credentials needs a bit of explanation. What midPoint sees as HR accounts are not exactly accounts, they are usually just records in the HR database. Nobody is using these HR records to log into the HR systems. Therefore, there is no password associated with them. Yet, we need a password for the users in midPoint. Therefore, we are going to generate the passwords. For that purpose, we are going to use the weak mapping with generate expression that was explained above.

The next section specifies correlation settings. The ownership of the accounts that are not already linked is determined by the correlation mechanism.

resource-csv-hr.xml
            ...
            <correlation>
                <correlators>
                    <items>
                        <item>
                            <ref>personalNumber</ref>
                        </item>
                    </items>
                </correlators>
            </correlation>
            ...

In this case, the personalNumber user property is used for correlation. MidPoint knows that personalNumber is mapped from the empno account attribute, therefore midPoint compares values of the personalNumber user property and the empno account attribute. If the values match, then the user is considered to be the owner of the account.

The mappings are undoubtedly an important part of the synchronization configuration. The mappings specify how the account data are reflected to the midPoint user. However, the mappings do not specify whether the accounts should be created or deleted. Mappings control the data, but they do not control the lifecycle. It is the next configuration section that makes this resource really authoritative:

resource-csv-hr.xml
            ...
            <synchronization>
                <reaction>
                    <situation>unmatched</situation>
                    <actions>
                        <addFocus/>
                    </actions>
                </reaction>
                <reaction>
                    <situation>unlinked</situation>
                    <actions>
                        <link/>
                    </actions>
                </reaction>
                <reaction>
                    <situation>linked</situation>
                    <actions>
                        <synchronize/>
                    </actions>
                </reaction>
                <reaction>
                    <situation>deleted</situation>
                    <actions>
                        <deleteFocus/>
                    </actions>
                </reaction>
            </synchronization>
            ...

Given the information in this chapter, this configuration should be quite easy to read. This is a typical configuration for an authoritative resource. If there is a new account on the resource, and we do not have an owner (situation: unmatched), then we create a new user (action: addFocus). If there is a new account for which we can find an existing owner (situation: unlinked), then we simply link it (action: link). If the account is linked already (situation: linked), then we just synchronize the data. In fact, we synchronize data for all the other situations as well. Except the last one. If the account is deleted in the HR system (situation: deleted), then we want to delete the midPoint user as well (action: deleteFocus). As the user gets deleted, there is no point in synchronizing the data. MidPoint knows that, and it skips application of the mappings.

There is one more little detail in this resource definition:

resource-csv-hr.xml
    ...
    <projection>
        <assignmentPolicyEnforcement>none</assignmentPolicyEnforcement>
    </projection>
    ...

This is a setting that adjusts the behavior of midPoint assignments. As was already mentioned, all resources in midPoint are created equal. The source resources must follow the same rules as target resources. One of the fundamental rules of midPoint is that there should not be any account without a specific reason to exist. In midPoint terminology, every account exists because there is an assignment that justifies its existence. While this approach is exactly what we want for the vast majority of (well-behaving) resources, it is not exactly ideal for resources that are pure data sources. Those resources work the other way around. The HR account is in fact a cause of the midPoint user's existence, not its effect. Therefore, there is a really useful assignmentPolicyEnforcement setting that controls the behavior of assignments. This setting is used in a variety of scenarios, mostly for data migration, and to tame resources that just won't behave in a civilized manner. However, in this case, the setting is used to turn off the assignment enforcement for this resource entirely. As this resource is an authoritative source, the assignment enforcement does not make much sense. The behavior of this resource is defined by the synchronization section of the resource configuration.

The resource configuration is complete now. This configuration sets up the connector, mappings, correlation and synchronization policies. The configuration is the same for all the synchronization flavors: import, reconciliation and live sync - they will all use the same settings. When it comes to configuration, the only difference between those synchronization flavors is the way the synchronization tasks are set up. If an import task is set up, then import of resource accounts will be executed. If a reconciliation task is set up, reconciliation will be executed. It is all in the tasks. Synchronization tasks can be easily set up using convenient buttons in the user interface. However, we like to make our lives a bit painful in our part of the world. Therefore, we are going to go hardcore, and import the tasks in XML form.

The first task is an import task. This task lists all the accounts in the HR CSV file. The task pretends that each of the accounts was just created. If the task is executed for the first time, then the resulting situation of the accounts is going to be either unmatched or unlinked. Therefore, the task creates new midPoint users, or links the accounts to existing users.

task-hr-import.xml
<task oid="fa25e6dc-a858-11e7-8ebc-eb2b71ecce1d">
    <name>HR Import</name>
    <assignment>
        <!-- Import task archetype -->
        <targetRef oid="00000000-0000-0000-0000-000000000503" type="ArchetypeType"/>
    </assignment>
    <ownerRef oid="00000000-0000-0000-0000-000000000002"/>
    <executionState>runnable</executionState>
    <schedule>
        <recurrence>single</recurrence>
    </schedule>
    <activity>
        <work>
            <import>
                <resourceObjects>
                    <!-- HR Resource -->
                    <resourceRef oid="03c3ceea-78e2-11e6-954d-dfdfa9ace0cf"/>
                    <kind>account</kind>
                </resourceObjects>
            </import>
        </work>
    </activity>
</task>

This is a very basic structure of a task. Similarly to all midPoint objects, a task has a name. Then there is an assignment of the Import task archetype. We will describe archetypes later. For now, it is only important to know that this classifies the task as an import task, so the user interface can place the task in the proper category. A task needs a definition of an owner. The owner is a user that is executing the task. This is important, because the authorizations of the task owner determine what the task is allowed to do. This is also the identity that will be recorded in the audit log. In this case, administrator is the owner of this task. Task execution state tells whether the task is running, suspended or finished. Tasks are often executed periodically, therefore they need a schedule. In this case, we start with a task that is executed just once, to test the configuration. Hence the single recurrence, which specifies that the task runs only once. Then there is the definition of an activity. The activity specifies what the task really does. In this case, the activity specifies that this is a synchronization task which imports accounts from the resource. The resource is specified by the resourceRef reference in the activity specification. This points to our HR resource.

As the task execution state is set to runnable, midPoint tries to execute the task as soon as the definition is imported. That means that import of accounts from the HR resource starts immediately. Progress of the task can be monitored in the Server tasks section of the midPoint user interface. The import task is not a recurring task, it will run only once. If you need to re-run the task, you can do that from the midPoint user interface, but the task will not get executed again unless you explicitly tell midPoint to do so. This is typical behavior for import tasks: they are usually executed only when a new resource is connected to the system. Once everything is set up, correlated and linked, the import task is not needed any more.

A clever reader may ask what happens when the import task is executed more than once. The answer is simple: not much. Even if the task pretends that the accounts were just created, midPoint is not fooled easily. In fact, it is hard to believe that an account was just created if midPoint already has a shadow for that account, and it is linked to a user, isn’t it? Therefore, midPoint is going to stay calm and carry on. It keeps the user, it keeps the shadow and also the link between them. If there is any change in the account attributes, the change will be reflected to the user. That is it. No big drama here.

The import task gets the data from the resource into midPoint. As import is not a recurring task, it does not keep the data synchronized. Import tasks are not designed to do so. Fortunately, there are other tasks that are designed for continuous synchronization. The reconciliation task is one of them. A reconciliation task lists all the accounts on a resource and compares them with the data in midPoint.

task-hr-recon.xml
<task oid="bbe4ceac-a85c-11e7-a49f-0f5777d22906">
    <name>HR Reconciliation</name>
    <assignment>
        <!-- Reconciliation task archetype -->
        <targetRef oid="00000000-0000-0000-0000-000000000501" type="ArchetypeType"/>
    </assignment>
    <ownerRef oid="00000000-0000-0000-0000-000000000002"/>
    <executionState>runnable</executionState>
    <schedule>
        <recurrence>recurring</recurrence>
        <cronLikePattern>0 0 1 ? * SAT</cronLikePattern>
        <misfireAction>executeImmediately</misfireAction>
    </schedule>
    <activity>
        <work>
            <reconciliation>
                <resourceObjects>
                    <resourceRef oid="03c3ceea-78e2-11e6-954d-dfdfa9ace0cf"/>
                    <kind>account</kind>
                </resourceObjects>
            </reconciliation>
        </work>
    </activity>
</task>

The definition of a reconciliation task is almost the same as the definition of the import task. However, there are crucial differences. First of all, there is a different activity. This is what makes this task a reconciliation task. Then, the task is recurring. This means that midPoint will repeat execution of the task. Therefore, there is also an execution schedule, so the server knows when to execute the task. Reconciliation tasks are usually resource-intensive, therefore we usually want to execute them at specific off-peak times. For that reason, the execution schedule is defined using a cron-like pattern. UNIX-friendly readers will surely be familiar with this. The format is:

seconds minutes hours day-of-month month day-of-week year

The string 0 0 1 ? * SAT means that this task will be executed every Saturday at 01:00:00 a.m. There is also a definition of the misfire action. Misfire is a situation when the server is down at the time when the task is supposed to run. In this case, if the server is down in the early hours of Saturday, this task will be executed as soon as the server starts up.
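
The schedule can be adjusted to almost any pattern a deployment needs. For example, a daily run in the early morning hours might look like the sketch below. This is only an illustration of the cron-like format described above: the pattern 0 0 2 * * ? should fire every day at 02:00:00.

    <schedule>
        <recurrence>recurring</recurrence>
        <cronLikePattern>0 0 2 * * ?</cronLikePattern>
        <misfireAction>executeImmediately</misfireAction>
    </schedule>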

Reconciliation is a real workhorse of identity management. It can be used for almost any resource. It is very reliable. It is often used to fix data problems, apply new policies, look for missing accounts, detect illegal accounts and so on. It is indeed a really useful tool. Yet, it has its downside. Reconciliation iterates through all the accounts, and it recomputes all the applicable policies for every account, one by one. Therefore, it may be quite resource-intensive. It may even be quite brutal if the policies are complex, the user population is large and the resources are slow. This can take hours or even days in extreme cases. Even for smaller deployments, reconciliation is not entirely easy. The problem is not in midPoint. MidPoint can usually be scaled up to handle the load. However, listing all the accounts may put an unacceptable load on the resources, i.e. the source and target systems. Therefore, reconciliation is not executed often. Daily, weekly or even monthly reconciliation seems to be a common approach.

Reconciliation is reliable, but it is not entirely what we would call "real-time". Of course, midPoint has a faster alternative.

Live synchronization is the way to go for real-time synchronization - or rather almost real-time synchronization. Practical latencies for live synchronization are in the range of seconds or minutes, which is fast enough for most practical cases. Live synchronization is also quite resource-efficient. Overall, it is much faster and much lighter than reconciliation. Unfortunately, live synchronization is not available for all resources. Live synchronization depends on the ability to get recent changes from the resource in a very efficient way. Therefore, it is only available for resources that record the changes. The specific mechanism to record the changes may vary from resource to resource. It may be as basic as a simple modification timestamp, or it may be a complex real-time change log. The mechanism has to be good enough for the connector to discover recent changes, and it must be efficient enough for the connector to do that every couple of seconds. If such a mechanism is available, and the connector knows how to use it, then setting up live synchronization is easy. All that is needed is a live synchronization task.

task-hr-livesync.xml
<task oid="7c57adc2-a857-11e7-83ac-0f212d965f5b">
    <name>HR Live Synchronization</name>
    <ownerRef oid="00000000-0000-0000-0000-000000000002"/>
    <executionState>runnable</executionState>
    <schedule>
        <recurrence>recurring</recurrence>
        <interval>10</interval>
    </schedule>
    <activity>
        <work>
            <liveSynchronization>
                <resourceObjects>
                    <resourceRef oid="03c3ceea-78e2-11e6-954d-dfdfa9ace0cf"/>
                    <kind>account</kind>
                </resourceObjects>
            </liveSynchronization>
        </work>
    </activity>
</task>

This task definition should be easy to understand by now. There is a different activity that makes this a live synchronization task. There is also a different type of scheduling. We do not want to execute this task at a specific time. We would rather execute it all the time, at regular intervals. In this case, the interval is set to 10 seconds. That is all. We have live synchronization running. If the HR CSV file is changed, the changes will be automatically processed by midPoint.

Setting up synchronization flavors such as reconciliation or live synchronization is just a matter of setting up the tasks. The rest of the configuration is the same for all flavors. Therefore, it is very easy to run both live synchronization and reconciliation for the same resource - just create two tasks. In fact, this is quite a common setup. Live synchronization is used to get the changes quickly and efficiently. Reconciliation is used to make sure all the changes were processed and that the policies are applied consistently.

Now we have the HR feed up and running. However, there are still a few issues. A clever reader would surely notice that this is not a very good HR resource. MidPoint users created from this HR feed have a given name and a family name, but the full name field is empty. Do not worry. We will sort that out in later chapters, with the help of an object template. Also, the users have their employee number as their username. This may in fact be a very good approach for some deployments, as it avoids the need to rename accounts. However, it is not a very user-friendly approach. Therefore, most deployments would like to generate more convenient usernames. This is easy to do with midPoint, and we will also address that later. There are still a lot of things to learn before we get to a complete synchronization setup.

HR Feed Recommendations

All resources are created equal in midPoint. However, source resources almost always have a slightly special standing. Even though midPoint mechanisms are the same for all resources, the data coming from the sources often have a significant impact on the entire solution. There is this traditional computer engineering wisdom: garbage in, garbage out. An error in a data feed may cause a lot of problems everywhere. Therefore, it is important to get the data sources right. This is usually one of the first steps in an identity management project.

Unfortunately, a source data feed is usually quite difficult to set up correctly - and it is almost impossible to get it right on the first try. There may be good old configuration problems, which are usually easy to fix. There may be data compatibility problems, such as the presence of non-ASCII national characters where they are not expected. Worst of all, source data may be of poor quality: there may be inconsistencies and typos, and the data may be out of date, not reflecting reality very well. These problems are the most difficult to correct, as the right way to correct them is to modify the data at the source - in the source database, HR system or spreadsheet exchanged by e-mail. That takes time, meetings, mail messages, management decisions, processes, excuses, delays and a huge amount of patience. Therefore, setup of a data source is usually an iterative process. The process usually goes like this:

  1. Set up initial source resource definition based on the information you have. Set up connector and test connection. Check that you can see the accounts. Set up mappings and synchronization policy.

  2. Test the import process on a couple of individual accounts. Navigate to the resource details page, click on the menu:Accounts[] tab to list accounts, choose an account and click on the small btn:[Import] button in the table row. Import of that individual account starts immediately. Just that one account. It is easier to see the errors (see step 6) by using this method.

  3. Fix any errors that you see and repeat step 2.

  4. Create an import task and run import of all accounts.

  5. Examine task errors. You can use the task details page to get a summary.

  6. If there are no errors, then examine the users. If everything seems right then it is time to congratulate yourself. You have a good import. However, this is unlikely to happen on the first few attempts.

  7. You will probably need to have a look into the system logs to learn the details of individual import failures. MidPoint heavily relies on logs for detailed error analysis. See the Troubleshooting chapter of this book to learn how to adjust log levels and how to understand the log messages.

  8. Some errors are likely to be caused by errors in your mappings and policies. These are usually easy to fix. However, there are usually worse errors as well - errors caused by wrong or unexpected input data. The right way would be to fix the data. However, that is not always possible (in fact, it is almost never a feasible option for a quick fix). Fortunately, most of the input data errors can be fixed (read: "worked around") in midPoint with a bit of ingenuity. Just use the power of the mappings. For example, clean up unexpected characters, white space or data formats using scripting expressions (see the sketch after this list).

  9. Rinse and repeat. If the errors you get are not severe, then you may simply re-run the import task. This often works just fine. However, if the problem was in a mapping that completely ruined all the data, then it is perhaps best to start with a blank slate. We are all just humans, and this situation happens quite often, especially in the beginning while you are still learning. Therefore, there is a special feature to help you out. Navigate to menu:Configuration[Repository Objects]. There is a small unassuming expand button in the top-right part of the screen. That button opens a context menu. Select the menu:Delete all identities[] item. That is what we lovingly call the "laxative button". A brief dialog will pop up, asking you to specify which identities exactly are to be deleted (users, shadows, ...). This is a very convenient way to get back to a blank slate, while keeping all the configuration (resources, templates, tasks).

  10. Go to step 2. Repeat until done.
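
As an illustration of the "work around it in a mapping" approach from step 8, a scripting expression in an inbound mapping can clean up messy source values before they reach midPoint. The snippet below is only a sketch: it assumes a mapping with a single string source, which midPoint makes available to the script as the input variable, and the actual cleanup rules will of course depend on your data.

    <expression>
        <script>
            <code>
                // Defensive cleanup of a messy source value:
                // trim leading/trailing whitespace and collapse
                // repeated inner whitespace into single spaces.
                // Return null if nothing is left, so midPoint
                // treats the value as absent.
                if (input == null) {
                    return null
                }
                def cleaned = input.trim().replaceAll(/\s+/, ' ')
                return cleaned.isEmpty() ? null : cleaned
            </code>
        </script>
    </expression>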

If the initial identity management deployment step includes an HR feed, we strongly recommend starting with that HR feed. It is a significant benefit to have authoritative HR data in midPoint from the start. It is usually easier to correlate other resources to midPoint users later on, if the users were created from reasonably reliable HR data. Also, it will usually take some tweaking to get the HR import right. The possibility to easily clean up midPoint and get back to a clean slate is extremely useful. However, this "wipeout" approach is possible only if the HR feed is the first resource that is connected to midPoint.

A clever reader would notice that we assumed that the source feed is taken from a CSV file. This is indeed the case in many deployments. The CSV file is usually produced as an automatic scheduled export of HR system data, running every night. If a new employee or contractor is about to join the company, there is usually no hurry. This information is entered into the HR system at least a few days in advance, therefore a daily CSV export is perfectly acceptable. However, there may be cases when we want a faster response. Maybe we do not even want the additional burden of dealing with CSV exports. Of course, there is a solution. In theory, any connector can be used for a source resource. There are specialized connectors that take data directly from the HR system. For example, there is a connector for the Oracle HCM system. Unfortunately, there is no connector that can take data from the SAP HR system yet.

Synchronization and Provisioning

Synchronization and provisioning are intimately related. Everything that we have explained about provisioning in the previous chapter also applies to synchronization. In fact, provisioning and synchronization are just applications of the same basic mechanisms. Provisioning starts with a modification of a user. Synchronization starts a bit earlier: inbound mappings are used to map values from the source system to the user. The result of inbound mapping evaluation is a modification of the user object. According to midPoint principles, it does not matter how the user was modified. The reaction is the same: accounts are provisioned, modified or deleted.

The synchronization (inbound processing) and provisioning (outbound processing) usually happen in one seamless operation. For example, the HR connector detects an update of the last name of an employee. That modification is applied to the midPoint user, therefore the family name of the midPoint user is updated. The operation continues by evaluating all templates, roles and outbound mappings. The outbound mappings usually map the family name change to the resource attributes. Therefore, the resource accounts linked to the user are immediately updated. All of that happens in a single operation. That is how midPoint works. MidPoint is not a human. It never procrastinates (unless explicitly instructed to do so). MidPoint does not postpone the operation for later if the operation can be executed immediately. MidPoint tries to get the data right on the first try. Therefore, there are no specialized propagation or provisioning tasks that you might know from older identity management systems. MidPoint does not need them.

There are other advantages in doing everything in one operation. It is all one operation, therefore midPoint knows all the details: what was the cause, what is the effect, what exactly has been changed. Such context is extremely important for troubleshooting. Some identity management systems decouple the cause from the effect. Such a divided approach may have its advantages, but it is an absolute nightmare when an engineer needs to figure out why a certain effect happened. For that reason, midPoint has both the cause and the effects bundled together in a single operation. Therefore, it is much easier to figure out what is going on. Having the cause and effect connected in one operation makes it possible to neatly record the entire operation in the audit trail. Then there is another huge advantage: midPoint knows exactly what has been changed. This means that midPoint does not know only the new value of a property. MidPoint also knows the old value and the values that were added or removed. This is a complete description of the change, which we call a delta. The delta is recorded at the beginning of the operation, and propagated all the way until the operation is done. Therefore, the mappings can be smart. This approach enables a lot of interesting behavioral patterns. For example, it is quite easy for midPoint to implement the "last change wins" policy. In this case, midPoint will overwrite only those attributes that were really changed in the operation. MidPoint can leave other values untouched. In fact, this is the default behavior of midPoint. It is a very useful behavior during deployment of a new identity management system.
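
To get a feel for what a delta captures, here is a rough conceptual sketch of a delta describing a surname change. The element names are simplified and the value is made up for illustration only; real midPoint deltas carry more detail (including old values), and you normally never write them by hand.

<objectDelta>
    <changeType>modify</changeType>
    <itemDelta>
        <modificationType>replace</modificationType>
        <path>familyName</path>
        <value>Smith</value>
    </itemDelta>
</objectDelta>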

Careful processing of the operations allows configurations that are not feasible with older identity management systems, e.g. a resource that is both a source and a target. In fact, a lot of identity management systems can have a resource that is both a source and a target - as long as it is a source for one attribute and a target for another attribute. However, midPoint can live with a resource where the same attribute is both a source and a target. In fact, there may be many sources and many targets for the same property at the same time. This is a very useful configuration, indeed. Just think about the telephone number property. It is usually something that the users set up themselves. This may be set up by some kind of specialized self-service, it may be updated during a call center call, the user may update it in his Active Directory profile ... there are many ways this information can be changed. Yet, we want this property to be consistent. We want the telephone number to be the same everywhere. We do not care where it was changed. We just want to propagate the last change from anywhere to all the other systems. MidPoint can easily do this. Just specify both inbound and outbound mappings for the same attribute:

<attribute>
    <ref>mobile</ref>
    <outbound>
        <source>
             <path>$focus/telephoneNumber</path>
        </source>
    </outbound>
    <inbound>
        <target>
             <path>$focus/telephoneNumber</path>
        </target>
    </inbound>
</attribute>

In this case, a change in the user property telephoneNumber is propagated to the account attribute mobile (outbound change). However, a change in the account attribute mobile is also propagated back to the user property telephoneNumber (inbound change). The last change wins.

A clever reader certainly grumbles something about infinite loops now. No need to worry here. MidPoint can see the complete operation context, both the inbound and the outbound side. Therefore, midPoint knows when to stop processing the operation. There are even mechanisms to avoid loops caused by connectors detecting changes that were caused by the connector itself. MidPoint will break those loops automatically.

Mapping and Expression Tips and Tricks

Mappings and expressions provide a very powerful mechanism. In fact, most of the initial midPoint configuration is about setting up correct mappings. However, with great power comes great responsibility, and mappings may look a bit intimidating at first sight. Fortunately, there are some tips and tricks that make life with mappings and expressions a bit easier.

Most mappings are aware of the context in which they are used. Therefore, paths of mapping sources and targets can be shortened - or even left out entirely. Activation and credential mappings used in the HR feed example are the obvious cases. Yet, even paths in ordinary mappings may be shortened. For example, take the outbound mapping source:

    <outbound>
        <source>
             <path>$focus/telephoneNumber</path>
        </source>
    </outbound>

As the mapping knows that its source is a focus (user), the definition may be shortened:

    <outbound>
        <source>
             <path>telephoneNumber</path>
        </source>
    </outbound>

A typical midPoint deployment has tens or hundreds of mappings. Deployments with thousands of mappings are definitely feasible. There are two things that can make maintaining the mappings easier. Optionally, you can specify a mapping name. The mapping name appears in the log files and in some error messages. It may make it easier to identify which mapping is causing problems, or it may help locate the trace of mapping execution in the log file. It is strongly recommended to provide names for your mappings. A mapping can also have a description. The description can be used as a general-purpose comment, or as documentation explaining what the mapping does.

<attribute>
    <ref>mobile</ref>
    <outbound>
        <name>ldap-mobile</name>
        <description>
            Mapping that sets value for LDAP mobile attribute based on
            user’s telephone number.
        </description>
        <source>
             <path>telephoneNumber</path>
        </source>
    </outbound>
</attribute>

Mappings can become quite complex. There may be a multi-line scripting expression in the mapping, and it may not be entirely obvious what the inputs and outputs are. Therefore, each mapping and each expression has the ability to enable tracing using the trace element:

<attribute>
    <ref>mobile</ref>
    <outbound>
        <name>ldap-mobile</name>
        <trace>true</trace>
        <source>
             <path>telephoneNumber</path>
        </source>
        <expression>
            <trace>true</trace>
            <script>
                <code>...</code>
            </script>
        </expression>
    </outbound>
</attribute>

If tracing is enabled, then the mapping or expression execution is recorded in the log files. Tracing can be enabled at both the mapping level and the expression level. Mapping tracing is shorter. It provides an overview of the mapping inputs and outputs. Expression-level tracing is much more detailed.

However, even this level of tracing may not be enough to debug expression code. Therefore, there is a special expression function for logging. Arbitrary messages may be logged by script expression code:

    <expression>
        <script>
            <code>
                ...
                log.info("Value of foo is {}", foo)
                ...
            </code>
        </script>
    </expression>

Generally speaking, troubleshooting of mappings may be quite difficult, as it is often intertwined with midPoint internal algorithms. Still, there are ways to do it. The Troubleshooting chapter provides much more detail on this.

Expression Functions

Expressions in general, and scripting expressions in particular, are the place where most midPoint customization takes place. Scripting expressions are able to execute any code in a general-purpose programming language. A script can transform the data in any way, or it can execute any function. Quite naturally, there are functions that are frequently used in the scripts. Therefore, midPoint provides convenient scripting libraries full of useful methods, ready to be used in scripting expressions.

There are two built-in scripting libraries that are used very often:

  • Basic script library provides very basic functions for string operations, object property retrieval, etc. These are simple, efficient stand-alone functions. These functions can be used in every expression.

  • MidPoint script library provides access to higher-level midPoint functions that contain identity-management-specific and midPoint-specific logic. This library can be used to access almost all midPoint functionality.

The libraries are designed to be very easy to use from the scripting code. While the specific details of how to invoke a library depend on the scripting language, the libraries are usually accessible through the basic and midpoint symbols. The norm() function from the basic library can be invoked in a Groovy script like this:

    <expression>
        <script>
            <code>
                ...
                basic.norm('Guľôčka v jamôčke!')
                ...
            </code>
        </script>
    </expression>

Invocation of the libraries from JavaScript and Python is almost the same, and we are sure that a clever reader will have no trouble figuring that out. What is more difficult to figure out is which functions the libraries provide. For that purpose, there is a page in midPoint docs that lists all the libraries, and this page also has links to the library function documentation. Look for the Script Expression Functions page in midPoint docs.

Only two libraries were mentioned in this section so far. However, this is not the whole story. A clever reader has certainly figured out that the logging function described in the previous section is also provided by a scripting library - and there may be more libraries in the future.
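
For completeness, here is a small sketch of how the midpoint library might be used in a script. It assumes that the getLinkedShadow function and the focus variable are available in the given expression context, as described in midPoint docs; the resource OID used here is the OID of the LDAP resource that appears later in this chapter.

    <expression>
        <script>
            <code>
                // Look up the account (shadow) linked to the current
                // user on a particular resource, and return its name.
                def shadow = midpoint.getLinkedShadow(focus,
                        '8a83b1a4-be18-11e6-ae84-7301fdab1d7c')
                shadow?.name?.orig
            </code>
        </script>
    </expression>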

Resource Capabilities

The systems that midPoint connects to are not created equal. In fact, those systems significantly differ in their capabilities. Most systems can create accounts. However, not all of them can delete accounts. There are systems that keep the accounts forever; the accounts can only be permanently disabled. Yet other systems cannot enable or disable accounts at all. While most systems support password authentication, other systems do not. There is a lot of natural diversity in the provisioning wilderness. The connector may introduce additional limitations as well. Even if the target system supports a particular feature, the connector may not have the appropriate code to use it. MidPoint needs to take all these differences into consideration when executing synchronization and provisioning operations.

MidPoint refers to these features of the systems and connectors as resource capabilities. Although capabilities may look quite complex, they are essentially just a list of things that a connector and resource can do. MidPoint is aware of the resource capabilities and their limitations. Therefore, midPoint can work with resource data correctly. E.g. midPoint will not try to modify an account on a read-only resource.

Capabilities are usually automatically discovered by midPoint, and everything just works out of the box. There is usually no extra work to maintain the capabilities. Yet, sometimes there is a need to tweak the capabilities a bit. Maybe the connector cannot detect resource capabilities well enough. Maybe there is a read-only resource, but the connector has no way of knowing this. In that case, the write capabilities have to be manually disabled in midPoint. For that reason there are two sets of capabilities:

  • Native capabilities are capabilities detected by the connector. Those are always automatically generated by midPoint. Those capabilities should not be modified by the administrator.

  • Configured capabilities are the capabilities modified by the administrator. Configured capabilities are used to override native capabilities. Configured capabilities are usually empty, which means that only native capabilities are used.

There are many ways for the administrator to tweak the capabilities. Yet, there is one case that is particularly interesting for synchronization and provisioning: the simulated activation capability.

MidPoint connectors can be tailored specifically for a particular system. E.g. there are often connectors developed specifically for one custom enterprise application. On the other side of the spectrum are generic connectors that can fit a wide variety of systems and applications. LDAP, CSV and database table connectors are examples of such generic connectors. Such connectors are so useful that they are used in almost every midPoint deployment. However, there is no standardized way to disable an account in a database table or a CSV file. Various columns and various values are used to represent account activation status. Quite surprisingly, there is no standardized way to disable an account in an LDAP directory either. That is bad news for midPoint. MidPoint takes significant advantage from knowing whether an account is disabled or enabled. We had to do something about this "disable ambiguity". And we did.

There is a way to tell midPoint which attribute and which values are used to represent account activation status. The configured activation capability is used for that purpose:

    <capabilities>
        <configured>
            <cap:activation>
                <cap:status>
                    <cap:attribute>ri:active</cap:attribute>
                    <cap:enableValue>true</cap:enableValue>
                    <cap:disableValue>false</cap:disableValue>
                </cap:status>
            </cap:activation>
        </configured>
    </capabilities>

The configured capability above specifies the resource attribute active as the attribute that controls account activation status. If this attribute is set to the value true, then the account is enabled. If the attribute is set to the value false, then the account is disabled. That is it. Once this configured capability is part of the resource definition, midPoint will pretend that the resource can enable and disable accounts. An attempt to disable an account will be transparently translated to a modification of the active attribute. Moreover, it also works the other way around. If an account has the attribute active set to the value false, midPoint will display that account as disabled. No extra logic or mapping is needed to achieve that. The capability does it all.

The situation is slightly more complex in our LDAP server in the ExAmPLE company. ExAmPLE is using OpenLDAP server, which does not have any reasonable method to disable users. Therefore, we have to be very creative. Perhaps the least bad method is to extend the OpenLDAP schema with a custom auxiliary object class midPointPerson, which defines a new attribute midPointAccountStatus. The midPointAccountStatus attribute takes the values enabled and disabled to represent account status. No value in the midPointAccountStatus attribute means an enabled account as well. As there are two options for representing the enabled status, we have to specify both of them in the definition of the simulated capability:

    <capabilities>
        <configured>
            <cap:activation>
                <cap:status>
                    <cap:attribute>ri:midPointAccountStatus</cap:attribute>
                    <cap:enableValue></cap:enableValue>
                    <cap:enableValue>enabled</cap:enableValue>
                    <cap:disableValue>disabled</cap:disableValue>
                </cap:status>
            </cap:activation>
        </configured>
    </capabilities>

This is not an ideal solution, as this midPointPerson object class is quite cumbersome. However, when combined with some access control list (ACL) magic, it can work quite well.

Synchronization Example: LDAP Account Correlation

The previous example demonstrated the use of synchronization for an HR data feed. That is the most obvious use of synchronization mechanisms. However, midPoint synchronization can do more tricks than just feeding data to midPoint. Synchronization is frequently used even for target resources. In that case, the synchronization is usually used for several purposes:

  • Initial migration: This is the process of connecting a new resource to midPoint. There are usually accounts that already exist in the resource at the time when the resource is connected to midPoint. It is likely that at least some accounts correspond to the users that are present in midPoint (e.g. users created from the HR feed). Therefore, the accounts from the resource need to be correlated to the users that already exist in midPoint. Synchronization is the right mechanism for this.

  • Detection of illegal (orphaned) accounts: Security policies are usually set up in such a way that only those people that need an account on a particular resource should have that account. This is known as the principle of least privilege. However, in a typical identity management deployment, there is nothing that would prohibit a system administrator from creating accounts at will. This freedom is often even desirable, because there are emergency situations where full control over the system is crucial. Yet, even for emergency cases, we want to make sure that the situation is aligned with the policies when the emergency is over. MidPoint can easily do that by scanning the target systems at regular intervals. Synchronization mechanisms can be used to detect accounts that do not have any legal basis (orphaned accounts) and delete or disable such accounts. Again, the synchronization mechanism can do that easily.

  • Attribute value synchronization: Accounts in target resources are usually created as a result of a midPoint provisioning action. However, account attribute values are in fact copies of the data in midPoint. Attribute values can easily be changed by users or system administrators, they may be set to old values during a data recovery procedure, or they can get out of sync by a variety of other means. MidPoint can make sure that the attributes are synchronized and that they stay synchronized for a long time. Synchronization mechanisms are ideal for this purpose.

Older identity management systems used synchronization mostly to get data from the source resources to the identity management system. Synchronization in midPoint is much more powerful than that. It can be applied both to source systems and target systems, it can pull data, push data, detect inconsistencies and fix them. Synchronization is a general-purpose mechanism, it is a real workhorse of identity management with midPoint. This is the principle of reuse again. The synchronization mechanism can be reused for a variety of purposes.

In this example, we will be using synchronization to connect an existing LDAP server to midPoint. We assume that our midPoint is already connected to the HR system and we have imported the HR data. Now we have midPoint users created for all our employees. Then there is this LDAP server. It is a really important LDAP server. This server is used by the company enterprise portal and also by a variety of smaller web applications. Those applications are using the LDAP server for user authentication and access authorization. The LDAP server was deployed many years ago. Initially, it was populated with the HR data. However, the LDAP server was managed manually by a system administrator during all these years. Therefore, it is expected that there are some accounts that belong to former employees. Also, it might have happened that some accounts are missing. It is quite likely that a lot of the accounts have wrong data.

The first task is to set up the connector for this resource. As LDAP servers are used for identity management purposes all the time, midPoint comes with a really good LDAP connector. All we need is to set up the resource to use that connector:

resource-ldap.xml
<resource oid="8a83b1a4-be18-11e6-ae84-7301fdab1d7c">
    <name>LDAP</name>

    <connectorRef type="ConnectorType">
        <filter>
            <q:text>connectorType = "com.evolveum.polygon.connector.ldap.LdapConnector"</q:text>
        </filter>
    </connectorRef>
    ...

What we can see here is a slightly more sophisticated method to reference the connector. So far we have seen only a direct connector reference by OID. This works well for almost all the references in midPoint, because an OID never changes. However, connectors are somewhat elusive objects. Objects that represent connectors are dynamically created by midPoint when a connector is discovered. Therefore, the OID is generated at random when midPoint discovers a new connector. There is no practical way for a system administrator to predict that OID. Yet, we still want our resource definitions to refer to a particular connector when we import the definition. Therefore, there is an alternative way to specify object references. This method uses a search filter instead of a direct OID reference. When this resource definition is imported to midPoint, midPoint uses that filter and looks for the LDAP connector. If that connector is found, then the OID of that connector is placed in the reference (connectorRef). Therefore, the next time midPoint is using this resource, it can follow the OID directly. This is a very convenient method. However, there are a few limitations. Firstly, the filter is resolved only during import of the resource definition object, which means that it is resolved only once. If the connector is not present at import time, then the reference needs to be corrected manually. Secondly, this approach works only if there is a single LDAP connector deployed to midPoint. This is usually the case. However, the connector framework can contain several connectors of the same type in different versions. This is a very useful feature for gradual connector upgrades, testing of new connector versions and so on. Yet, in case the filter matches more than one object, the import will fail. In that case, the connector reference has to be set up manually. Still, this method of connector references is very useful in practice, and it is used all the time.

Once we have proper reference to LDAP connector we need to configure the connection:

resource-ldap.xml
    ...
    <connectorConfiguration>
        <icfc:configurationProperties>
            <cc:port>389</cc:port>
            <cc:host>localhost</cc:host>
            <cc:baseContext>dc=example,dc=com</cc:baseContext>
            <cc:bindDn>cn=idm,ou=Administrators,dc=example,dc=com</cc:bindDn>
            <cc:bindPassword><t:clearValue>secret</t:clearValue></cc:bindPassword>
            ...
        </icfc:configurationProperties>
    </connectorConfiguration>
    ...

This is all very similar to the configuration of the other resources that were already presented in this book. It should be quite self-explanatory.

The XML example above, as all the other examples in this book, is simplified and shortened for clarity. You will not be able to import the example in this form into midPoint. For full importable examples, see the files that accompany this book. Please see the Additional Information chapter.

The basic resource configuration above is sufficient to connect to the resource. Therefore, the test connection operation on the resource details page should be successful. However, LDAP servers support many object classes, and midPoint does not yet know which object class represents an account. Therefore, we need to add a schema handling section to our resource:

resource-ldap.xml
    ...
     <schemaHandling>
        <objectType>
            <kind>account</kind>
            <displayName>Normal Account</displayName>
            <default>true</default>
            <delineation>
                <objectClass>inetOrgPerson</objectClass>
                <auxiliaryObjectClass>midPointPerson</auxiliaryObjectClass>
            </delineation>
            ...

This is pretty much the standard definition as we have already seen it. However, there is one new aspect: the midPointPerson auxiliary object class. This is an additional LDAP object class which is going to be applied to all objects that midPoint eventually creates. It is not important for synchronization at this stage, yet we are going to set it up now, so that we have a complete working configuration.

We continue with setting up mappings for the attributes:

resource-ldap.xml
            ...
            <attribute>
                <ref>dn</ref>
                <displayName>Distinguished Name</displayName>
                <limitations>
                    <minOccurs>0</minOccurs>
                    <maxOccurs>1</maxOccurs>
                </limitations>
                <outbound>
                    <source>
                        <path>$focus/name</path>
                    </source>
                    <expression>
                        <script>
                            <code>
                                basic.composeDnWithSuffix('uid', name,
                                        'ou=people,dc=example,dc=com')
                            </code>
                        </script>
                    </expression>
                </outbound>
            </attribute>
            ...

There should be an outbound mapping for each mandatory LDAP attribute of the inetOrgPerson object class. Such mappings are very typical for a target resource definition.

Once we set up the schema handling, we should be able to conveniently list LDAP accounts in midPoint. However, we need to click on the btn:[Reload] button first. The accounts are stored in the LDAP server and midPoint can access them. However, midPoint has not processed the accounts yet. Therefore, there are no account shadows in the midPoint repository yet. Clicking on the btn:[Reload] button initiates reading of the accounts from the LDAP server.

We are going to import (or reconcile) the resource accounts. However, if we tried to do this now, nothing would really happen. The accounts are not linked to users, therefore midPoint would not synchronize the attributes. MidPoint was not told to do anything with the accounts, therefore midPoint does nothing. That is one of the midPoint principles: midPoint does not change the accounts in any way unless it is explicitly told to do so. We would rather do nothing than destroy the data.

Before we can import the accounts, we need to set up the synchronization configuration for this resource. There are accounts in the LDAP server that should belong to users that already exist in midPoint. We want to link them. However, we do not want to do the linking manually. We would rather set up a correlation mechanism that does this automatically. We would like to use the LDAP attribute employeeNumber to correlate the accounts.

resource-ldap.xml
            ...
            <attribute>
                <ref>employeeNumber</ref>
                <displayName>Employee Number</displayName>
                <correlator/>
                <outbound>
                    ...
                </outbound>
                <inbound>
                    <target>
                        <path>$focus/personalNumber</path>
                    </target>
                    <evaluationPhases>
                        <include>beforeCorrelation</include>
                        <exclude>clockwork</exclude>
                    </evaluationPhases>
                </inbound>
            </attribute>
            ...

The employeeNumber attribute is configured as a correlator by using the correlator configuration element. The employeeNumber attribute is mapped to the personalNumber user property, therefore midPoint knows that the values of the employeeNumber attribute are to be compared with the values of personalNumber. When midPoint encounters an LDAP account, it takes the value of the employeeNumber attribute, transforms it using the inbound mapping defined above, and looks for a matching value of personalNumber among all the users.

Inbound mapping and evaluation phases
An inbound mapping is necessary at this point, even though the LDAP resource is a target resource. The inbound mapping is used to transform the value for correlation purposes. Its responsibility is to transform the value of the employeeNumber account attribute to the same format as is used by the personalNumber user property. In this case, the format is the same, therefore no transformation is used (the asIs expression is assumed). The evaluationPhases configuration limits the application of this mapping to correlation only. It excludes the clockwork phase, which is the main synchronization phase. Therefore, the mapping will be used for correlation, but it will not be used in other parts of the synchronization process.

If the correlation values match, then we assume that the account should be linked to the user. In that case, midPoint decides that the synchronization situation is unlinked (they should be linked, but they are not linked yet). We want midPoint to link the account in this case, therefore we define an appropriate reaction:

resource-ldap.xml
                ...
                <reaction>
                    <situation>unlinked</situation>
                    <actions>
                        <link/>
                    </actions>
                </reaction>
                ...

Unlinked accounts get linked. This takes care of accounts for which we can automatically find an owner by using the correlation mechanism. What should we do with the other accounts? We will do nothing about them just yet. Therefore, we do not need to define any other reactions. This may be somewhat surprising. We do not want illegal accounts in our LDAP server, do we? Then perhaps we would like to see a reaction to delete unmatched accounts, right? That would be a good approach, but it is just too early for this. We do not want to delete unmatched accounts just now. There may be accounts that are perfectly legal, just the employeeNumber attribute is missing or mistyped. Data errors like those happen all the time, especially when the data were managed manually. We do not want to over-react and start deleting accounts too early. Therefore, we go with just this one synchronization reaction for now.

Now it is the right time to start an import or reconciliation task for the LDAP resource. After the task is finished, the situation may look like this:

OpenLDAP accounts
For the curious readers, the LDAP server has data equivalent to the content of the ldap-real.ldif file located in the book samples.

It looks like we have quite good data in the LDAP server. Most of the accounts were successfully correlated and linked to their owners. Yet, there are a few accounts that were not correlated. Those accounts ended up in the unmatched situation. You can resolve this situation by manually linking the unmatched accounts to their users. Simply click on the small triangle button next to the unmatched entry and select menu:Change owner[] from the context menu. Then select the right user (Isabella Irvine) in the dialog that appears. After that, the account is linked to the user. Repeat this process to link all the unmatched accounts.

There is one interesting thing in the screenshot above. Have a look at the LDAP account identified by uid=carol. While most other accounts have their uid value taken from the surname of the user, this account is an exception. Even though the uid value is obviously wrong, midPoint has linked the account to the correct user (Carol Cooper). The reason is that we have set up midPoint to use employeeNumber for correlation. Even accounts whose usernames violate the convention can be automatically linked to their owners - as long as there is some reliable piece of information that can be used for correlation.

The search panel on top of the account list can be used to make manual linking faster. The btn:[Situation] search control can be used to select accounts that are in the Unmatched situation.

Looking for unmatched accounts

We can quickly handle the obvious unmatched accounts, and link them manually. When all the accounts are linked to their owners, we end up with two accounts that do not have owners: john and oscar. In identity management parlance, these are called orphaned accounts. Having a closer look at the accounts, these two accounts obviously belong to former employees. John and Oscar have not worked for the ExAmPLE company for several years. While John’s account is at least disabled, Oscar’s account is still active. What a deprovisioning disaster this is! We have an illegal account here, still active, still accessible by a former employee that might not be entirely happy about being laid off. Quite obviously, we have to do something about it.

This is the right time to complete the synchronization policy. Once the correlation is complete and all accounts have an owner, detecting an unmatched account means that an illegal account was created in the LDAP server. Now we can tell midPoint to delete any unmatched accounts.

resource-ldap.xml
                ...
                <reaction>
                    <situation>unmatched</situation>
                    <actions>
                        <deleteResourceObject/>
                    </actions>
                </reaction>
                ...

When the reconciliation task is completed, all the orphaned accounts are gone. We have reduced our security risks, which is one of the primary reasons for having an identity management system in the first place.

Disable orphaned accounts

The clever reader is not entirely persuaded at this point. Outright deletion of orphaned accounts might look nice in security policy documents, yet it looks just too aggressive for practical use. The clever reader is, as always, quite right. We really want to react to orphaned accounts as soon as possible to reduce any security risks, which means we want to react automatically. However, it is much better to just disable the account instead of deleting it. Some orphaned accounts may have benign reasons. The account might have been created by an LDAP administrator, because the HR system is temporarily broken and business must go on. Detection of an orphaned account may also be a false positive, caused by a misconfiguration of the midPoint correlation mechanism, or simply caused by wrong HR data. Re-enabling the account is much less work than re-creating the account, especially if there were already some data stored in the account. Even if an orphaned account is detected correctly, we still may want to keep it for a while. It may provide good material (and evidence) for investigation of a possible security incident.

Fortunately, it is easy to reconfigure midPoint to disable the account instead of deleting it. All that is needed is to change the action from deleteResourceObject to inactivateResourceObject.
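
A sketch of the adjusted reaction is shown below. It simply swaps the action element in the reaction shown earlier; the rest of the synchronization configuration stays as it is.

resource-ldap.xml
                ...
                <reaction>
                    <situation>unmatched</situation>
                    <actions>
                        <inactivateResourceObject/>
                    </actions>
                </reaction>
                ...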

We have our accounts partially cleaned up now. All accounts are linked to owners, and orphaned accounts are gone. However, there may be some accounts in the LDAP server that have wrong attribute values. By "wrong" we mean that the attributes have different values than the values that are computed by the outbound mappings. However, midPoint is not correcting those values just yet. Remember the midPoint principle that it does not change the accounts unless we have explicitly told it to do so? Those accounts are in the linked situation, and we have not configured any reaction for this situation. Therefore, midPoint did nothing. Now we need to tell midPoint to synchronize the values:

resource-ldap.xml
                ...
                <reaction>
                    <situation>linked</situation>
                    <actions>
                        <synchronize/>
                    </actions>
                </reaction>
                ...

A clever reader is now surely wondering whether we have forgotten something. We have, indeed. Attribute values are synchronized by running the reconciliation process. However, our outbound mappings will not work in reconciliation. They do not have any explicit definition of strength, therefore midPoint assumes normal strength. Those mappings are supposed to implement the last change wins strategy. Therefore, reconciliation cannot overwrite the account data, as midPoint does not know whether it was the account attribute or the user property that was the last to change. If midPoint is not sure about something, then it assumes a conservative position and does nothing. We do not want to destroy the data. Therefore, what we need to do now is to let midPoint know that we really mean it, that the mappings are really strong:

resource-ldap.xml
            ...
            <attribute>
                <ref>cn</ref>
                <displayName>Common Name</displayName>
                <limitations>
                    <minOccurs>0</minOccurs>
                    <maxOccurs>1</maxOccurs>
                </limitations>
                <outbound>
                    <strength>strong</strength>
                    <source>
                        <path>$focus/fullName</path>
                    </source>
                </outbound>
            </attribute>
            ...

The clever reader is uneasy once again. What is this limitations thing here? Simply speaking, the limitations specify that the attribute is optional (minOccurs=0) and that it is single-valued (maxOccurs=1). However, isn’t midPoint supposed to be completely schema-aware and figure that all out by itself? Yes, it is, and midPoint does it all right. In fact, that is the reason why we need to override the information from the schema using this limitations element here. The cn attribute is specified in the LDAP schema as a mandatory attribute. However, we have just specified an outbound mapping for that attribute. Even if no value for the attribute cn is entered in the midPoint user interface, we can still determine the value by using the expression. Therefore, even though the LDAP schema specifies the attribute cn as mandatory, we want to present that attribute as optional in the midPoint user interface. Hence the minOccurs limitation. The maxOccurs limitation is immediately obvious to anyone who is intimately familiar with LDAP peculiarities. In the LDAP world, almost everything is multi-valued by default. Even commonly used attributes for account identifiers and names are multi-valued. Nobody really uses them as multi-valued attributes, because the vast majority of applications would probably explode if they ever encountered two values in the cn attribute. Yet, those attributes are formally defined as multi-valued in the LDAP schema, and that is what midPoint gets from the LDAP connector. The maxOccurs limitation overrides the schema, forcing midPoint to handle this attribute as if it were a single-valued attribute.

The last thing that we need at this point is to handle activation. We need a very simple mapping for that:

resource-ldap.xml
            ...
            <activation>
                <administrativeStatus>
                    <outbound/>
                </administrativeStatus>
            </activation>
            ...

The mapping is entirely empty (<outbound/>), as it completely relies on default settings. The source is the default ($focus/activation/effectiveStatus), the target is the default (activation/administrativeStatus of the account), and the expression is the default (asIs) as well. All midPoint needs to know is that we want such a mapping at all.

However, there is one more detail to set up. As LDAP does not have any standard, reasonable mechanism for disabling accounts, we have to tell midPoint which method we are using in this particular case. We do that using a configured capability, as we have already seen before:

    <capabilities>
        <configured>
            <cap:activation>
                <cap:status>
                    <cap:attribute>ri:midPointAccountStatus</cap:attribute>
                    <cap:enableValue></cap:enableValue>
                    <cap:enableValue>enabled</cap:enableValue>
                    <cap:disableValue>disabled</cap:disableValue>
                </cap:status>
            </cap:activation>
        </configured>
    </capabilities>

From this point on, we can keep the reconciliation task running periodically by setting a task schedule. Our synchronization policies are in good shape, ready to be continually enforced. Periodic reconciliation runs keep account attributes updated. Even more importantly, they keep us protected from the serious security risk posed by orphaned accounts.

That is all. Now you can schedule reconciliation tasks to keep an eye on the LDAP server. The task corrects any attribute values that step out of line and deletes any illegal accounts. This is how synchronization tasks can be useful, even in the case of pure target resources.
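
For readers who prefer XML over the user interface, a scheduled reconciliation task may look roughly like the sketch below. The exact structure depends on the midPoint version (this sketch follows the activity-based style of recent versions), and the resource OID is just a placeholder:

    <task xmlns="http://midpoint.evolveum.com/xml/ns/public/common/common-3">
        <name>Reconciliation: LDAP accounts</name>
        <executionState>runnable</executionState>
        <activity>
            <work>
                <reconciliation>
                    <resourceObjects>
                        <!-- Placeholder OID: reference your LDAP resource here -->
                        <resourceRef oid="00000000-0000-0000-0000-000000000000"/>
                        <kind>account</kind>
                        <intent>default</intent>
                    </resourceObjects>
                </reconciliation>
            </work>
        </activity>
        <schedule>
            <!-- Wake up once a day; the interval is specified in seconds -->
            <interval>86400</interval>
        </schedule>
    </task>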

However, there is one last word of warning. Those accounts were synchronized and linked to existing midPoint users. The accounts were not created by midPoint. Therefore, there is nothing in midPoint that would say that those accounts should exist. In midPoint parlance, there are no assignments for those accounts. MidPoint makes a clear distinction between policy and reality. Therefore, midPoint is aware that those accounts exist, but there is no policy statement that would justify their existence. By default, midPoint does nothing, and it lets the accounts live. The accounts are created or deleted only if there is an explicit change in the assignments. There is no such change now, therefore the accounts are not deleted. However, this is quite a fragile situation. Accounts that are linked but not assigned can easily get deleted if the midPoint administrator is not careful. Of course, there are methods to handle such situations. One way would be to create the assignments together with the links. Those who are interested in this method should look up the keyword "legalize" in midPoint docs. However, there are much better methods to handle such situations. Perhaps the best approach is to utilize roles (RBAC). That is the topic of the Role-Based Access Control chapter later. Yet, there are still more things to learn about synchronization before we get there.

Peculiarities of Reconciliation

Reconciliation is a process of comparing the current state of an account (reality) to the desired state of the account (policy). Reconciliation does not only compare the accounts, it actively corrects the inconsistencies. Reconciliation can correct wrong data on resources (outbound direction). Yet, it also works the other way: it can correct the data in midPoint (inbound direction). Therefore, reconciliation is one of the most useful tools in the identity management toolbox.

Reconciliation can be used in a variety of ways. Reconciliation can be initiated for one specific user by using the midPoint user interface. In that case, midPoint compares the values of all the user’s accounts to the values that were computed using the mappings. If there are differences, midPoint corrects the account values. This approach is perfect for testing reconciliation settings using just a single user. This feature is also useful for fixing values of one specific user.

Reconciliation of a specific user may be useful, but it is an ad hoc approach. We usually favor systematic approaches in identity management. Therefore, reconciliation is usually used in the form of a reconciliation task. A reconciliation task lists all the accounts on the resource, and then it reconciles each account, one by one. This is a way to keep all resource accounts continuously synchronized.

There are a couple of things regarding reconciliation that can be somewhat surprising. Firstly, reconciliation of an account may cause modification of a user. This happens if there are inbound mappings for that account. This is perhaps quite expected. However, the operation does not stop there. If a user is changed, then such a change may propagate to other accounts on other resources, usually by means of outbound mappings. MidPoint does not like procrastination, therefore it tries to apply all the changes immediately. It means that reconciliation of one account may cause changes to other accounts. This makes a lot of sense, yet it may be quite surprising. Secondly, reconciliation skips all normal-strength mappings. We have already explained the reasons for that, but this is something that can surprise even an experienced midPoint engineer from time to time. If we are sure that we want the mapped value to be present in the account all the time, then strong mappings are the way to go.

A curious reader who has already explored the midPoint user interface has surely noticed the recompute function. To the untrained eye, recompute looks almost exactly the same as reconciliation. However, there are subtle differences. Firstly, recompute does not force fetching of account data. When recomputing, fresh account attributes are not retrieved from the resource, unless midPoint really needs them for the computation. This usually happens if weak or strong mappings are used; in that case the attribute values are retrieved. However, if there are normal mappings only, then recompute may skip the retrieval of fresh account data. MidPoint compares and corrects account attribute values only for those accounts that are retrieved from the resource during this process. That is how recompute works. Correcting account data is more or less just a side effect of recompute. The purpose of recompute is to correct the data of midPoint users, which means evaluation of object templates and other internal policies.

On the other hand, reconciliation always tries to read all the accounts, regardless of whether they are needed for the computation or not. Therefore, all the attributes on all the accounts are corrected. That is the purpose of reconciliation: to correct the account data.

There is yet another difference between recompute and reconciliation tasks. The purpose of a recompute task is to correct user data. Therefore, a recompute task iterates over midPoint users. A recompute task does not detect new accounts on the resource, and it may even overlook a deleted account. However, a reconciliation task is different. In fact, a reconciliation task has several stages. The main reconciliation stage lists all resource accounts. It determines the owner of each account, compares the attributes and corrects them. As this process iterates over real accounts on a resource, it can easily detect new accounts. When the main stage is completed, the next stage looks at account shadows stored in midPoint. The task looks for shadows that have not been processed in the main stage. Those are accounts that used to be on the resource some time ago, but have disappeared since. That is how reconciliation detects deleted accounts.

The rule of thumb for the reconciliation versus recompute dilemma is this:

  • Use recompute if you want to update users. For example, use recompute after a change of object templates, policy rules or role definitions to apply the changes to all users. Recompute is usually initiated manually, on an as-needed basis.

  • Use reconciliation to keep accounts synchronized. For example, run a periodic reconciliation task to make sure accounts in your LDAP servers are up-to-date. Reconciliation is usually scheduled at regular intervals, to continually maintain data consistency.

Usernames

Usernames take many forms. Some people like usernames based on first names (john), others prefer last names (smith), and many practical solutions end up using a combination of both (jsmith). Some organizations prefer immutable identifiers, such as employee numbers (1458363), which is not entirely favored by users. We will discuss that later, in the Focus Processing chapter. Whatever username convention we choose, we have a problem in our current ExAmPLE midPoint deployment: midPoint has usernames based on employee numbers (001), while LDAP has usernames based on names (anderson). We should use one or the other, not both of them at the same time.

A clean, simple and systematic solution would be to force usernames based on employee numbers from midPoint to LDAP. This is very simple to do; all we need is to change the strength of the dn mapping from weak to strong. The next reconciliation run would change all the usernames in LDAP to match the values of the user name property in midPoint. However, this may not be entirely desirable. This process changes all usernames in the LDAP directory, changing their usernames (uid), but also their distinguished names (dn) in the directory. There are usually other applications connected to the directory, as that is the very reason directories exist. Application accounts are almost always bound to either uid or dn in the LDAP directory. Strictly speaking, there is an immutable entryUUID attribute in the OpenLDAP directory, which is almost ideal for account binding. Of course, almost no application uses entryUUID; they all rely on uid or dn. If uid and dn are changed, account bindings in applications may be lost. Users may lose their settings, preferences, user profiles, access to data and home directories - and they are not going to like that. Not to mention the simple fact that they are not going to like the new numeric usernames in the first place.

It may be a much better idea to keep existing LDAP usernames, and change usernames in midPoint to match them. Fortunately, this is also very easy to do. All we need is one inbound mapping:

resource-ldap.xml
            ...
            <attribute>
                <ref>uid</ref>
                ...
                <inbound>
                    <strength>strong</strength>
                    <target>
                        <path>name</path>
                    </target>
                </inbound>
                ...
            </attribute>
            ...

Once the reconciliation run is completed, LDAP usernames are applied to midPoint users. Users have user-friendly (although slightly inconsistent) usernames now.

Users with nice usernames

It is a good idea to remove the inbound mapping after midPoint usernames are corrected. We do not need it anymore, and it can be quite confusing or even harmful in the long run. We probably want to generate usernames in a systematic and consistent way from now on, which will be described in the Focus Processing chapter.

It is also a good idea to disable the inbound mapping from the empno attribute to the user’s name in the HR resource. If we left this mapping active, it could ruin our usernames by replacing them with employee numbers from the HR system. Even worse, as we have configured outbound mappings to the LDAP directory, a change of a user’s name would be reflected in the LDAP uid attribute, effectively renaming all LDAP accounts. We definitely do not want that! Disabling the inbound HR mapping leaves us in a situation in which there is no mapping for the user’s name at all. This means that creation of a new midPoint user from a new HR record is going to fail. We have to live with that for now, as we are not going to risk destruction of all our usernames. We will fix that later, when we learn how to generate proper usernames.

Deltas

Reconciliation is a really useful mechanism. It is reliable and thorough. However, it is also quite slow, and it consumes a lot of computational and network resources. There are good reasons why reconciliation is such a heavyweight beast. Reconciliation works with the absolute state of accounts. It means that reconciliation reads all the accounts with all the values of all relevant attributes. Then it recomputes everything. Even the attributes and values that were not changed are recomputed. This is a very reliable way of computation, and it is also the method used by the majority of traditional identity management systems.

Yet, there is also a better way. If we know that just one attribute was changed, we can recompute that single attribute only. We do not need to care about other attributes. Moreover, if we know that attribute foo has changed in such a way that there is a new value bar, then it gets even better. We just need to recompute the value bar, and we need not care about any other values. This is what we like to call a relative change. We care just about the values that were changed. That is how midPoint works internally. We could say that midPoint is a relativistic system.

This is where deltas come in. A delta is a data structure that describes the change of a single midPoint object. An add delta describes a new midPoint object that is about to be created. A modify delta describes an existing midPoint object where some properties have changed. A delete delta describes an object that is going to be deleted.

This is a very powerful mechanism. Just remember that everything in midPoint can be represented as an object: user, account, resource, role, security policy …​ everything. Therefore, a delta can represent any change in midPoint. It may be a change of a user password, deletion of an account, change of connector configuration or introduction of a new policy rule. If all the changes can be represented in a uniform way, then they can also be handled in a uniform way. Therefore, it is easy for midPoint to record all the changes in an audit trail – including configuration changes. It is easy to route any change through an approval process. And so on. MidPoint can provide relatively simple mechanisms to handle changes, and then those mechanisms can be applied to any change of (almost) any object.

Let’s have a closer look at the anatomy of a delta. There are three types of deltas: add, modify and delete. An add delta is quite simple. It contains a new object to be created.

Add delta
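
As an illustration only, an add delta may be written in XML roughly like this. The element names follow the general shape of midPoint object deltas, the namespaces are omitted, and the user data are made up:

    <objectDelta>
        <changeType>add</changeType>
        <objectToAdd>
            <!-- The complete new object is carried inside the delta -->
            <user>
                <name>anderson</name>
                <fullName>Alice Anderson</fullName>
            </user>
        </objectToAdd>
    </objectDelta>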

A delete delta is even simpler. It contains just the object identifier (OID) of the object to be deleted.

Delete delta
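
Again just a sketch, with simplified element names and an obviously fake OID:

    <objectDelta>
        <changeType>delete</changeType>
        <objectType>UserType</objectType>
        <!-- Placeholder OID of the object to be deleted -->
        <oid>00000000-0000-0000-0000-000000000001</oid>
    </objectDelta>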

The last one is the modify delta. This delta contains a description of the modified properties of an existing object. As an object can change in a variety of ways, the modify delta is the most complex of the three. A modify delta contains a list of item deltas. Each item delta describes how a particular part of the object changes. For example, the following delta describes that a new value pirate is added to the user property employeeType.

Modify delta: add
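
A sketch of such a modify delta, with the same caveats as above (simplified element names, placeholder OID):

    <objectDelta>
        <changeType>modify</changeType>
        <objectType>UserType</objectType>
        <oid>00000000-0000-0000-0000-000000000002</oid>
        <itemDelta>
            <!-- Add a single value to the (possibly multi-valued) employeeType property -->
            <modificationType>add</modificationType>
            <path>employeeType</path>
            <value>pirate</value>
        </itemDelta>
    </objectDelta>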

An item delta may have one of three modification types: add, delete and replace. The add modification means that a new value or values are added to an item. The delete modification means that a value or values are removed from an item.

Modify delta: delete
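
The corresponding delete modification, sketched under the same assumptions:

    <objectDelta>
        <changeType>modify</changeType>
        <objectType>UserType</objectType>
        <oid>00000000-0000-0000-0000-000000000002</oid>
        <itemDelta>
            <!-- Remove only the listed value; any other values of employeeType are untouched -->
            <modificationType>delete</modificationType>
            <path>employeeType</path>
            <value>pirate</value>
        </itemDelta>
    </objectDelta>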

In both the add and delete cases, the values that are not mentioned in the delta are not affected. They remain unchanged. However, the replace modification is different. It means that all existing values of the item are discarded, and they are replaced with the value or values from the delta.

Modify delta: replace
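
And a sketch of the replace modification, again with simplified element names, a placeholder OID and a made-up value:

    <objectDelta>
        <changeType>modify</changeType>
        <objectType>UserType</objectType>
        <oid>00000000-0000-0000-0000-000000000002</oid>
        <itemDelta>
            <!-- Discard all existing values of employeeType and set exactly this one -->
            <modificationType>replace</modificationType>
            <path>employeeType</path>
            <value>sailor</value>
        </itemDelta>
    </objectDelta>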

The deltas are designed to work with both single-valued and multi-valued items. In fact, the add and delete modifications are specifically designed with multi-valued items in mind. Those deltas can work efficiently even in cases where a multi-valued attribute has a very large number of values. There is a good reason for this. Multi-valued properties are quite common in the identity management field. Just think about how roles, groups, privileges and access control lists are usually implemented. Everyone who has ever managed a large group in an LDAP server certainly remembers that experience in vivid colors. Fortunately, midPoint is designed to handle situations like these.

Everything in midPoint is designed to work with deltas: user interface, mappings, authorizations, auditing …​ all the way down to the low-level data storage components. Mappings work in a relativistic way. That is one of the reasons why we need to explicitly specify the sources of a mapping. Mapping source definitions are matched with items in the delta to control execution of the mapping. Deltas permeate the entire midPoint computation. Deltas are input to the mappings, and mappings produce other deltas as output. Therefore, we can have a complete chain: deltas that are the result of inbound mappings are applied to the user object, but those deltas are also input to outbound mappings. Everything is relativistic in midPoint.

This might seem a bit over-complicated at first. But do not worry. You will get used to it. Clearly, this approach has major advantages.

However, the clever reader does not seem to be impressed. How can this relativistic approach conserve any significant portion of computational resources? We usually fetch the entire account from the resource anyway. Therefore, there is no harm in recomputing all the attributes. The computation itself is fast; it is the fetch operation that is slow. Isn’t it? The clever reader is, of course, right - or at least partially right. Most resources indeed fetch all the account attributes in a single efficient operation. For those cases, there is no big increase in efficiency if we go with the relativistic methods. However, there are exceptions. For example, some resources do not return all the values of big attributes, e.g. all the members of a large group. Additional requests are needed to fetch all the values – and there may be a lot of requests if the group is really large. The relativistic approach has a significant benefit in those cases. The benefits will be even more obvious when we get to live synchronization in the next section.

Yet, performance is not the primary motivation for the relativistic approach. There is one extremely strong reason to be relativistic: data consistency. Consistency is something that brings ugly nightmares to many engineers who try to design distributed systems. An identity management solution is a distributed system, managing data in many independent applications and databases. It is also a very loosely-coupled distributed system. There is no support for locking or transactions in the connectors. Even if there were some support, the vast majority of resources cannot provide those consistency mechanisms on their identity management APIs. This means that midPoint cannot rely on traditional data consistency mechanisms. MidPoint cannot make sure that the data are not changed during a computation or between computations. That is why the relativistic approach is so useful. Relativistic computation has a very high probability of achieving a correct result even without locking or transactions. This is more than acceptable for typical identity management deployments. For those rare cases where relativistic computation can fluctuate, there is always reconciliation as a last resort. Yet, thanks to the relativistic nature of midPoint, the need for reconciliation is significantly reduced.

That was a lot of long words, but the clever reader seems to be satisfied now. At least for a while. For the readers who are still scratching their heads, there is quite a simple summary: the relativistic approach of midPoint can do miracles. For example, midPoint resources can be both sources and targets; even a single attribute can be both a source and a target of information. It is the relativistic approach that enables configurations like this. The principle of relativity is relatively simple. Yet, its effect in midPoint is nothing short of revolutionary.

Live Synchronization

MidPoint has a range of synchronization mechanisms. Slow, brutal, but reliable reconciliation is at one end. Live synchronization is at the other. Live synchronization is a lightweight mechanism that can provide almost-real-time synchronization capabilities. Live synchronization looks for recent changes on a resource. When such changes are detected, the live synchronization mechanism processes them immediately. The synchronization delay is usually in the order of seconds or minutes, provided live synchronization is used properly. Unlike reconciliation, live synchronization is not triggered manually. That would make very little sense. Live synchronization works in a long-running task, repeatedly looking for fresh changes at short intervals.

If a resource is already configured for synchronization, then all that is needed to run live synchronization is to set up a live synchronization task. The midPoint user interface can be used to do that easily. An example of a live synchronization task was provided in the HR Feed section above. The live synchronization task wakes up at regular intervals. Each time the task wakes up, it invokes the connector. Connectors capable of live synchronization have a special operation that is used to get fresh changes from the resource.

In theory, the connector can support any reasonable change detection mechanism. Yet, two mechanisms are commonly used in practice:

  • Timestamp-based synchronization: The resource keeps track of the last modification timestamp for each account. The connector looks for all accounts that have been modified since the last scan. This is a very simple and relatively efficient method. However, it has one major limitation: it cannot detect deleted accounts. If an account is deleted, then there is no timestamp for that account, and therefore the connector will not find it in the live synchronization scan.

  • Changelog-based synchronization: The resource keeps a "log" of recent changes. The connector looks at the log, and it processes all the changes that were added to the log since the last scan. This is a very efficient and flexible method. Yet, it is not simple. Not many systems support it, and there are often hidden complexities.

These mechanisms are called live synchronization strategies, and they are further explained in the following section.

All live synchronization methods need to keep track of which changes are "recent", i.e. which changes were already processed by midPoint and which were not processed yet. There is usually some value that needs to be remembered by midPoint: the timestamp of the last scan, the last sequence number in the change log, the serial number of the last processed change and so on. Each connector has a different value with a connector-specific meaning. MidPoint refers to those values as "tokens". The most recent token is stored in the live synchronization task. That is how midPoint keeps track of processed changes. There are (hopefully quite rare) cases when the resource and the midPoint token get out of alignment. This may happen, for example, if the resource database is restored from a backup, or if network time gets out of synchronization. If that happens, then deleting the token from the live synchronization task is usually all it takes to get the synchronization running again.

Live synchronization is fast and very efficient. However, it is not entirely reliable. MidPoint may miss some changes. This is quite a rare situation, but it may happen. Reconciliation can remedy the situation in such a case. Just remember, all the synchronization mechanisms share the same configuration. It is perfectly acceptable to run live synchronization and reconciliation on the same resource at the same time. Of course, it is a good idea to run reconciliation less frequently than live synchronization.

Live Synchronization Strategies

Live synchronization is fast and efficient - in theory. However, as usual, the devil is in the details. There is no single synchronization protocol or standard that would work for all the resources. Every system has its own way to synchronize data. Some systems (such as LDAP servers) even have several mechanisms to choose from. Then there are source systems that have no practical way to implement efficient synchronization. We refer to these system-specific mechanisms as live synchronization strategies.

The synchronization strategy is configured on the connector level, and the details should, theoretically, be hidden inside the connector. MidPoint would not know, and would not need to know, what synchronization strategy is used. That might work in an ideal world. Yet, we live in a practical world, and there are many details that leak through the connector interface.

Let us use LDAP as an example. LDAP is, theoretically, a standard. However, the standard does not specify any synchronization mechanism. There is the experimental RFC 4533, which is not widely adopted. Yet, synchronization capabilities are necessary, and every major LDAP server provides some synchronization mechanism. Some mechanisms are quite good, some are not. There is an ancient "Retro change log" mechanism, going back to the Netscape/iPlanet LDAP servers originating in the 1990s. That mechanism is still used today, in several variants. Active Directory, in a very typical way, has its own "DirSync" synchronization mechanism. OpenLDAP has yet another mechanism based on the access log. There is the RFC 4533 standard, which is used so rarely that there was no request to implement it in the midPoint LDAP connector. Then there is a catch-all synchronization mechanism that looks for recent changes based on the modifyTimestamp attribute.

In theory, all the synchronization strategies above should be equivalent - but they are not. For example, some variants of "Retro change log" synchronization cannot reliably detect rename operations. There may be problems with delete operations as well, especially when coupled with rename operations. Almost every mechanism has its quirks. Then there is modifyTimestamp, which is the most problematic of all.

Unfortunately, it is quite common practice to use a synchronization strategy based on the last modification timestamp. Not just for LDAP, but also for database tables and other types of source systems. This is perhaps understandable, as it is a very simple mechanism. However, it has a lot of problems. The obvious problems can be caused by de-synchronized time on the network, although in the age of the Network Time Protocol (NTP) this should not be a problem at all. The other problem is timestamp granularity. If the timestamp is granular to one second, that can be a big problem. One second is a very long time for a computer. A lot can happen in one second. Therefore, the connector has to include the "boundary" second in both consecutive synchronization runs, which means that some records may be processed twice. Going for millisecond granularity makes the problem less severe, but the problem is still there.

However, the worst problem is that this strategy cannot detect deleted objects. Deleted objects are not there anymore, they do not have a last modification timestamp, therefore they will not be included in the search. This means that there must be a reconciliation process running together with live synchronization. Wait a minute, it is usually recommended to run reconciliation anyway, as a form of "safety net", isn’t it? It is, but the difference is in the timing. It is one thing to run reconciliation once a week to make sure that no records were missed. Yet, it is a completely different thing to run reconciliation every hour to make sure deleted objects are properly handled. This makes a huge difference, especially for deployments with millions of entries. Strategies based on the last modification timestamp may look like a good idea at the beginning. However, they usually turn into a major liability in the long run. Avoid them if you can.

The bottom line is that synchronization strategies are not created equal. In fact, the individual strategies tend to have vastly different characteristics. Our advice is to learn how each synchronization strategy works, what its limitations are, and when it fails. Also, avoid the use of strategies based on the last modification timestamp if there is any other viable alternative.

Conclusion

Synchronization is one of the most important mechanisms in the entire identity management field. The primary purpose of synchronization is to get the data into midPoint. That is a good approach when an identity management deployment begins: feed your midPoint with data first. Get the data from the HR system. Correlate the data with Active Directory. Connect all the major resources to midPoint and correlate the data again. MidPoint does not need to make any changes at this stage. In fact, it is a perfectly good approach to make all the resources read-only at this stage. The point is to let midPoint see the data. Why do we need that?

  • We can see the real quality of the data. Most system owners have at least some idea what data sets are there. However, it is almost impossible to estimate data quality until the data are processed and verified. That is exactly what midPoint can do at this stage. This is essential information for planning data cleanup and sanitation.

  • We can learn how many accounts and account types there are. It is perhaps quite obvious that there are employee accounts. Are there also accounts for contractors, suppliers, support engineers? Are those accounts active? What is the naming convention? Do system administrators use employee accounts for administration, or are they using dedicated high-privilege accounts? This information is crucial for setting up provisioning policies.

  • We can learn the distribution of accounts and their entitlements. Do all employees have accounts in Active Directory? Are there any large user groups? How does the organizational structure influence the accounts? This information is very useful for designing role-based access control structures and other policies.

  • We can detect some security vulnerabilities. Are there orphaned accounts that should have been deleted a long time ago? Are there testing accounts that were left unattended after the last nighttime emergency? Indeed, there is no security without identity management.

This is a good start. Even if this is all that you do in the first step of the deployment, it is still a major benefit. You get better visibility, and with that comes better security. You have the data to analyze your environment, and to plan the next step of the identity management deployment. You won’t be blind any longer. That is extremely important. It is indeed a capital mistake to theorize before one has data.
