First Steps With MidPoint: Assessment

Last modified 13 Apr 2022 10:41 +02:00
Goal
Asses the real data quality, determine practical next steps. At this point we know what we really have, what we can build on, what needs to be improved. We can identify the most severe security risks, such as orphaned accounts. Now we can improve our plan, adding more details based on the real data.

You have some kind of HR data now. In theory, you should use the HR data to create and manage accounts in target system, such as your Active Directory. However, in practice, this is not entirely straightforward.

Firstly, it is almost certain that there are errors and inaccuracies in the HR data. The data were maintained manually for a long time, with no way for automatic validation. Mistakes in the data might be buried deep, surviving undetected for decades. Having nothing to compare the data with, there is no telling how good or bad the data are.

Secondly, the data in your target systems (especially Active Directory) certainly leave a lot to be desired. These were managed manually for years, with no automatic way to make sure they are correct. There will be account belonging to people that left your organizations years ago. There will be accounts using maiden names of women that are married now. There will be strange accounts and identifiers that originated ages ago when your organization was still small and system administration was fun. There may be all kinds of weirdness and historical baggage frozen in time because nobody remembers what it does and everybody is scared to touch it.

Taking HR data and simply forcing them to Active Directory will never work. We need much smarter approach.

This is what you have to do:

1. Connect HR System

Connect HR data source to midPoint. Set up your HR identity resource in midPoint, using CSV or DatabaseTable connector. Deal with just the very basic data items for now:

  • Names (given name, family name)

  • Employee number, student number or similar identifier

  • Status (active, former employee, alumni, etc.) and/or validity date/time (based on contract etc.)

Ignore other fields for now. We will get back to them later.

2. Import Users From HR To MidPoint

Import users to midPoint, using HR data. Select appropriate algorithm for midPoint username. You surely have some username convention (such as jsmith) in place. Then import the HR data, creating user objects in midPoint. As we are working with simple data for now, the import should go well.

User lifecycle

This is where user lifecycle management starts.

We need at least some basic framework for user lifecycle management at this point. (TODO) Do we import only active users from HR? All users? How we will mark active users? (Auxiliary) Archetype?

If we can identify inactive HR persons, we can utilize this information when checking for accounts in target systems that should not be there (if we do not import inactive users from HR, we will see their accounts in target systems as simply orphaned).

3. Connect Active Directory

Set up your Active Directory (or perhaps LDAP) identity resource in midPoint. Set up mappings for the small data set that you have (given name, username and so on). Set the mappings in comparison (TODO!) mode. We do not want to change any data yet.

4. Correlate Active Directory Accounts

Correlate Active Directory accounts with midPoint users. If you have employee numbers stored in your Active Directory, then use that for correlation. Otherwise, use the generated midPoint usernames (e.g. jsmith convention) as the correlation identifier to match (assumed) majority of the accounts to their corresponding owners in midPoint:

  1. Run the reconciliation task on AD resource (TODO! Something more user-friendly in the resource rather than creating a task?).

  2. Then have a look at the results in midPoint GUI (interactively).

If you maintained your identifier assignment conventions reasonably well, most identities should correlate well. MidPoint will show you correlation statistics for your accounts.

Of course, there will be problems of John Smith and Josh Smith with their jsmith and jsmith42 accounts. Let’s leave that for later. For now just focus on correlating the bulk of users.

If you get 80-90% users to correlate well, you are done here. Activate the correlation action (TODO: turning on UNLINKEDlink reaction) and correlate the majority of your accounts now.

Of course, you are doing this for the first time. Chances are that you have not got all your configuration exactly right at the first try. If you need to make configuration adjustments, just make them and re-run the reconciliation task. In case of deeper problem, it is still OK to scrap your AD resource and do it again (go back to step 3). Maybe you need to grab more data from HR feed (e.g. you may have forgotten to map employee number to midPoint). In that case you still can purge all identity data from midPoint, adjust HR configuration and import everything again (go back to step 2).

5. Clean Up The Accounts

After most of the accounts have been correlated, we can handle the rest. Run reconciliation with AD and then run reconciliation report to see if the accounts can be matched to their owners and simulation report to see if midPoint would attempt to update the accounts in the system. There may be one or more categories of uncorrelated accounts to take care of. Also, the handling of these accounts will differ.

You can clean up the data in several iterations.
  1. Orphaned accounts: Review the list of orphaned accounts (the accounts in Active Directory not having an owner in midPoint which should mean they are not related to HR data on which midPoint data is based) one by one and make sure these are not system accounts (see the next category). Mark the undesired ones as decommissioned to be deactivated eventually.

  2. System (service) accounts: For all accounts that are crucial for Active Directory, we need a different decision. Either let the AD administrators to remove the accounts if they are unnecessary or mark the system accounts as protected in midPoint to keep track of them, but ignore them otherwise by midPoint.

  3. Accounts unmatched because of data inconsistencies. Review the rest of accounts which have not been matched or decided in the previous steps. This is the time to take care of the Smiths, Johnsons and Browns. Have a look at all the jsmith, smithj and jsmith2 accounts. Try to figure out which account belongs to which user and correlate them manually. If you did the previous steps well, there should be just a handful of them.

Except the cases of uncorrelated accounts, other problems may be reported by the reconciliation/simulation reports and more work may be required to fix the situation.

  1. Clean up incorrect mappings to avoid undesired changes in AD accounts. Review the accounts where midPoint indicates a change of AD account attributes. This means there are differences between the account attribute values in the target system and the values computed by midPoint for these accounts in comparison (TODO!) mode. Review the mappings. Chances are that the mappings are working correctly, but data in Active Directory does not correspond to them because it was previously managed manually and can contain errors. (For example, some family names are capitalized and other are not.) Adapt the mappings in midPoint to not cause undesired changes in Active Directory.

  2. Mark some accounts for later review to avoid undesired changes in AD specifically for these accounts. Sometimes there are several accounts (or groups of accounts) which need to be reviewed in more detail and remedied. To avoid getting stuck in this phase, you may simply mark these accounts for later review and ignore any provisioning for them. (This is similar to the concepts of protected accounts, but the accounts should be marked only temporarily and will be reported.)

  3. Review some accounts marked for later review to avoid a constant increase of their numbers.

This phase may seem as phase. Why not just go directly to automation? That is what we really want! However, assessment is all but pointless. Automation can be done only after the assessment phase is done. Attempts to automate processes with unreliable data are futile, they invariably lead to failures, usually a very expensive failures. Speaking from a couple of decades of identity management experience, there is no such thing as reliable data, unless the data are cleaned up and systematically maintained with an assistance of identity management platform. Simply speaking: you may think that our data is good, but they are not.