IAM Myth: My Data Are Perfect

Last modified 18 Mar 2024 11:02 +01:00

My data are correct. The human resources (HR) database is in great shape. It is used for employment and payrolls, how can it be wrong? My organizational structure data are perfect. You can completely rely on my data.

The Shape Of My Data

Your data are not perfect. In fact, too many HR data sets are in quite bad shape. The data may be wrong at the time of entry, such as wrong or ambiguous transliteration of foreign names. However, the most problems are caused by data decay, data getting out of date in time. Typos are not corrected, maiden names are not properly changed, work positions and locations are not updated as they should. The older the data entry, the lower the probability that it is correct.

Quite surprisingly, this makes perfect sense. What is the motivation to keep HR data correct and up-to-date? Who cares that there is a typo in my name in the HR database? If my maiden name is not updated or my location is wrong, what is the real problem here? Everything will work fine and salaried will be paid. There is little motivation to keep HR data correct and up-to-date, therefore they tend to decay.

Garbage In, Garbage Out

However, this is a major problem for identity and access management (IAM) systems. IAM systems use HR data as one of primary sources of information. HR database is supposed to provide authoritative data on employees. Identity governance and administration (IGA) would like to use employee names to generate usernames and e-mail addresses. Typos and wrong transliterations are going to cause problems. Even worse, there is an overall business motivation for higher efficiency and deeper automation. IGA systems would like to use data about job positions and locations to automatically assign access rights. However, if that information is wrong, wrong access rights are assigned, which is likely to be a security issue. The information is also used to automatically unassign access rights. Valid access rights might be removed, which usually happens automatically, on a massive scale. This is going to be a big problem.

Probably the only relatively reliable information from HR database are the dates, such as start and end dates of employment. They are directly bound to payroll, therefore they are likely to be correct. Quality of other data items varies, which means that it is usually quite low.

Organizational structure data are often even worse. They are kept somehow correct at the level of top management. However, lower levels of organizational tree are not maintained very well. Why should they? The business will go on anyway. Even if the data are kept reasonably correct, they are often maintained in one big spreadsheet. They may not be available in a form suitable for machine processing. However, organizational structure is essential for IGA and automation. Organizational structure is one of the most chronic problems of IGA.

Pretty much everybody claims that their data are in great shape. Naive IGA deployments may rely on the data, and get into big problems. Identity management projects has a reputation of slow and expensive endeavors. Significant part of that reputation can be attributed to naive assumptions about input data quality.

What Should I Do?

You are not alone. Everybody overestimates quality of their data. The first step to solve this problem is to admit there is a problem. Data that are not automatically and continuously validated are always wrong. It is very rare situation for HR or orgstruct data to be validated - until IGA system is deployed. Therefore, you have to assume that the data are wrong.

Do not blindly rely on any input data. Always verify the data before fully committing to use them. The best practical way is to use IGA system for that. Advanced IGA platforms can pull the data, process them, compare and correlate them with reality - without changing any data in target systems. You will get idea about the quality of input data before your IGA deployment makes a huge mess beyond repair.

The trick is to use iterative approach. Back in early days of identity management in the 2000s, too many projects were trying the "big bang" approach, failing miserably. The technology have grown from that, changing the approach, proceeding in smaller steps.

Start with a small exploratory phase. Set up IGA platform specifically for the purpose of validating your plan. Pull in HR data, use IGA platform to transform them, and compare them with reality. Features such as correlation or data mapping capabilities are essential. Do not change any system yet. Just compare and evaluate the data.

You did your exploration, and you have discovered that the HR data are indeed wrong. What to do now? Well, the right way would be to notify HR department and ask them to fix the data. However, it is often a very slow and tedious process. Good IGA systems provide a method to temporarily override incorrect data, until the data source is fixed. Adapt your configuration, fine-tune the policies and work around incorrect data.

This is likely to take several iterations to get the right configuration. We at Evolveum have developed a methodology to guide you through the first steps of iterative IGA deployment. The methodology takes into account wrong input data and works around it.

When the IGA system is deployed, use it as a guardian of data quality. Synchronization capability can be used to keep data consistent across many systems, making data errors immediately obvious. Confronting the data with reality is the only practical and reliable method to maintain data quality.

There are more features that can dramatically improve the deployment process:

  • Simulation capability can predict the effects of new configuration or data on the system, before you commit to it. You can examine simulation results before any system is changed, and decide how to proceed.

  • Tresholds are a safety mechanism to avoid big problems. Automation is a great mechanism to solve a lot of problems quickly. However, it is also a mechanism to make a huge number of problems in a very short time. Thresholds can be set up in automation processes, stopping the process in case there is a suspicious number of unexpected changes, limiting the impact of input data errors.

Was this page helpful?
Thanks for your feedback