Identifier Theory

Last modified 04 Oct 2021 15:31 +02:00

Identifier Definition

People may understand different things by the term "identifier". Our definition is a technical one:

An identifier is an attribute or a combination of attributes (a compound identifier).
The value or values of an identifier identify exactly one object (identity).

Identifier Characteristics

Some identifier characteristics:

As an identifier has to identify exactly one object, it has to be unique. If the values are not unique, then they are not identifiers.
An identifier may point to an object that no longer exists, or even to an object that does not exist yet. That is OK.
Identifiers are usually mandatory, i.e. every object has an identifier. But some identifiers may be optional, i.e. only some objects have them. Those are usually secondary (or "higher-degree") identifiers.
An identifier may be persistent (immutable, never changing) or mutable. Changes to an identifier are known as "renames". Of course, we prefer persistent identifiers, but reality does not always provide them.

This means that a non-unique identifier is not an identifier at all. It may be a mandatory attribute needed to create an object. It may be a naming attribute. But it does not work as an identifier.
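The uniqueness rule can be checked mechanically. The following is a minimal sketch, not part of midPoint: the function name `identifies_uniquely` and the sample account data are hypothetical and exist only for illustration. It treats a list of attribute names as a candidate (possibly compound) identifier and skips objects that lack a value, which is how an optional identifier would behave.

```python
from typing import Any, Iterable, Mapping, Sequence


def identifies_uniquely(
    objects: Iterable[Mapping[str, Any]], attributes: Sequence[str]
) -> bool:
    """Return True if the attribute combination is unique across all
    objects that have a value for every attribute in the combination."""
    seen = set()
    for obj in objects:
        # An optional identifier may be missing on some objects; skip those.
        if any(obj.get(name) is None for name in attributes):
            continue
        key = tuple(obj[name] for name in attributes)
        if key in seen:
            return False  # two objects share the same value(s)
        seen.add(key)
    return True


# Hypothetical sample data, for illustration only.
accounts = [
    {"uid": 1001, "username": "jack", "fullName": "Jack Sparrow", "badge": "B-7"},
    {"uid": 1002, "username": "will", "fullName": "Will Turner"},
    {"uid": 1003, "username": "jack2", "fullName": "Jack Sparrow"},
]
```

With this data, `uid` and `username` each qualify as identifiers, `badge` qualifies as an optional identifier, and `fullName` does not qualify: it is only a naming attribute.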

Identifier Degree

Simple systems have just one identifier. But most practical systems have several identifiers of varying degree, such as primary identifiers, secondary identifiers and so on.

For example:

System	Primary identifier	Secondary identifier	Tertiary identifier	Note
Classic LDAP	DN
Modern LDAP	entryUUID	DN
Active Directory	GUID	samAccountName	DN	However, this may vary according to administration practices.
UNIX	UID (number)	username
Simple application	username

There is no reason to define primary, secondary and tertiary identifiers and then just stop there. Quaternary and quinary identifiers may be rare, but there is no reason to rule them out. Therefore we define the concept of identifier degree to make this generic:

	Degree
Primary identifier	1
Secondary identifier	2
Tertiary identifier	3
Quaternary identifier	4
Quinary identifier	5
…	…
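As a rough illustration of a generic degree, the identifiers from the example table above can be modelled as plain data with a numeric degree instead of fixed "primary" and "secondary" slots. This is only a sketch under assumptions: `IdentifierDefinition` and `best_identifier` are hypothetical names, not midPoint or ConnId API, and the "prefer the lowest degree that has a known value" rule is one possible policy, not a documented one.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping, Optional


@dataclass(frozen=True)
class IdentifierDefinition:
    degree: int      # 1 = primary, 2 = secondary, 3 = tertiary, ...
    attribute: str   # name of the identifying attribute


# Taken from the example table; real deployments may differ.
ACTIVE_DIRECTORY = (
    IdentifierDefinition(1, "GUID"),
    IdentifierDefinition(2, "samAccountName"),
    IdentifierDefinition(3, "DN"),
)
MODERN_LDAP = (
    IdentifierDefinition(1, "entryUUID"),
    IdentifierDefinition(2, "DN"),
)


def best_identifier(
    definitions: Iterable[IdentifierDefinition],
    known_values: Mapping[str, Any],
) -> Optional[IdentifierDefinition]:
    """Pick the lowest-degree identifier for which a value is known.

    Assumed policy for illustration: lower degrees are preferred."""
    for definition in sorted(definitions, key=lambda d: d.degree):
        if known_values.get(definition.attribute) is not None:
            return definition
    return None
```

For example, if only `samAccountName` and `DN` are known for an Active Directory account, `best_identifier` returns the degree 2 definition. Nothing in the model limits the number of degrees.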

TODO: how those degrees are used in create, modify and search operations

Compound Identifiers

TODO: i.e. "multi-dimensional"

TODO: how compound identifiers differ from identifier degrees.

Impact on MidPoint 4.x

This means that the concept of primary/secondary identifier in midPoint 3.x and 4.0 is, strictly speaking, not correct. The problems:

A secondary identifier is used even though it may not be unique, e.g. in cases where ConnId NAME is not unique. This was an error; such an attribute should not be an identifier at all. In our defense, we were carried away by the "hardcoded" nature of ConnId attributes such as __NAME__. The NAME is required to create an object, whether the name is unique or not. This is something that needs to be fixed in ConnId (although it may not be easy due to compatibility). But it also needs to be fixed in midPoint. ConnId NAME should not be a secondary identifier in those resources.
AD deployments often set up several secondary identifiers (e.g. DN and samAccountName). This would suggest that there is a compound secondary identifier. But that is not the case. In fact, there is a secondary identifier and a tertiary identifier.
Obviously, midPoint is missing the concept of a generic identifier degree. A secondary identifier alone is not enough.

TODO/Problems

Multi-value identifiers?
What to do with non-unique identifier values caused by data errors? If we strongly assume that identifiers are unique, those cases can break our system.
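One defensive option for the data-error case is to detect colliding values and report them instead of assuming uniqueness. The sketch below only shows the detection step; `find_collisions` is a hypothetical helper, and what the system should then do with the colliding objects remains the open question.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def find_collisions(
    objects: Iterable[Mapping[str, Any]], attribute: str
) -> Dict[Any, List[Mapping[str, Any]]]:
    """Map each identifier value shared by more than one object
    to the list of objects that carry it."""
    by_value: Dict[Any, List[Mapping[str, Any]]] = defaultdict(list)
    for obj in objects:
        value = obj.get(attribute)
        if value is not None:  # objects without a value cannot collide
            by_value[value].append(obj)
    return {value: objs for value, objs in by_value.items() if len(objs) > 1}
```

An empty result means the attribute values are unique in the given data set. A non-empty result lists exactly the values that violate the identifier assumption.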
