Identifier Theory
Identifier Definition
People may understand different things for the term identifier. Our definition is a technical one:
-
Identifier is an attribute or a combination of attributes (compound identifier).
-
Value(s) of identifier identify exactly one object (identity).
Identifier Characteristics
Some identifier characteristics:
-
As identifier has to identify exactly one object is has to be unique. If the values are not unique then they are not identifiers.
-
Identifier may point to an object that do not exist any longer. Or even an object that does not exist yet. That is OK.
-
Identifiers will be usually mandatory, i.e. every object will have an identifier. But some identifiers may be optional, i.e. only some objects will have identifier. Those are usually secondary (or "higher-degree") identifiers.
-
Identifier may be persistent (immutable, never changing) or it may be mutable. Changes in identifier are known as "renames". Of course, we prefer persistent identifiers. But reality does not always provides these.
Which means that an identifier that is non-unique is not an identifier. It may be a mandatory attribute to create an object. It may be a naming attribute. But it does not work as identifier.
Identifier Degree
Simple systems have just one identifier. But most practical systems have several identifiers of varying degree, such as primary identifiers, secondary identifiers and so on.
For example:
System | Primary identifier | Secondary identifier | Tertiary identifier | Note |
---|---|---|---|---|
Classic LDAP |
DN |
|||
Modern LDAP |
entryUUID |
DN |
||
Active Directory |
GUID |
samAccountName |
DN |
However, this may vary according to administration practices. |
UNIX |
UID (number) |
username |
||
Simple application |
username |
There is no reason why we would like to have primary, secondary and tertiary identifiers and just stop there. Quaternary and penitentiary may be rare, but there is no reason to role them out. Therefore we define a concept of identifier degree to make this concept a generic one:
Degree | |
---|---|
Primary identifier |
1 |
Secondary identifier |
2 |
Tertiary identifier |
3 |
Quaternary identifier |
4 |
Penitentiary identifier |
5 |
… |
… |
TODO: how those degrees are used in create, modify and search operations
Compound Identifiers
TODO: i.e. "multi-dimensional"
TODO: how compound identifiers differ from identifier degrees.
Impact on MidPoint 4.x
This means that the concept of primary/secondary identifier in midPoint 3.x and 4.0 is, strictly speaking, not correct. The problems:
-
Secondary identifier is used even though it may not be unique. E.g. in cases that ConnId NAME is not unique. This was an error, such an attribute should not be identifier at all. In our defense, we have been carried away by the "hardcoded" nature of ConnId attributes such as
__NAME__
. The NAME is required to create an object, whether the name is unique or not. This is something that needs to be fixed in ConnId (although it may not be easy due to compatibility). But it also needs to be fixed in midPoint. ConnId NAME should not be a secondary identifier in those resources. -
AD deployment often set up several secondary identifiers (e.g. DN and samAccountName). This would suggest that there is a compound secondary identifier. But that is not the case. In fact there is secondary identifier and tertiary identifier.
-
Obviously, midPoint is missing the concept of generic identifier degree. Secondary identifier is obviously not enough.
TODO/Problems
-
Multi-value identifiers?
-
What to do with non-unique identifier values caused by data errors? If we strongly assume that identifiers are unique those cases can break our system.