SCIM Troubles

Last modified 24 Nov 2024 22:00 +01:00

What Is SCIM?

System for Cross-domain Identity Management (SCIM) is an IETF specification (RFC7642) of a RESTful service for identity provisioning. SCIM specification describes services to create, read, update and delete (a.k.a. "CRUD") data about users and groups. SCIM services are provided by systems that store identity data, such as applications with their own database, cloud service providers and so on. SCIM services are invoked by software systems that need to manage the identities.

SCIM As Universal Interface

SCIM is built on a noble idea to define an universal interface for identity provisioning that every application can use. This sound like a great idea - and in fact it is. The problem is that implementing such an interface is much harder than it seems. This approach was tried many times in the past, including attempts of protocol specification that were very similar to SCIM. Such efforts are almost never successful. In fact such attempts usually introduce additional complexity to the system and result in maintenance nightmares in the long run. We see people trying the same approach over and over for almost two decades. The result is always the same.

This seems to be very counter-intuitive. Creating such abstraction is a well-know and usually very useful architectural pattern. How it is possible that such a best practice does not work for identity provisioning? The answer is far from being straightforward, therefore it is perhaps best to illustrate that using a couple of examples.

SCIM specifies a nice schema for the user (account). Every user should look like this in SCIM notation (simplified):

{
  "userName":"jdoe",
  "name":{
    "formatted": "John Doe, PhD.",
    "familyName": "Doe",
    "givenName": "John",
    "honorificSuffix": "PhD."
  },
  "emails":[
    {
      "value":"john.doe@example.com",
      "type":"work",
      "primary": true
    }
  ]
}

This looks very nice and sane. But all the systems are not the same. Let’s have a look at seemingly simple concept of user’s name. Some systems have the name neatly separated to its components such as first name and last name. Other systems are much simpler and just put everything in one messy string. SCIM allows variations in the representation of name. Therefore system A returns this:

{
  "userName":"jdoe",
  "name":{
    "familyName": "Doe",
    "givenName": "John",
    "honorificSuffix": "PhD."
  },
  ...
}

And system B returns this:

{
  "userName":"jdoe",
  "name":{
    "formatted": "John Doe, PhD."
  },
  ...
}

So far so good. However, the trouble comes when we want to create an account on both system A and system B. SCIM has to way to indicate that system A needs familyName and givenName while system B needs formatted name.

It is perhaps no big deal for system B. SCIM service endpoint may be smart enough to accept both forms of the name as it is easy to format full name from its components. But what should poor system A do if it gets formatted name such as Hrabě Felix Teleke z Tölökö. Which part is first name, which is last name, where is the middle name and which parts are honorific titles? The client that sends just the formatted name is fully SCIM-compliant, as SCIM schema allows this. The service A is also fully SCIM-compliant. Yet they cannot interoperate. The correct behavior needs to be configured or hardcoded in the client. Which makes interoperability quite difficult to achieve.

You may think that this is not really a problem. And in fact you may be right, given the common IDM practices of today’s world. People are used to the fact that integration with every new system is a slow and painful process. But SCIM is not supposed to work just for the needs of today. SCIM was supposed to be an improvement. But it is not yet there.

This particular problem is perhaps not so bad when it comes to user’s name. Getting the name right is going to require interaction with the user anyway. But the same problem permeates entire SCIM schema.

Is password required to create new user? How should we encode/hash the password? What about user activation status (is user enabled or disabled)? Does the service support activation at all? Which values are valid? Just true/false? Is there any way to indicate an archived account? What about activation dates (valid from, valid to)? Those are the necessary minimum for even a simple identity management solution. But SCIM does not have the answers.

All of that boils down to interoperability - or rather lack of it. SCIM client needs to know a lot of out-of-band details about the service for that specific deployment to work. The consequence is that vast majority of practical SCIM clients are developed to work with a particular service and they work only with that service. They are not portable to other services. They are not interoperable. This is not really what one would expect from a standard protocol, is it?

Standard Schema For Identity Management

One of the problems of SCIM is that it tries to enforce a standardized identity schema. But standardized identity schema is a good thing, isn’t it? How can this be a problem?

Common identity schema is a good thing, in principle. The problem is that different communities and deployments have different ideas what is common. For example LDAP community cannot agree how to disable an account - and LDAP exists for almost three decades. There are 3-4 official grouping mechanisms in LDAP, plus pretty much every LDAP server has its own because the standard ones have their problems. Agreeing on common schema is really hard. It is even harder to standardize it. Which was obviously very clear to SCIM authors as the SCIM standard schema is very vague.

Then there is a pile of pre-existing software systems. Some of them are LDAP based with inetOrgPerson and eduPerson schemas. Those are not strictly compatible with SCIM schema as there is no reasonable way how to express mutivalue LDAP cn in SCIM name. Oh yes, using cn with multiple values is a very rare sight. But this means you cannot make a bridge between LDAP and SCIM that will be fully compliant to both standards. You have to cut corners. And then there are systems that have their own schemas that are not really SCIM-compatible either. They will have to cut even more corners to expose SCIM-compliant interface. Each such compromise means that there is a loss in fidelity. It means that there will be things that cannot be done with SCIM - or that you have to bend SCIM standard or resort to dirty hacks. One schema does not rule them all.

Similar problems were present in Identity Connector Framework (ICF) that was created by Sun Microsystems back in 2000s. ICF was not even trying to go as far as to define how a name or an address of a person should be presented. What ICF mandated is that each account must have two identifiers:

__UID__, which is unique and it should be immutable if possible. You can imagine that as LDAP entryUUID or a value of database autoincrement column. It may be generated by the system where the account is created.
__NAME__, which is provided by the user when account is created. This is usually a username or login name.

That may sound like a minimal and very reasonable requirement. Except that it is not. It is neither minimal nor reasonable. There are systems that do not have __UID__ at all. There are systems that do not have __NAME__ at all. There are systems where __NAME__ does not need to be unique. There are systems where a combination of several identifiers is needed to uniquely identify an account. There are systems that use both __UID__ and __NAME__, but __UID__ is required as identifier to modify account. There are systems that use both __UID__ and __NAME__, but __NAME__ is required as identifier to modify account. There are so many options. And we are talking just about simple identifiers. The is the most basic stuff of identity management. Everything else is going to be harder.

Identity Connector Framework is dead for almost a decade. But there is ConnId project that follow up on that effort. We have made many improvements over the years in ConnId. But even that small piece of hardcoded identifier schema haunted us all the time. Hardcoded schema for identity management interface does not work.

But wait a minute! There is a hardcoded identity schema in midPoint! And pretty much any identity management platform has such schema. How is it possible that identity management systems work?

Simply speaking, identity management systems work because they are systems. They are not mere specifications written down on a piece of paper. There is code. A huge pile of code. There is man-decades worth of pure development work in midPoint, plus additional effort for testing, documentation, communication and management overhead and all the other things around it. MidPoint knows how to map data between incompatible schemas. It was designed to do that precisely that. MidPoint can dynamically discover how the schemas look like. MidPoint can wire them together. MidPoint has tools to quickly change the mappings when the schemas evolve. MidPoint can simulate missing or non-standard functionality. MidPoint is flexible enough to adapt to standard violations and do all the dirty hacks. But there is a price to pay here. You probably do not want to spend man-decades to develop your SCIM-based integration solution.

To cut the long story short: what works for identity management systems is not the same thing that works for identity management interfaces.

If fact, we should probably be happy about the current state of SCIM and the hype and all. Lots of incompatible SCIM endpoints mean that there will be strong need for identity management systems. We can sell midPoint subscriptions by truckloads. But we just cannot be happy about something that is so wrong from engineering perspective.

Of course, SCIM can be improved. It seems that SCIM can be transformed to a good identity management interface eventually. Future SCIM versions may provide a means for a service to expose all the information that the client needs. But that is where all the ideas of universal identity provisioning interface get really complex. There is plethora of combinations of service capabilities, password formats, activation options and entitlement schemes. Fully-compliant SCIM clients will need to support them all, they will need to dynamically discover which are the right options and adapt the functionality. Which will effectively turn SCIM clients into small identity provisioning systems.

Issues, Issues And More Issues

SCIM is in its second version now and there are talks about a third revision. There were two SPML versions before SCIM and a handful of provisioning protocols before that. Those attempts go back for almost two decades. SCIM has a lot of previous failures to learn from. Therefore it is quite a big surprising how many issues still remain deeply embedded in SCIM.

SCIM has a prefabricated concepts of user and group. It is almost unbelievable that group membership is controlled by members attribute of a group. This is a well-known approach that goes back (at least) to 1990s. This approach is so well known especially because it is almost always very problematic. Majority of deployments have groups that contain pretty much every user in the organization. Which means that now we have a Group SCIM object that has many values in its members attribute. Groups with thousands to millions users are not that rare any more. Imagine how the SCIM client lists groups with that many members, how long the SCIM response is going to be. There is a workaround to request all group attributes except members, which is something that a reasonable client always wants to do. However, members attribute still needs to be used for group modification. Which means that both the client and service has to be implemented very carefully to avoid performance issues. It would be all so much easier if groups attribute of the user was used instead. Or even better: if group mechanics was just a special case of some well defined entitlement or role management mechanism. Which leads us to entitlements and roles attributes of the user, which are mentioned, but not really defined. SCIM leaves a lot to be desired here.

There are lot of smaller issues that make it hard to use SCIM for serious business. There is no good way how to indicate that user has a password without revealing information about the password (e.g. its hashed value). However, this functionality is often needed, e.g. if we want to set a password for a user but only if the user does not have a password yet. Username is mandatory and it has to be globally unique within a service. However, the username may be generated by the service to ensure the uniqueness. But in that case the username may not be present in the create operation, which is somehow in conflict with the fact that username is required. Global uniqueness of username may also be a problem for multi-tenant systems. Such systems have to use workarounds, such as introducing internal structure to the username. And some systems may not need username at all. SCIM forces such systems to duplicate the id into username. Which is also seems to be an anti-pattern.

There are more issues, but it perhaps makes no sense to enumerate them all. The big picture should be quite clear now. Those issues may be caused by the way how many protocols are developed nowadays. Many protocols are developed during standardization process, not before the process. Therefore there is not enough time and opportunity to validate the protocol by using it in diverse real-world scenarios. SCIM obviously suffers from this premature standardization problem.

Future Of SCIM

There was SPML once. It is dead now. Then there was SPML2. That one is dead too. (And no, XML was not the primary reason for SPML failure.) SCIM has a really tough act to follow here.

SCIM is undoubtedly an improvement over SPML. SCIM is better. But that is not the question. The question is whether SCIM is good enough. And in the state that SCIM is now, it is not good enough.

SCIM may be a good start. But it is just a start. It needs major improvements. It has to be cleaner, richer and more dynamic. But there is a significant price to pay to get such things. There will be new complexity. A lot of complexity. Therefore forget about simple universal SCIM clients. The clients will be either simple or universal. But not both.

Pragmatic Look At SCIM

Despite all that was said so far, SCIM can still be useful. It just needs to be used reasonably, one needs to be aware of the limitations and set the expectations right. Our recommendations:

If you are just starting, it may be a good idea to start with SCIM. It is better to start with SCIM than to reinvent everything - especially if you are new to identity management. Identity management is much more complex that it seems. Chances are that you end up with something much worse than SCIM if you try to do it your way.
Do not expect that SCIM will solve all your problems. Do not expect that your service will be accessible by any SCIM client. It won’t. You will need special client that can be based on SCIM. But you need to develop that yourself. Do not expect that your client can access any arbitrary SCIM service. It won’t. You have to adapt your client for every new service. In fact, expect that practical interoperability is going to be really low. However, it may still be better to use SCIM instead of building a service or client on a green field.
Do not use SCIM groups if you can avoid it. The way how SCIM deals group membership is a well-known anti-pattern and it is bound to cause a lot of problems sooner or later. Create your own entitlement mechanism instead.
It may be a good idea to avoid using the pre-fabricated User type as well. The fixed schema of User may not suit your purposes. There is no point for you to translate your LDAP eduPerson schema to SCIM User when your clients are going to translate it back to LDAP anyway. It may be much better to create your own EduUser resource from scratch.
It is probably not worth the effort to migrate your existing identity provisioning interface to SCIM. Unless your identity provisioning interface is really primitive, you are going to struggle to make SCIM do what you need it to do. You will need to create a lot of custom SCIM extensions. You will need to change the behavior. You will most likely end up violating SCIM specifications anyway. The benefit of migration is that people that it will be easier to understand your API for people that have seen a SCIM interface before. But they will need to understand your custom extensions anyway and they will almost certainly need to write custom client code. You have to decide for yourself if such benefit is worth for your specific case.

As long as you are aware of all the limitations of SCIM and it still satisfies your needs it is perhaps OK to use SCIM. SCIM limitations are not the primary problem with SCIM. Every technology has limitations and SCIM can be a good fit for many simple solutions. The real problem is that there are massively inflated expectations about SCIM. Lot of engineers with a limited experience in identity management see SCIM as a silver bullet. But it is not. It is just an ordinary technology in its early stages of development.

SCIM In MidPoint

We do not use SCIM in midPoint, not directly anyway. There are many reasons for this.

MidPoint is older than SCIM. MidPoint already had rich API when SCIM was just being developed. Our API is much richer that SCIM, it is build for dynamic environment and it has more features. Adopting SCIM as our API would be a significant downgrade.

Clever reader will notice that midPoint has a user schema that is very similar to SCIM schema. The reason is that midPoint schema and SCIM schema are based on the same specifications such as VCARD and FOAF. However, even though the schemas are similar, they are not the same. SCIM and midPoint schemas are not directly compatible.

Of course, we can create a SCIM interface in addition to our regular interface. But in that case we will need to maintain and support two interfaces instead of one. Which is not a negligible effort. In addition to that, it is very likely that SCIM will go through the usual hype cycle. Which means that people will start using the limited SCIM interface instead of our full-featured API. Then we will get a lot of request to extend SCIM functionality to support all midPoint features. We will have to make hacks and workarounds to expose such functionality using SCIM. Which means that we will spend a lot of effort to get to the same place where we already are.

MidPoint supports SCIM indirectly. There is a couple of SCIM-based connectors for some services. And we expect that we will develop more such connectors in the future. However, we have no truly universal SCIM connector, and it is very unlikely that such a connector will ever be possible or practical. Just look at LDAP. LDAP exists since 1993. It is one of the most established and stable protocols that we have in the entire IAM field. Yet, our "universal" LDAP connector has to account for many peculiarities of every individual LDAP server. And we have a separate connector for Active Directory, even though the connector is still using LDAP for communication. The same approach was adopted for current "universal" SCIM connector. The connector work very well with services that are strictly following SCIM specification. Also, it supports custom SCIM schema extension and relies on midPoint mappings to improve the compatibility. Nevertheless in practice, many SCIM supporting services requires additional non-standardized extension to work properly. Therefore, this connector contains exceptions and additional functionality for such services, and it’s expected that number of such exception will grow in the future.

There are efforts to create a SCIM API for midPoint as a contribution to midPoint project. We hope that this code may be used by the people who like to experiment with SCIM for integration. This may become native part of midPoint one day - if it will prove its usefulness in practice. But after the failures of DSML, SPML1, SPML2, lukewarm start of SCIM1 and its problems that were not really addressed in SCIM2, it is perhaps not difficult to understand that we are quite sceptical about identity management standards designed by committees.

This Is All Wrong!

This Is All Wrong! SCIM is a standard! And you should behave and support the standards. Come on! Implement SCIM service in midPoint. Now!

Well, technically, SCIM is an informational RFC, not a standard. But even if it was a standard, what is a values of a standard if it does not really work? We believe that the primary reason for having standards is interoperability. And SCIM is not doing incredibly well on that front.

However, we admit that we may be wrong with our assessment of SCIM. In that case please contact us and let us know what exactly we have got wrong. We will fix it. We may even reconsider our approach to support SCIM in the future. But there are two conditions:

SCIM has to mature. There are many improvements that needs to be done in SCIM for it to become useful.
There needs to be an incentive. Funding needs to be secured for both development and maintenance of SCIM interface. Or there needs to be significant demand from midPoint subscribers. Hype is not a significant motivation just by itself.

Let the community decide. If you like the ideas of SCIM and the solutions that SCIM provides than go ahead and use it. We will be more than happy to admit that we were wrong about SCIM if that is really the case. If you find it useful to use SCIM with midPoint then let us know. Just please, do all of us a favor: try using SCIM before you talk about it. Make sure that your evaluation of SCIM is based on real-world experience and that it is not just driven by hype and inflated expectations. What we need is a robust engineering solution, not a television show. Everything works perfectly in slide shows and talks. But we are not going to deploy and run those, are we?

Was this page helpful?

YES NO

Thanks for your feedback