Shadow Caching

Last modified 19 Feb 2025 11:43 +01:00

Attribute caching feature

This page is an introduction to the Attribute caching feature of midPoint. Please see the feature page for more details.

MidPoint usually works with fresh data. When midPoint needs data about an account, it retrieves them on demand from the resource. This is usually the best method, but there are cases where it is problematic: some resources are often down, some are very slow, and others are costly to access or limit access, typically those in the cloud. Therefore, midPoint is able to cache the values of resource objects and use them instead of retrieving them from the resource.

Moreover, there is another benefit of having the data cached: instant reporting. With the relevant parts of data being cached right in the repository, one can write reports against them, usually allowing for more complex queries and providing faster execution times.

How It Works

The only supported caching method is passive caching: caches are maintained with minimal impact on normal operations. Generally, data are cached only when they are retrieved for other reasons. There is no read-ahead. Writes always go to the resource synchronously, so the cache behaves as read-through and write-through. There is no cache eviction, but old information is overwritten when newer information is available.

If caching is turned on, which is the default for new midPoint installations, midPoint caches the data in Shadow Objects. The caches are built gradually as midPoint reads the objects. To populate the caches, you have to run an operation that searches all the objects, e.g., reconciliation or a similar operation. See Refreshing the Cache below.

Configuring Caching

The caching can be enabled or disabled at the level of the whole system, a resource, an object class, an object type, or even an individual item (attribute, association, activation, and so on).

An Example

For a quick overview, let us consider the following example. (The complete description is below.)

Listing 1: An example of custom caching configuration

<resource>
    ...
    <schemaHandling>
        <objectType>
            <kind>account</kind>
            <intent>default</intent>
            ...
            <attribute>
                <ref>ri:jpegPhoto</ref>
                <cached>false</cached> (4)
            </attribute>
            ...
            <caching>
                <cachingStrategy>passive</cachingStrategy> (1)
                <scope>
                    <attributes>all</attributes> (2)
                    <associations>none</associations> (3)
                </scope>
                <timeToLive>P1D</timeToLive>
                <defaultCacheUse>useCachedOrFresh</defaultCacheUse>
            </caching>
        </objectType>
    </schemaHandling>
    ...
    <caching>
        <cachingStrategy>none</cachingStrategy> (5)
    </caching>
    ...
</resource>

1	Enables the caching for `account/default` object type.
2	Enables the caching of all attributes.
3	Disables the caching of associations.
4	Overrides the caching for a specific attribute.
5	Disables the caching for all other objects.

In short, caching is disabled for the resource as a whole, except for the account/default object type, for which it is enabled. For that type, all attributes except jpegPhoto are cached, activation is cached, and associations are not.

Configuration Details

At the level of the resource, object class, or object type definition, a caching configuration item can be provided. It has the following sub-items:

Table 1. The `caching` configuration item content
Item	Description	Default	Default for caching-only read capability
`cachingStrategy`	The overall switch that turns the caching on and off. It can have a value of `none` (no caching) or `passive` (passive caching as described above).	`passive`
`scope`	Scope of the caching (see below).	see below
`timeToLive`	For how long we consider the cached data as fresh.	P1D	unlimited
`defaultCacheUse`	How the cache is used for midPoint operations.	`useFresh`

Table 2. The `scope` configuration item content
Item	Description	Default	Default for caching-only read capability
`attributes`	Scope of the caching for simple attributes (i.e., not reference ones)	`defined`	`all`
`associations`	Scope of the caching for associations (i.e., reference attributes).	`all`
`activation`	Scope of the caching for the activation information.	`all`
`auxiliaryObjectClasses`	Scope of the caching for the auxiliary object classes.	`all`
`credentials/password`	Scope of the caching for the password.	`all`	`none`

Table 3. The values of the `scope` configuration items for simple attributes
Value	Description
`none`	No items of given kind will be cached.
`defined`	Only attributes defined (refined) in the `schemaHandling` section will be cached.
`mapped`	Only mapped items of given kind, i.e., those that have any mapping defined right in the object type, either inbound or outbound, will be cached.
`all`	All items of given kind will be cached.

Table 4. The values of the `scope` configuration items for other items (associations, activation, and so on)
Value	Description
`none`	The data are not cached.
`all`	The data are cached.

Exceptions (both positive and negative) to the scope can be defined by using the cached boolean property of individual attributes.
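
As a minimal sketch of how the scope and the per-attribute exceptions combine, the object type below caches only mapped attributes, does not cache passwords, and overrides the scope for two attributes. The attribute names are illustrative only, and the nesting of password under credentials inside scope is an assumption inferred from the `credentials/password` item path in Table 2:

<objectType>
    <kind>account</kind>
    <intent>default</intent>
    ...
    <attribute>
        <ref>ri:employeeNumber</ref> <!-- illustrative attribute name -->
        <cached>true</cached> <!-- positive exception: cached even without any mapping -->
    </attribute>
    <attribute>
        <ref>ri:jpegPhoto</ref>
        <cached>false</cached> <!-- negative exception: never cached, even if mapped -->
    </attribute>
    ...
    <caching>
        <cachingStrategy>passive</cachingStrategy>
        <scope>
            <attributes>mapped</attributes>
            <credentials>
                <password>none</password> <!-- assumed nesting for credentials/password -->
            </credentials>
        </scope>
    </caching>
</objectType>

Items of the scope that are not listed here (associations, activation, auxiliary object classes) keep the defaults from Table 2.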

System-wide defaults for the caching can be set in the system configuration. This is the default setting for all new installations, taken from the initial object holding the system configuration:

Listing 2: Setting the default values for the shadow caching

<systemConfiguration>
    ...
    <internals>
        <shadowCaching>
            <defaultPolicy>
                <cachingStrategy>passive</cachingStrategy> (1)
            </defaultPolicy>
        </shadowCaching>
        ...

1	Enables the shadow caching for all resources as a default. The defaults for individual values, e.g., time to live, can be overridden here as well.
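
For illustration, the following sketch overrides the time to live in the system-wide default policy. The value PT12H is an arbitrary example, and placing timeToLive directly inside defaultPolicy assumes that the default policy accepts the same items as the caching item described in Table 1:

<systemConfiguration>
    ...
    <internals>
        <shadowCaching>
            <defaultPolicy>
                <cachingStrategy>passive</cachingStrategy>
                <timeToLive>PT12H</timeToLive> <!-- example value: 12 hours instead of the default P1D -->
            </defaultPolicy>
        </shadowCaching>
        ...
    </internals>
    ...
</systemConfiguration>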

When changing the defaults in the system configuration, you need to apply the changed values to all resources. The easiest way to do that is to restart midPoint (all nodes in the cluster). An alternative is to make (any) change to each resource. For example, you can let midPoint reload the schema, or put the resource into maintenance mode and back, and so on. We plan to resolve this issue in the near future.

Configuring Cache Use

Even with caching turned on, the cached data may or may not be used for regular midPoint operations. For example, import from a resource may run against the actual resource data, or against the cached shadows in the repository. In a similar way, user recomputation may use either the actual or the cached data. When someone opens a user projection in the GUI, either the cached or the actual data can be displayed. And so on.

Use of Cached Data for Data Processing

This section describes the use of cached data for data processing in midPoint.

This primarily means providing source data for inbound mappings. It also means providing a picture of what data are on the resource where outbound mappings are concerned, e.g., for the correct application of weak and strong mappings.

What it does not cover, though, is the primary input of synchronization tasks, like import and reconciliation. That input is always taken from the resource, unless this is turned off by the experimental setting described in the Use of Cached Data for Import and Reconciliation Tasks section below.

Data processing is driven by the defaultCacheUse configuration property. It can have the following values:

Value	Description
`useFresh`	When we need the data from the shadow, we will fetch it from the resource. This is the same behavior as it was before midPoint 4.9.
`useCachedOrFresh`	When we need the data from the shadow, we will use the cached data, if they are available. If they are not, we will fetch them from the resource.
`useCachedOrIgnore`	When we need the data from the shadow, we will use the cached data, if they are available. If they are not, we will skip the respective part of the processing (e.g., a mapping). Experimental.
`useCachedOrFail`	When we need the data from the shadow, we will use the cached data, if they are available. If they are not, we will fail. Experimental.

For a given task, the default behavior can be overridden using the cachedShadowsUse model execution option (if that task supports specifying these options). However, this setting is currently experimental.
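
A rough sketch of such an override is shown below. It assumes that the recomputation activity accepts model execution options in an executionOptions element and that cachedShadowsUse takes the same values as defaultCacheUse. Neither detail is confirmed on this page, so please verify them against the schema of your midPoint version:

<task xmlns="http://midpoint.evolveum.com/xml/ns/public/common/common-3">
    <name>Recompute users using cached shadows</name> <!-- hypothetical task -->
    <executionState>runnable</executionState>
    <activity>
        <work>
            <recomputation>
                <objects>
                    <type>UserType</type>
                </objects>
                <executionOptions>
                    <!-- assumed placement and value set; experimental -->
                    <cachedShadowsUse>useCachedOrFresh</cachedShadowsUse>
                </executionOptions>
            </recomputation>
        </work>
    </activity>
</task>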

Use of Cached Data for Import and Reconciliation Tasks

The import and reconciliation tasks fetch their primary input, i.e., resource objects, right from the resource they are running against. This has the benefit of consistently updating the cache for these objects.

There may be situations, though, when you need to avoid contacting the resource and take the data from the cache instead. This can be achieved by specifying the noFetch option, like this:

Listing 3: Sample reconciliation task running from the shadow cache

<task xmlns="http://midpoint.evolveum.com/xml/ns/public/common/common-3"
    oid="007c5ef2-3d1f-4688-a799-b735bbb9d934">
    <name>reconcile-hr-persons</name>
    <executionState>runnable</executionState>
    <activity>
        <work>
            <reconciliation>
                <resourceObjects>
                    <resourceRef oid="c37ff87e-42f1-46d2-8c6f-36c780cd1193"/>
                    <kind>account</kind>
                    <intent>person</intent>
                    <searchOptions>
                        <option>
                            <options>
                                <noFetch>true</noFetch> (1)
                            </options>
                        </option>
                    </searchOptions>
                </resourceObjects>
            </reconciliation>
        </work>
    </activity>
</task>

1	Specifies that the resource should not be contacted.

For the "remaining shadows" activity of reconciliation, where shadows that are presumably dead are reconciled, we always go to the resource. If you need to avoid this behavior, just turn off that activity.

This feature is experimental.

Caching Passwords

Passwords are special in some aspects.

They are security-sensitive, so their storage is governed by Password Storage Configuration.
Resources often do not provide password values:
1. Most resources simply do not provide anything about passwords.
2. Some resources provide an indication of whether a password exists.
3. Some resources provide the actual password value.
4. And there are resources that provide only a hashed value (e.g., {SSHA}rxNYgQODi95h2bsjYXuBqvYz+I1gjgMkF9f0tA== for LDAP).

So, when a password is stored in the shadow object in the repository, there are the following possibilities:

no value - if the resource did not provide it,
no value, but the incomplete flag being true - if the resource indicated only the presence of the password,
encrypted value,
hashed value.

When having a resource that provides a hashed value of the password, the best approach is to set the connector to avoid providing the password at all (e.g., using passwordReadStrategy for LDAP connector). This way, midPoint will not get confused by mixing real and hashed values in stored shadows.
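
As a hedged sketch, the LDAP connector could be told not to return passwords as follows. The value unreadable and the icfc and icfcldap namespace prefixes (assumed to be declared on the resource element) are assumptions based on common LDAP connector configurations, so check them against the documentation of your connector version:

<resource>
    ...
    <connectorConfiguration>
        <icfc:configurationProperties>
            ...
            <!-- assumed value: the connector never returns the password -->
            <icfcldap:passwordReadStrategy>unreadable</icfcldap:passwordReadStrategy>
        </icfc:configurationProperties>
    </connectorConfiguration>
    ...
</resource>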

Password Storage

Password storage is driven by the Password Storage Configuration in the respective Security Policy object. The policy used is determined by merging the policy specified in the resource object type and the system-wide security policy.

Listing 4: Specifying the security policy for a resource object type

<objectType>
    ...
    <securityPolicyRef oid="069552e9-de28-41ff-8372-7e15c86dd516"/>
    ...
</objectType>
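
The referenced security policy might look like the sketch below. The policy name is hypothetical, and the storageMethod/storageType structure is an assumption taken from the usual Password Storage Configuration of midPoint, not from this page:

<securityPolicy oid="069552e9-de28-41ff-8372-7e15c86dd516">
    <name>Security policy for HR accounts</name> <!-- hypothetical name -->
    <credentials>
        <password>
            <storageMethod>
                <storageType>hashing</storageType> <!-- assumed structure; e.g., hashing instead of encryption -->
            </storageMethod>
        </password>
    </credentials>
</securityPolicy>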

Obsolete Cached Values

If the resource does not provide password values, the situation can get tricky. Imagine the following scenario:

MidPoint requests the creation of an account (or password modification on existing account). The password value is cached in the repository shadow.
The password value is changed directly on the resource, e.g., by the user themselves, or by an administrator.

There is no way for midPoint to learn about the changed password, even when the shadow is re-read from the resource: the resource simply won’t tell. In order to avoid breaking existing functionality (e.g., prohibited values), midPoint will keep the last known value in the shadow.

Legacy Configuration

Partial caching of passwords was available in earlier versions of midPoint. It is still supported, although deprecated now. It was configured like this:

Listing 5: Legacy way of configuring password caching (DEPRECATED)

<objectType>
    ...
    <credentials>
        <password>
            ...
            <caching>
                <cachingStrategy>passive</cachingStrategy>
            </caching>
            ...
        </password>
    </credentials>
    ...
</objectType>

If this configuration is present (regardless of whether "modern" caching is enabled or not), there are the following limitations:

Storage is always in the hashed form, to avoid unintended password exposure (even in the encrypted form).
Passwords are not updated when objects are fetched from the resource.

This behavior is intentional, in order to avoid unintended storage of sensitive data when upgrading from earlier versions. It is recommended to migrate the configuration to the current format.

Refreshing the Cache

The best way of refreshing the cache is to run an import or reconciliation task. However, if you want to avoid the cost of data processing inherent in these tasks, you can use the following alternative:

Listing 6: A task that just refreshes the shadow cache

<task xmlns="http://midpoint.evolveum.com/xml/ns/public/common/common-3"
      oid="45012c3e-3ce5-46ed-8d27-8648d2cbbca0">
    <name>Reload objects on HR resource</name>
    <ownerRef oid="00000000-0000-0000-0000-000000000002" type="UserType"/>
    <executionState>runnable</executionState>
    <activity>
        <work>
            <import>
                <resourceObjects>
                    <resourceRef oid="42a11d38-afbb-4f0e-8aea-c848db8ba0ab"/> <!-- HR -->
                    <kind>account</kind>
                    <intent>default</intent>
                </resourceObjects>
            </import>
        </work>
        <execution>
            <mode>none</mode> (1)
        </execution>
    </activity>
</task>

1	Avoids data processing.

The same effect can be achieved by clicking the Reload button on the resource page for accounts, entitlements, or generics.

Updating the Cache without Reading from Resource

For cached passwords, selected changes in the policies (e.g., switching the storage method from encryption to hashing) can also be applied using the Shadow refresh task, without the need to read from the resource.

The shadow refresh task executes the pending operations. It is an integral part of the refresh activity. Hence, some resource operations may occur as a side effect of updating the cached password via this task.

Impact on API Operations

The cached data are accessible by using the usual IDM Model Interface. There are two operation options that provide access to the cached data:

noFetch option: This option returns the data from the midPoint repository. Therefore, if there are data cached in the repository, the noFetch option returns them.
staleness option: A requirement on how stale or fresh the retrieved data should be. It specifies the maximum age of the value in milliseconds. The default value is zero, which means that a fresh value must always be returned, so caches that do not guarantee a fresh value cannot be used. If a non-zero value is specified, such caches may be used. If Long.MAX_VALUE is specified, the caches are always used and a fresh value is never retrieved.

Both options can be used to get cached data. The primary difference is that the noFetch option never goes to the resource, and it returns whatever data are in the repository. On the other hand, the staleness option is smarter, and it determines whether it has to go to the resource or not. In case that the "maximum" staleness option is used it will result in an error if cached data is not available.

Those options can be used both with getObject operations and with search operations. For getObject, the staleness option works as expected. But there is one special consideration for search operations: they cannot easily determine how fresh the data in the repository are. For example, there may be new objects on the resource that are not in the repository. Therefore, to be on the safe side, search operations always search on the resource, even if the staleness option is specified. There is just one exception: the maximum staleness option forces a repository search. However, if the search discovers any object that does not have cached data, it results in an error (specified in the fetchResult object property).
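
For illustration, the maximum staleness can be written in the same search options structure that Listing 3 uses for noFetch. This is only a sketch: it assumes that the staleness option is serialized as a staleness element next to noFetch, and 9223372036854775807 is simply the numeric value of Long.MAX_VALUE:

<searchOptions>
    <option>
        <options>
            <!-- Long.MAX_VALUE: always use cached data, never fetch a fresh value -->
            <staleness>9223372036854775807</staleness>
        </options>
    </option>
</searchOptions>

Unlike noFetch, this variant reports an error for any object found without cached data.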

Caching Metadata in Returned Shadows

Shadow Objects contain cachingMetadata property. This property can be used to determine whether the returned shadow represents fresh or cached data:

If no cachingMetadata property is present in the shadow then the data are fresh. They have been just retrieved from the resource.
If the cachingMetadata property is present then the data are taken from the cache. The cachingMetadata property specifies how fresh the data are (when they were originally retrieved).
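
A shadow returned from the cache might therefore contain a fragment like the one below. The retrievalTimestamp child element is an assumption about the structure of cachingMetadata, and the timestamp is a made-up example:

<shadow>
    ...
    <cachingMetadata>
        <!-- assumed element name: when the data were originally read from the resource -->
        <retrievalTimestamp>2025-02-19T10:00:00.000+01:00</retrievalTimestamp>
    </cachingMetadata>
    ...
</shadow>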

Relation to the "Caching-Only" Read Capability

When the "caching only" read capability is present (e.g., for manual resources), the full shadow caching is enabled, with the following differences in default values (compared to the standard caching):

Table 5. Default values differences for "caching-only" read capability
Item	Standard default value	Default when turned on by "caching-only" capability
`timeToLive`	P1D	unlimited
attributes caching scope	`defined`	`all`
password caching scope	`all`	`none`

The caching for the "caching only" read capability can be turned off by setting cachingStrategy to none.

The defaults above can be turned back to standard default values (the second column) by specifying cachingStrategy of passive.
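
Both cases use the caching item already shown in Listing 1. The sketch below, for a resource with the "caching only" read capability, returns to the standard defaults. Replacing passive with none would turn the caching off instead:

<resource>
    ...
    <caching>
        <!-- passive: use the standard defaults (second column of Table 5); none: no caching -->
        <cachingStrategy>passive</cachingStrategy>
    </caching>
    ...
</resource>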

Limitations

For Both Native And Generic (Oracle, SQL Server) Repository

Attributes whose values are provided by the resource should be one of the following:
1. correctly marked as volatile;
2. not cached, by setting their cached property to false;
3. always given their values by outbound mappings (if the resource provides only the default values).
For example, the uid attribute in LDAP can be derived from the dn attribute (if it’s in the form of uid=xxx,…). The solution is either to mark the attribute as volatile, turn its caching off, or to make sure its value is explicitly provided by midPoint each time the dn is created or modified. You can do that by providing an explicit mapping for, e.g., uid attribute along with the mapping for dn.
Localized attributes (see MID-5210) are not cached yet. See MID-10102.
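
The third option for the uid example could be sketched as follows. It assumes that both dn and uid are derived from the user's name and that the suffix ou=people,dc=example,dc=com is only illustrative. The point is that uid has its own explicit mapping next to the one for dn:

<attribute>
    <ref>ri:dn</ref>
    <outbound>
        <source>
            <path>name</path>
        </source>
        <expression>
            <script>
                <!-- illustrative suffix; adapt to your directory tree -->
                <code>'uid=' + name + ',ou=people,dc=example,dc=com'</code>
            </script>
        </expression>
    </outbound>
</attribute>
<attribute>
    <ref>ri:uid</ref>
    <outbound>
        <!-- explicit mapping, so midPoint always knows the uid value it provisioned -->
        <strength>strong</strength>
        <source>
            <path>name</path>
        </source>
    </outbound>
</attribute>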

For Generic Repository (Oracle, SQL Server) Only

values cannot be larger than 255 characters;
values cannot be empty (on Oracle);
illegal XML characters (like those with codes under 32) cannot be present in string values.

If you have attributes that do not conform to these restrictions, please either turn off their caching (using the cached property with the value of false), or use the experimental storageStrategy of notIndexed.
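
Both workarounds are sketched below for two illustrative attributes. The cached property is used as in Listing 1. The placement of storageStrategy directly in the attribute definition is an assumption, because this page does not say where the property belongs:

<attribute>
    <ref>ri:description</ref> <!-- illustrative: values may exceed 255 characters -->
    <cached>false</cached> <!-- workaround 1: do not cache the attribute at all -->
</attribute>
<attribute>
    <ref>ri:notes</ref> <!-- illustrative attribute name -->
    <storageStrategy>notIndexed</storageStrategy> <!-- workaround 2: experimental, assumed placement -->
</attribute>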

"Use Cached" setting

Please turn on the useCachedOrFresh setting with care. There may be unexpected differences from the pre-4.9 behavior, mainly because inbound mappings are executed when the data are available from the cache. On the other hand, in some cases, inbound mappings for some attributes may be skipped, because caching means the shadow may not be loaded. Either way, please make sure the system behaves as expected with the useCachedOrFresh setting.

Caching of Passwords

Resources with readable passwords that return mangled (e.g., hashed) passwords can cause real and hashed values to be mixed in stored shadows. This can break existing midPoint functionality. It is recommended to configure the connector to report only the existence of the password, or nothing about it at all. As a temporary measure, the legacy (deprecated) style of password caching can be used instead.
When using a configuration in which all these conditions hold:
1. password not fully readable,
2. caching turned on with encryption storage method,
3. useCachedOrFresh cache mode,
4. password is changed both in midPoint and on the resource,
5. there are inbound mappings and/or strong outbound mappings,
then the results of inbound mappings and/or the application of strong outbound mappings are unpredictable. The reason is that only the values coming from midPoint to the resource are cached correctly. The values changed on the resource are simply not known to midPoint. See the Obsolete Cached Values section above. In such cases, it is advised to turn off the password caching, eliminate the mappings, or make sure that the password is not changed both in midPoint and on the resource.
When applying changed policies using the Shadow refresh task, pending operations may get executed as a natural effect of running that task.

Performance Aspects

Shadow caching can have an impact on system performance, both positive and negative. Please see the related section in Performance Tuning document for more information.

Migration Note

Before 4.9, this feature was experimental. The default setting was that all attributes and no associations were cached.

Since 4.9, the defaults are more elaborate, as described in this document. Please take that into account when migrating.

Compliance

This feature is related to the following compliance frameworks:

ISO/IEC 27001 8.14: Redundancy of information processing facilities
