Achieve High Availability of MidPoint with Clustering
High availability feature
This page describes the configuration of the midPoint high availability feature. Please see the feature page for more details.
To achieve a high availability deployment, set up a cluster of several midPoint nodes working with a common midPoint repository. You can combine the multi-node deployment with a load balancer to evenly distribute the GUI and REST API traffic load across the nodes.
Architecture of MidPoint Cluster
There can be two or more nodes in a cluster. Although there is no fixed limit to the number of nodes in a cluster, practical limits always exist; it is up to you to determine the viable number of nodes experimentally based on your needs.
In the following diagram, there are two midPoint nodes with a shared central repository and a load balancer.
The nodes in the cluster share the load as follows:
- MidPoint tasks, such as live synchronization, reconciliation, import, or workflow approvals, can run on any node. The nodes take the available worker tasks on a first-come, first-served basis, which results in a reasonably even load distribution across the cluster.
- Users connecting to the graphical user interface (GUI) can work on any node.
Here is an outline of the technologies used and recommended for the work distribution:
- For task scheduling and distribution, midPoint uses the Quartz scheduler. See Task Manager and Communication in Cluster Explained for an architectural description and implementation details of the solution.
- For load balancing the traffic to the GUI and REST API, we recommend using the standard Apache Tomcat solution.
- If you wish to fail over GUI sessions without load balancing, you can use a network-level setup with a virtual IP.
Configure Your MidPoint Cluster
In order to deploy midPoint in a cluster, you need to adjust a couple of parameters in the midPoint configuration. This section covers all the required settings to get you started.
Basic Setup
Here is a bare-bones configuration to start with when using the default PostgreSQL database:
<repository>
    <repositoryServiceFactoryClass>com.evolveum.midpoint.repo.sql.SqlRepositoryFactory</repositoryServiceFactoryClass>
    <database>postgresql</database>
    <jdbcUsername>midpoint</jdbcUsername>
    <jdbcPassword>.....</jdbcPassword>
    <jdbcUrl>jdbc:postgresql://..../midpoint</jdbcUrl>
    <hibernateHbm2ddl>none</hibernateHbm2ddl>
    <missingSchemaAction>create</missingSchemaAction>
</repository>
<taskManager>
    <clustered>true</clustered> (1)
</taskManager>
1. The <clustered> configuration element containing true signifies that the installation uses clustered mode. The default is false.
Clustering needs to be enabled on all nodes
All nodes sharing the same repository must have the <clustered> option set to true. An inconsistent setting across nodes can cause database locks and task scheduling issues (see Common Issues and Fixes below).
In some circumstances, the Quartz component in the task manager needs to use a separate database. If that is the case, it requires proper configuration of its own.
When deploying clustered nodes, ensure your system time is synchronized across all nodes (using NTP or similar service). Otherwise, unexpected behaviour may occur, such as tasks restarting on different nodes.
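The <repository> and <taskManager> sections above belong to the node's config.xml file. Here is a minimal sketch of their placement, assuming the usual config.xml layout (repository settings abbreviated):

<configuration>
    <midpoint>
        <repository>
            <!-- repository settings as shown above -->
        </repository>
        <taskManager>
            <clustered>true</clustered>
        </taskManager>
    </midpoint>
</configuration>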
Node Identification and Discovery Configuration
Even though the nodes in your cluster mainly talk to the shared database repository, at times, they need to communicate with each other as well. For this reason, you need to adjust their identification and discovery configuration.
You can use the repository configuration in the config.xml file for your node configuration (see the Repository configuration column in the table below). In this case, place the configuration keys directly into the <midpoint> element. Alternatively, you can specify the configuration via command-line interface (CLI) parameters in the form of -Dkey=value (see the CLI column in the table below). Refer to Overriding config.xml parameters for details.
CLI | Repository configuration | Description
---|---|---
-Dmidpoint.nodeId | <nodeId> | The node identifier. If not set explicitly, it is derived from the node identifier source (see below).
-Dmidpoint.nodeIdSource | <nodeIdSource> | Source of the node identifier. It is applied if the node identifier is not set explicitly. This configuration property is obsolete. You can still use it, though, if you manage nodes manually, need deterministic node IDs for testing, or you are migrating from an old midPoint version and need to preserve existing node IDs.
-Dmidpoint.hostname | <hostname> | Overrides the local host name information. If not specified, the operating system is used to determine the host name. Normally, you do not need to specify this information.
-Dmidpoint.httpPort | <httpPort> | Overrides the local HTTP port information. If not specified, Tomcat/Catalina is queried to determine the HTTP port information. This information is used only to construct the URL address for intra-cluster communication. If you run a node behind a reverse proxy or NAT, for instance, you need to specify the port based on the network configuration. In such a case, you always need to specify the port number under which other nodes can see the particular node from their point of view. Normally, you do not need to specify this information. If you want to run midPoint itself under a custom port, use the -Dserver.port option instead.
-Dmidpoint.url | <url> | Overrides the intra-cluster URL pattern. Normally, you do not need to specify this information.
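As a hedged sketch based on the table above (the node name, host, and port values are placeholders), setting these options in config.xml could look roughly like this; the same values could instead be passed on the command line as -Dmidpoint.nodeId, -Dmidpoint.hostname, and -Dmidpoint.httpPort:

<configuration>
    <midpoint>
        <!-- Node identification keys are placed directly in the <midpoint> element -->
        <nodeId>node1</nodeId>
        <hostname>node1.example.com</hostname>
        <httpPort>8080</httpPort>
        ...
    </midpoint>
</configuration>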
How Intra-Cluster URLs Are Determined
In order to minimize the configuration work needed while keeping the maximum level of flexibility, the node URLs used for intra-cluster communication (e.g., https://node1.acme.org:8080/midpoint) are derived from the following items, in the order listed here:
1. The <urlOverride> property in the node object in the repository.
2. The -Dmidpoint.url / <url> information (CLI parameter or config.xml file).
3. Computed based on the information in the infrastructure/intraClusterHttpUrlPattern property, if defined. This property can use the following macros:
   - $host for the host name: obtained dynamically from the OS or overridden by the -Dmidpoint.hostname or <hostname> config properties.
   - $port for the HTTP port: obtained dynamically from Tomcat objects or overridden by the -Dmidpoint.httpPort or <httpPort> config properties.
   - $path for the midPoint URL path: obtained dynamically from the servlet container.
4. Computed based on the protocol scheme obtained dynamically from the Tomcat objects, host name, port, and servlet path as scheme://host:port/path.
When troubleshooting these mechanisms, you can set logging to DEBUG for com.evolveum.midpoint.task.quartzimpl.cluster.NodeRegistrar (or the whole task manager module).
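As a hedged sketch (assuming the usual classLogger structure of the logging section in the system configuration object), the DEBUG level for the node registrar could be enabled roughly as follows:

<systemConfiguration>
    ...
    <logging>
        <!-- Log node registration and intra-cluster URL resolution details -->
        <classLogger>
            <level>DEBUG</level>
            <package>com.evolveum.midpoint.task.quartzimpl.cluster.NodeRegistrar</package>
        </classLogger>
    </logging>
    ...
</systemConfiguration>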
Define URL pattern for inter-node communication
Nodes use the HTTP URL pattern to communicate between themselves. The pattern is a URL prefix pointing to the root URL of midPoint. Below is an example definition for the system configuration object:
<systemConfiguration>
    ...
    <infrastructure>
        <intraClusterHttpUrlPattern>https://$host/midpoint</intraClusterHttpUrlPattern>
    </infrastructure>
    ...
</systemConfiguration>
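If the nodes are reachable on a non-standard port, the pattern can also use the $port and $path macros described above; for example (assuming $path expands to the servlet context path, such as /midpoint):

<intraClusterHttpUrlPattern>https://$host:$port$path</intraClusterHttpUrlPattern>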
Test Cluster Configuration on a Single Host
To test a cluster configuration on a single host (with nodes running on different ports), use the configuration below. This configuration allows multiple nodes to share a single IP address, so that a cluster consisting of nodes on a single host can be formed. This feature is experimental.
<taskManager>
<localNodeClusteringEnabled>true</localNodeClusteringEnabled>
</taskManager>
In CLI, use -Dmidpoint.taskManager.localNodeClusteringEnabled=true.
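For illustration, a sketch of one node's config.xml for such a single-host test might look as follows (the node identifier and port are placeholders; the second node would use a different nodeId and a different HTTP port, and the repository section is omitted for brevity):

<configuration>
    <midpoint>
        <nodeId>NodeA</nodeId>
        <!-- must match the port this node actually listens on -->
        <httpPort>8080</httpPort>
        <taskManager>
            <clustered>true</clustered>
            <localNodeClusteringEnabled>true</localNodeClusteringEnabled>
        </taskManager>
    </midpoint>
</configuration>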
Communication in Cluster Explained
Cluster nodes primarily communicate with the central shared database. Tasks for the nodes to process are stored in this database, as are the data on which the nodes operate when processing them. Each task is split into buckets based on a key in the task definition. When the time comes to start a task, worker tasks (also called child tasks) are created. When picked up by a node, a worker task selects an available bucket and processes it on that node.
Each node runs its own instance of the Quartz scheduler library. Quartz is responsible for picking up available worker tasks and buckets on the node, as well as for preventing processing collisions with other nodes by storing the runtime information in the JDBC scheduler job store in the repository. To summarize, all communication regarding work distribution happens between the central database and the nodes.
However, there are situations when nodes need to talk to each other directly. A notable occasion requiring node-to-node communication is cache invalidation. When a node changes data in the midPoint database, the node informs other nodes about the need to invalidate their cache. See also Technical Insight into Cache.
Another reason for nodes to communicate directly is user session handling. After an operation on one node changes user attributes, such as assigned roles or permissions, that node propagates this information to the other nodes so that they can update their view of what the user can or cannot do. They may need to drop the user's session altogether if the user has been deactivated.
These situations requiring direct node-to-node communication are the reason why you need to specify an HTTP URL pattern. It is used by midPoint nodes to communicate among themselves.
Since midPoint 4.0, nodes communicate over HTTP instead of JMX.
You May Get Redirected Between Nodes
To help you understand the intra-cluster communication further, here is an example of a situation when direct node-to-node communication does not happen, although you may expect it would.
If a node runs a task to create a report, for example, the resulting report file is saved on the local file system of that node. If a user connected to a different node requests the report for download, the user's node asks the central database for the location of the report and then redirects the user to the node holding the generated report. Hence, direct inter-node communication does not occur in this case.
Technical Insight into Cache
MidPoint uses two levels of cache: global and local.
The local cache is per task thread. It holds query objects with their results, all touched objects, and a version cache, which consists of the versions of modified objects. (Every time an object is modified, a new version of it is created.)
The global cache is per node and holds objects that do not change often but are accessed very often, for example, the system configuration, archetypes, or object templates. These objects are cheap to cache because they change rarely, and caching them saves a lot of resources. User objects are not cached because they change often and each individual user object is needed only rarely.
Common Issues and Fixes
These are the critical criteria your configuration must meet:
- Use a shared repository. All nodes must connect to the same repository.
- Define node URLs using the <midpoint><url>…</url></midpoint> option in config.xml or the intraClusterHttpUrlPattern property in the system configuration to ensure nodes can discover each other.
- Clustering in production requires an active subscription (log error: Clustering is not supported in production mode without a subscription). See the sketch below for where the subscription identifier is stored.
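As a hedged illustration (the deploymentInformation element names are assumed here; setting the value through the GUI as described below has the same effect), the subscription identifier lives in the system configuration object roughly like this:

<systemConfiguration>
    ...
    <deploymentInformation>
        <!-- Placeholder value; use the identifier provided with your subscription -->
        <subscriptionIdentifier>...</subscriptionIdentifier>
    </deploymentInformation>
    ...
</systemConfiguration>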
Here are a few common issues, their possible causes, and tips on how to resolve them:
- Unauthorized errors (401)
  - Cause: Missing or invalid subscription ID, or misconfigured REST authentication.
  - Fix: Set a valid subscription ID in System > System Configuration > Deployment Information > Subscription Identifier. Ensure nodes can authenticate via REST (e.g., shared secrets or OAuth2 if configured).
- Node discovery failures
  - Cause: Incorrect intraClusterHttpUrlPattern setting or firewall rules blocking HTTP(S) traffic.
  - Fix: Check your deployment configuration and all possibly related network settings. Test connectivity between nodes using curl or a similar tool.
- Lost GUI sessions (missing sticky sessions)
  - Cause: The load balancer is not using sticky sessions (e.g., ip_hash in NGINX).
  - Fix: Configure the load balancer to maintain session affinity (e.g., by using a sticky cookie or the source IP).
- Database locks or task scheduling issues
  - Cause: Inconsistent clustered=true setting across nodes.
  - Fix: Ensure the clustered option is consistently set to true on all nodes.
Limitations
The clustering functionality assumes a homogeneous cluster environment. That means each cluster node must have the same environment, configuration, connectivity (e.g., to the load balancer), connectors, and so on. The clustering implementation assumes that a task can be executed on any cluster node with the same result. Any configuration differences between cluster nodes are likely to cause operational issues.
The following aspects must be the same on all cluster nodes:
- Versions of:
  - MidPoint
  - Connectors
  - Schema extension[1]
  - Java key store and trust store
- Network access to all configured resources
- Access to file systems, including network file systems (e.g., for CSV resources)
- Network configuration, including routing and DNS configuration
Compliance
This feature is related to the following compliance frameworks: