Clustering / high availability setup

Introduction

Clustering / high availability can be achieved by setting up two midPoint nodes working against common midPoint repository.

In order to do this, it is necessary to set a couple of parameters in midPoint configuration.

An example, when using PostgreSQL database:

``````<repository>
<repositoryServiceFactoryClass>com.evolveum.midpoint.repo.sql.SqlRepositoryFactory</repositoryServiceFactoryClass>
<database>postgresql</database>
<jdbcUrl>jdbc:postgresql://..../midpoint</jdbcUrl>
<hibernateHbm2ddl>none</hibernateHbm2ddl>
<missingSchemaAction>create</missingSchemaAction>
</repository>
<clustered>true</clustered>

Typically you set the following configuration parameter only:

Parameter Description

clustered

Determines if the installation is running in clustered (failover) mode. Default is false. If you need clustering/failover, set this to true.

In some circumstances the Quartz component in task manager needs to use separate database (usually only if you want to run it on H2). Then you need to configure that properly; see the section on H2 use below.

Important: if there are more nodes sharing a repository, all of them must have parameter `clustered` set to `true`. Otherwise, tasks will not be scheduled correctly. (midPoint will disable scheduling tasks on non-conformant nodes, i.e. on non-clustered nodes that are parts of such a system.) The best way how to ensure this is to have common config file. But if that’s not possible or practical, make sure that all nodes have the same settings.

Also, ensure your system time is synchronized across all node members (using NTP or similar service), otherwise strange behaviour may occur such as tasks restart on different nodes.

Other cluster configuration items (midPoint 4.0 and above)

The main difference between 4.0 and previous versions is that since midPoint 4.0 the main mechanism for clusterwide task management is REST instead of JMX.

You can use `config.xml` configuration parameters placed directly under the `<midpoint>` element (Configuration parameter in the table below). Alternatively, you can use the command line options `-Dkey=value` to set these parameters (Command-line parameter in the table below).

Configuration items are:

Command-line parameter Configuration parameter Description

`-Dmidpoint.nodeId`

`nodeId`

The node identifier. The default is `DefaultNode` for non-clustered deployments. For clustered ones, either `nodeId` or `nodeIdSource` must be used.

`-Dmidpoint.nodeIdSource`

`nodeIdSource`

Source of the node identifier. It is applied if explicit node ID is not defined. The source can be either `hostname` meaning that the host name is used as the node identifier or `random` meaning that the random value for node ID is generated when the node is started.

`-Dmidpoint.hostName`

`hostName`

Overrides the local host name information. If not specified, the operatig system is used to determine the host name.

Normally you do not need to specify this information.

`-Dmidpoint.httpPort`

`httpPort`

Overrides the local HTTP port information. If not specified, Tomcat/Catalina JMX objects are queried to determine the HTTP port information. This information is used only to construct URL address used for intra-cluster communication (see below).

Normally you do not need to specify this information.

Use `-Dserver.port=xxx` instead to start midPoint using the right port.

`-Dmidpoint.url`

`url`

Overrides the intra-cluster URL information (see below).

Normally you do not need to specify this information.

How intra-cluster URL is determined

In order to minimize the configuration work needed while keeping the maximum level of flexibility, the node URL used for intra-cluster communication (e.g. https://node1.acme.org:8080/midpoint) is derived from the following items - in this order:

1. `<urlOverride>` property in the Node object in the repository

2. `-Dmidpoint.url` / `<url>` information in command line or `config.xml` file

3. computed based on information in `infrastructure/intraClusterHttpUrlPattern` property, if defined; that property can use the following macros:

1. `$host` for host name (obtained dynamically from OS or overridden via `-Dmidpoint.hostname` or `<hostname>` config property) 2. `$port` for HTTP port (obtained dynamically from Tomcat JMX objects or overridden via `-Dmidpoint.httpPort` config property)

3. `$path` for midPoint URL path (obtained dynamically from the servlet container) 4. computed based on protocol scheme (obtained dynamically from Tomcat JMX objects), host name, port, and servlet path, as `scheme://host:port/path`. When troubleshooting these mechanisms you can set logging for `com.evolveum.midpoint.task.quartzimpl.cluster.NodeRegistrar` (or the whole task manager module) to `DEBUG`. Testing cluster on a single node If you want to test the cluster on a single node (running on different ports, of course) you need to set the following experimental configuration parameter to the value of `true`: Command-line parameter Configuration parameter Meaning -Dmidpoint.taskManager.localNodeClusteringEnabled <localNodeClusteringEnabled> in <taskManager> section Allows more nodes to use a single IP address. (So that cluster containing mode nodes on a single host can be formed.) Experimental. Configuring the cluster before midPoint 4.0 Mainly because of JMX limitations, some parameters have to be set up via Java system properties. In the following we expect the Oracle JRE is used. Parameter Meaning midpoint.nodeId This is an identifier of the local node. It is not part of the midPoint configuration, because we assume that this configuration file will be shared among cluster members. The default value is: DefaultNode. However, when running in clustered mode, there is no default, and this property must be explicitly specified. midpoint.jmxHostName Host name on which this node wants to be contacted (via JMX) by other nodes in cluster. (It will be announced to other nodes via Node record in repository.) Usually not necessary to specify, as the default is the current host IP address. com.sun.management.jmxremote.port This is the port on which JMX agent will listen. It must be specified for clustered mode, because JMX is used to query status of individual nodes and to manage them (start/stop scheduler, stop tasks on that node). And, if you test a clustering/failover configuration (more midPoint nodes) on a single machine, be sure to set this parameter to different values for individual midPoint nodes. Otherwise, you will get "java.net.BindException: Address already in use: JVM_Bind" exception on tomcat startup. com.sun.management.jmxremote.ssl Whether SSL will be used for JMX communication. For sample installations it can be set to `false`, however, for production use we recommend setting it to`true` (alongside other SSL-related JMX properties, see http://docs.oracle.com/javase/1.5.0/docs/guide/management/agent.html#remote. com.sun.management.jmxremote.password.file and com.sun.management.jmxremote.access.file Names of the password and access files for JMX authentication and authorization. E.g. d:\midpoint\config\jmxremote.password, d:\midpoint\config\jmxremote.access. Examples of these files are in the `samples/jmx` directory in SVN.Beware, the jmxremote.password file must be readable only to its owner (i.e. user who starts the tomcat), otherwise the JVM refuses to start. In Windows, you typically have to stop inheriting permissions to this file, and manually remove all entries that grant access to persons other than the owner. Also, the following configuration items in `<taskManager>` section of `config.xml` have to be set: Parameter Meaning jmxUsername, jmxPassword Credentials used for JMX communication among cluster nodes. Default values are `midpoint` and `secret` respectively, but we strongly recommend changing at least the JMX password. Currently, all nodes should be accessible using the same credentials. An example NodeA (in catalina.bat) ``````SET CATALINA_OPTS=-Dmidpoint.nodeId=NodeA \ -Dmidpoint.home=d:\midpoint\home \ -Dcom.sun.management.jmxremote=true \ -Dcom.sun.management.jmxremote.port=20001 \ -Dcom.sun.management.jmxremote.ssl=false \ -Dcom.sun.management.jmxremote.password.file=d:\midpoint\home\jmxremote.password \ -Dcom.sun.management.jmxremote.access.file=d:\midpoint\home\jmxremote.access`````` NodeB (in catalina.bat) ``````SET CATALINA_OPTS=-Dmidpoint.nodeId=NodeB \ -Dmidpoint.home=d:\midpoint\home \ -Dcom.sun.management.jmxremote=true \ -Dcom.sun.management.jmxremote.port=20002 \ -Dcom.sun.management.jmxremote.ssl=false \ -Dcom.sun.management.jmxremote.password.file=d:\midpoint\home\jmxremote.password \ -Dcom.sun.management.jmxremote.access.file=d:\midpoint\home\jmxremote.access`````` (Note: the jmx port is set to 20002 just to allow running both nodes on a single machine. If you are sure they will not be run on a single machine, we recommend setting the port to the same value, just for simplicity.) (Note: when you have firewall, please also set com.sun.management.jmxremote.rmi.port to the same port as com.sun.management.jmxremote.port) Cluster infrastructure configuration Even if 3.9 and below there are some types of information (e.g. reports) that are accessed using REST calls. So, midpoint needs to have an intra-cluster HTTP URL pattern specified. This should be the HTTP/HTTPS pattern which is used by midpoint nodes to communicate with each others. The pattern is in fact an URL prefix pointing to the root URL of the application. The pattern is specified in the system configuration object as present in the example below. ``````<systemConfiguration> ... <infrastructure> <intraClusterHttpUrlPattern>https://$host/midpoint</intraClusterHttpUrlPattern>
</infrastructure>
...
</systemConfiguration>``````

Troubleshooting JMX

The following message(s) may appear in idm.log if there is a problem with JMX password:

``````2014-03-04 14:05:31,692 [TASKMANAGER] [http-bio-8080-exec-3] TRACE (com.evolveum.midpoint.task.quartzimpl.execution.ExecutionManager): Getting node and task info from the current node (Tomcat7_Node1)
2014-03-04 14:05:31,700 [MODEL] [http-bio-8080-exec-3] ERROR (com.evolveum.midpoint.model.controller.ModelController): Couldn't search objects in task manager, reason: Authentication failed! Invalid username or password
2014-03-04 14:05:31,701 [] [http-bio-8080-exec-3] ERROR (com.evolveum.midpoint.web.page.admin.server.dto.NodeDtoProvider): Unhandled exception when listing nodes, reason: Subresult com.evolveum.midpoint.task.api.TaskManager..searchObjects of operation com.evolveum.midpoint.model.controller.ModelController.searchObjects is still UNKNOWN during cleanup; during handling of exception java.lang.SecurityException: Authentication failed! Invalid username or password``````

The following message(s) may appear in idm.log if there is problem with firewall between IDM nodes:

``2014-05-26 09:07:38,438 [TASKMANAGER] [http-bio-8181-exec-1] ERROR (com.evolveum.midpoint.task.quartzimpl.execution.RemoteNodesManager): Cannot connect to the remote node node02 at 10.1.1.2:8123, reason: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 10.1.1.2; nested exception is:  java.net.NoRouteToHostException: No route to host]``

Please note that it seems that JMX communication needs more than the JMX port specified in Tomcat startup configuration (in this fragment, 8123)! I resolved the problem by simply allowing all TCP communication between the nodes. I will update this solution after I find a better one ☺

The following message may appear if your clock is not synchronized between midPoint nodes:

``2014-05-26 00:45:32,818 [TASKMANAGER] [QuartzScheduler_midPointScheduler-node02_ClusterManager] WARN (org.quartz.impl.jdbcjobstore.JobStoreTX): This scheduler instance (node02) is still active but was recovered by another instance in the cluster.  This may cause inconsistent behavior.``

Using H2 when clustered

Using H2 in clustered mode is not recommended because of needless complexity. First, it needs to be specified to run in standalone process. And second, Quartz and midPoint need to use separate MVCC-related settings.

An example:

``````<repository>
<repositoryServiceFactoryClass>com.evolveum.midpoint.repo.sql.SqlRepositoryFactory</repositoryServiceFactoryClass>
<baseDir>\${midpoint.home}</baseDir>
<embedded>false</embedded>
<asServer>true</asServer>
<driverClassName>org.h2.Driver</driverClassName>
<jdbcUrl>jdbc:h2:tcp://localhost:6000/~/midpoint;LOCK_MODE=1;DB_CLOSE_ON_EXIT=FALSE;LOCK_TIMEOUT=10000</jdbcUrl>
<hibernateDialect>org.hibernate.dialect.H2Dialect</hibernateDialect>
<hibernateHbm2ddl>update</hibernateHbm2ddl>
</repository>
<clustered>true</clustered>
<jdbcUrl>jdbc:h2:tcp://localhost:6000/~/midpoint-quartz;MVCC=TRUE;DB_CLOSE_ON_EXIT=FALSE</jdbcUrl>

The following task manager settings are relevant in this context:

Parameter Meaning

jdbcUrl

If you are using H2, you have to set up use different database parameters from those used by midPoint repository. And, because MVCC mode is to be enabled, task manager has to use a database instance different from the one used by the repository.(If you are using database other than H2, you may skip setting special jdbcUrl in the <taskManager> configuration. The jdbcUrl from repository config will be used. As a result, Quartz tables will be stored in the same database instance as midPoint tables.)

dataSource

Uses specified data source to obtain DB connections. (See Repository Configuration).

Other task manager database settings (e.g. jdbc username and password, driver class name, hibernate dialect) are taken by default from <repository> configuration, but, of course, they may be overridden in task manager configuration.

H2 then has to be started independently of both nodes. In this case, it is expected to listen on port 6000. To do that, you can use e.g. this command line:

``java -jar h2-1.3.171.jar -tcp -tcpPort 6000 -tcpAllowOthers``