Managing cluster nodes

Last modified 22 Apr 2021 17:31 +02:00
This functionality is available since version 4.1, 4.0.2.

MidPoint can run in clustered mode with two or more nodes. Here we describe the most important parameters influencing how nodes are named and managed.

Node identification (naming)

Each node in the cluster must have a unique node identifier (name).

nodeId vs. nodeIdSource

There are two properties that can be used to set the node identifier:

| Property | Meaning | Placement | Alternative specification | Examples |
|---|---|---|---|---|
| nodeId | A constant value or an expression that yields the node name | <midpoint> section in config.xml | -Dmidpoint.nodeId command line parameter | NodeA, ${env:NodeID} |
| nodeIdSource | Mechanism that is used to derive node Id (obsolete) | <midpoint> section in config.xml | -Dmidpoint.nodeIdSource command line parameter | hostname, random, sequence |

The nodeIdSource was originally meant as a way to assign node identifiers without the need to specify them as constants. However, after nodeId started supporting expressions, nodeIdSource is now simply translated into nodeId.

The translation looks like this:

  • If the nodeIdSource value contains ':' (e.g. random:number:0:9999), it is copied into nodeId wrapped in ${…}. For example, random:number:0:9999 becomes ${random:number:0:9999}.

  • If the nodeIdSource value does not contain ':' (e.g. hostname), it is copied into nodeId wrapped in ${…} with a colon appended at the end. For example, hostname becomes ${hostname:}.
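The two translation rules above can be sketched in a few lines. This is only an illustration of the rule as described here, not midPoint's actual code; the function name `node_id_from_source` is hypothetical.

```python
def node_id_from_source(node_id_source: str) -> str:
    """Translate a legacy nodeIdSource value into a nodeId expression.

    Hypothetical helper that mirrors the two rules described in the text.
    """
    if ":" in node_id_source:
        # Value already has the prefix:argument shape, so it is only wrapped.
        return "${" + node_id_source + "}"
    # Bare prefix such as "hostname": wrap it and append the colon.
    return "${" + node_id_source + ":}"


print(node_id_from_source("random:number:0:9999"))  # ${random:number:0:9999}
print(node_id_from_source("hostname"))              # ${hostname:}
```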

The following discussion therefore deals with the syntax of nodeId only.

Using nodeId property

Since 4.0.2/4.1, midPoint configuration properties support expressions in the form of ${variable} or ${prefix:variable}. The first form evaluates using a configuration option specified by variable. The second one is more general and supports the following prefixes:

| Prefix | Meaning | Example |
|---|---|---|
| sys | References the given Java system property. | ${sys:user.home} |
| env | References the given operating system environment variable. | ${env:ENVIRONMENT} |
| hostname | References the local host name as determined by midPoint. Note that the colon after hostname is obligatory. | ${hostname:} |
| random | Generates a random node ID. The full format is ${random:number:lower-limit:upper-limit}, but the forms ${random:}, ${random:number}, and ${random:number:upper-limit} are also accepted. Default values are lower limit = 0, upper limit = 999999999. Lower and upper limits are inclusive. | ${random:} |
| sequence | Uses the first available node ID in a given sequence. The full format is ${sequence:start:end:format}, but the forms ${sequence:}, ${sequence:start}, and ${sequence:start:end} are also accepted. Default values are start = 0, end = 100, format = %d. | ${sequence:0:99:%02d} |
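To make the prefix semantics concrete, here is a minimal sketch of a resolver for the sys, env, hostname, and random prefixes. It is an illustration under assumptions, not midPoint's implementation: the names `random_limits` and `resolve` are hypothetical, Java system properties are modelled as a plain dictionary, and the sequence prefix is left out because it needs repository access.

```python
import random
import re
import socket
from typing import Dict, Tuple

DEFAULT_LOWER = 0
DEFAULT_UPPER = 999_999_999

# Matches ${prefix:argument} for the prefixes handled in this sketch.
EXPRESSION = re.compile(r"\$\{(sys|env|hostname|random):([^}]*)\}")


def random_limits(argument: str) -> Tuple[int, int]:
    """Parse the text after 'random:' into inclusive (lower, upper) limits."""
    parts = [p for p in argument.split(":") if p]
    # parts[0], when present, is the literal word "number".
    limits = [int(p) for p in parts[1:]]
    if len(limits) == 0:
        return DEFAULT_LOWER, DEFAULT_UPPER
    if len(limits) == 1:
        return DEFAULT_LOWER, limits[0]
    return limits[0], limits[1]


def resolve(text: str, system_properties: Dict[str, str],
            environment: Dict[str, str]) -> str:
    """Replace every supported ${prefix:argument} expression in text."""
    def replace(match):
        prefix, argument = match.group(1), match.group(2)
        if prefix == "sys":
            return system_properties[argument]
        if prefix == "env":
            return environment[argument]
        if prefix == "hostname":
            return socket.gethostname()
        lower, upper = random_limits(argument)
        # random.randint includes both end points, matching the inclusive limits.
        return str(random.randint(lower, upper))

    return EXPRESSION.sub(replace, text)


print(resolve("${env:ENVIRONMENT}-node", {}, {"ENVIRONMENT": "Test"}))  # Test-node
```

Passing the environment and system properties in as dictionaries keeps the sketch deterministic; a real caller could pass `dict(os.environ)` instead.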

The sequence expression works like this:

  1. A counter starts at the start value, incrementing by 1 up to (and including) the end value.

  2. At each step, the node name is determined using the formatting string and other parts of the expression, and is checked for availability.

  3. If such a node does not exist in the repository, the name is used. Technically speaking, the node name is allocated by creating the node in the repository. If the operation succeeds, the node is acquired. This is to avoid race conditions: only the first midPoint instance that successfully creates a node object can use this name.

  4. If a node with the given name exists but is permanently down (this is determined by the running property being set to false), the name is used. This is implemented by removing the node object and then retrying the allocation attempt.

  5. Names of nodes that are not marked as down but are not alive are not used here. This is to avoid using names of nodes that are e.g. currently booting, or temporarily unavailable. Please see the Node state management section below.
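The allocation steps above can be sketched against an in-memory stand-in for the repository. Everything here is hypothetical and for illustration only: the `InMemoryRepository` class, the `NodeAlreadyExists` exception, and the `allocate_node_name` function are not midPoint APIs, and the real repository makes node creation atomic across instances, which a plain dictionary does not.

```python
class NodeAlreadyExists(Exception):
    """Raised when a node object with the given name is already stored."""


class InMemoryRepository:
    """Hypothetical stand-in for the repository: node name -> running flag."""

    def __init__(self):
        self.nodes = {}

    def create(self, name):
        # In this model, creating the node object is what allocates the name.
        if name in self.nodes:
            raise NodeAlreadyExists(name)
        self.nodes[name] = True

    def is_marked_down(self, name):
        return self.nodes.get(name) is False

    def delete(self, name):
        self.nodes.pop(name, None)


def allocate_node_name(repository, start=0, end=100, fmt="%d"):
    """Return the first name in the sequence that could be allocated."""
    for counter in range(start, end + 1):  # the end value is included
        name = fmt % counter
        if repository.is_marked_down(name):
            # Permanently down: remove the stale object, then try to create it.
            repository.delete(name)
        try:
            repository.create(name)
            return name
        except NodeAlreadyExists:
            # The node exists and is not marked as down, so the name is skipped.
            continue
    raise RuntimeError("no free node name in the sequence")


repo = InMemoryRepository()
repo.nodes = {"00": True, "01": False}  # 00 is running, 01 is marked as down
print(allocate_node_name(repo, 0, 99, "%02d"))  # 01
print(allocate_node_name(repo, 0, 99, "%02d"))  # 02
```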

Note that the sequence expression can be combined with other expressions. For example, you can specify nodeId as ${env:ENVIRONMENT}-${sequence:0:99:%02d}, yielding names like Test-00, Test-01, …, QA-00, QA-01, …, Prod-00, Prod-01, …
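As a small illustration of how a combined expression expands, the sketch below lists the candidate names that ${env:ENVIRONMENT}-${sequence:0:99:%02d} would be tried in, assuming ENVIRONMENT is set to Test. The function `candidate_names` is hypothetical and ignores the availability check.

```python
def candidate_names(environment, start=0, end=99, fmt="%02d"):
    """List candidate node names in the order the sequence would try them."""
    return [f"{environment}-{fmt % counter}" for counter in range(start, end + 1)]


names = candidate_names("Test")
print(names[:3])   # ['Test-00', 'Test-01', 'Test-02']
print(names[-1])   # Test-99
```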

Node state management

A midPoint node is typically in one of the following states:

| State | Characterization |
|---|---|
| up and alive | Node regularly checks into the repository. Its operationalState property is UP and its lastCheckInTime is regularly updated (less than nodeTimeout ago). |
| up, but not checking in | There is an issue with this node. Its operationalState property is UP but its lastCheckInTime is older than nodeTimeout seconds. Nodes in this state are excluded from some operations, e.g. status querying or cache invalidation calls. |
| down | Node's operationalState property is DOWN. This typically occurs when the node shuts down cleanly and marks itself as down. If the node goes down abruptly (without a chance to make this modification), other nodes watch its lastCheckInTime and, once it is older than nodeAlivenessTimeout, mark the node as down by setting its operationalState property to DOWN. This check runs every nodeAlivenessCheckInterval seconds. Nodes in this state are excluded from almost all operations. |
| starting | Node's operationalState property is STARTING. Nodes in this state are excluded from some operations, e.g. status querying or cache invalidation calls. |
| deleted | Node object no longer exists in the repository. The deletion can occur either manually or by the Cleanup task. The task deletes nodes whose lastCheckInTime is older than deadNodes/maxAge. |
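The state descriptions above can be read as a simple decision procedure. The sketch below is an interpretation of the table, not midPoint code: the function `describe_node_state` is hypothetical, and a node is modelled as a dictionary with operationalState and lastCheckInTime keys (or None when the object no longer exists).

```python
from datetime import datetime, timedelta


def describe_node_state(node, now, node_timeout=timedelta(seconds=30)):
    """Map a node record to one of the states described in the table."""
    if node is None:
        return "deleted"
    state = node["operationalState"]
    if state == "DOWN":
        return "down"
    if state == "STARTING":
        return "starting"
    # operationalState is UP: decide by the age of the last check-in.
    if now - node["lastCheckInTime"] < node_timeout:
        return "up and alive"
    return "up, but not checking in"


now = datetime(2021, 4, 22, 12, 0, 0)
fresh = {"operationalState": "UP", "lastCheckInTime": now - timedelta(seconds=5)}
stale = {"operationalState": "UP", "lastCheckInTime": now - timedelta(seconds=60)}
print(describe_node_state(fresh, now))  # up and alive
print(describe_node_state(stale, now))  # up, but not checking in
```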

Default parameter values:

| Parameter | Location | Description | Default value |
|---|---|---|---|
| nodeTimeout | taskManager section of config.xml | When to start considering a node as not checked in. | 30 seconds |
| nodeAlivenessTimeout | taskManager section of config.xml | When to start considering a node as being down. | 900 seconds |
| nodeAlivenessCheckInterval | taskManager section of config.xml | How often the node aliveness check is carried out. | 120 seconds |
| nodeStartupTimeout | taskManager section of config.xml | When to start reporting a node as starting for too long. | 900 seconds |
| deadNodes/maxAge | cleanup policy, e.g. in the system configuration object | After what not-checked-in time the node should be deleted. | none |

Nodes are not cleaned up by default. To enable this feature, set deadNodes/maxAge to e.g. 1 day. Note that the cleanup task runs once per day by default, but you can change this interval or schedule another cleanup task devoted specifically to cleaning up dead nodes.
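To show how nodeAlivenessTimeout and deadNodes/maxAge interact, here is a rough sketch of one aliveness check pass and one cleanup pass over an in-memory list of nodes. It is a simplified model under assumptions: the `Node` class and both functions are hypothetical, only nodes in the UP state are considered by the aliveness check, and a max_age of None stands for the default of no cleanup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Node:
    name: str
    operational_state: str
    last_check_in: datetime


def aliveness_check(nodes, now, aliveness_timeout=timedelta(seconds=900)):
    """Mark UP nodes as DOWN when their last check-in is too old."""
    marked = []
    for node in nodes:
        if node.operational_state == "UP" and now - node.last_check_in > aliveness_timeout:
            node.operational_state = "DOWN"
            marked.append(node.name)
    return marked


def cleanup_dead_nodes(nodes, now, max_age):
    """Return the nodes that survive cleanup; None means cleanup is disabled."""
    if max_age is None:
        return list(nodes)
    return [node for node in nodes if now - node.last_check_in <= max_age]


now = datetime(2021, 4, 22, 12, 0, 0)
nodes = [
    Node("A", "UP", now - timedelta(seconds=10)),
    Node("B", "UP", now - timedelta(seconds=1000)),
    Node("C", "DOWN", now - timedelta(days=2)),
]
print(aliveness_check(nodes, now))  # ['B']
print([n.name for n in cleanup_dead_nodes(nodes, now, timedelta(days=1))])  # ['A', 'B']
```

In this model the aliveness check would be invoked every nodeAlivenessCheckInterval (120 seconds by default), and the cleanup pass once per run of the cleanup task.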