Managing cluster nodes

Last modified 11 Jan 2023 15:14 +01:00

This functionality is available since version 4.1, 4.0.2.

MidPoint can run in clustered mode with two or more nodes. Here we describe the most important parameters influencing how nodes are named and managed.

Node identification (naming)

Each node in the cluster must have a unique node identifier (name).

nodeId vs. nodeIdSource

There are two properties that can be used to set the node identifier:

Property	Meaning	Placement	Alternative specification	Examples
`nodeId`	A constant value or an expression that yields the node name	`<midpoint>` section in `config.xml`	`-Dmidpoint.nodeId` command line parameter	`NodeA`, `${env:NodeID}`
`nodeIdSource`	Mechanism that is used to derive the node ID (obsolete)	`<midpoint>` section in `config.xml`	`-Dmidpoint.nodeIdSource` command line parameter	`hostname`, `random`, `sequence`

The nodeIdSource property was originally meant as a way to assign node identifiers without having to specify them as constants. However, now that nodeId supports expressions, nodeIdSource is simply translated into nodeId.

The translation looks like this:

If the nodeIdSource value contains ':' (e.g. it is random:number:0:9999), it is copied into nodeId by wrapping it in ${…}. For example, random:number:0:9999 becomes ${random:number:0:9999}.
If the nodeIdSource value does not contain ':' (e.g. it is hostname), it is copied into nodeId by wrapping it in ${…} with a colon appended at the end. For example, hostname becomes ${hostname:}.
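
The two translation rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not midPoint's actual code; the function name `node_id_from_source` is hypothetical.

```python
def node_id_from_source(node_id_source: str) -> str:
    """Translate a nodeIdSource value into the equivalent nodeId expression.

    Hypothetical helper that mirrors the two rules described in the text.
    """
    if ":" in node_id_source:
        # Value already contains a colon: just wrap it in ${...}.
        return "${" + node_id_source + "}"
    # No colon present: append one before wrapping.
    return "${" + node_id_source + ":}"


print(node_id_from_source("random:number:0:9999"))  # ${random:number:0:9999}
print(node_id_from_source("hostname"))              # ${hostname:}
```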

The rest of this section therefore deals with the syntax of nodeId only.

Using nodeId property

Since 4.0.2/4.1, midPoint configuration properties support expressions in the form of ${variable} or ${prefix:variable}. The first form evaluates using a configuration option specified by variable. The second one is more general and supports the following prefixes:

Prefix	Meaning	Example
`sys`	References given Java system properties.	`${sys:user.home}`
`env`	References given operating system environment variables.	`${env:ENVIRONMENT}`
`hostname`	References the local host name as determined by midPoint. Note that the colon after hostname is obligatory.	`${hostname:}`
`random`	Generates a random node ID. The full format is `${random:number:lower-limit:upper-limit}`, but the forms `${random:}`, `${random:number}`, and `${random:number:upper-limit}` are also accepted. Default values are lower limit = 0, upper limit = 999999999. Lower and upper limits are inclusive.	`${random:}`
`sequence`	Uses the first available node ID in a given sequence. The full format is `${sequence:start:end:format}`, but the forms `${sequence:}`, `${sequence:start}`, and `${sequence:start:end}` are also accepted. Default values are: start = 0, end = 100, format = %d.	`${sequence:0:99:%02d}`
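
To make the shortened forms and their defaults concrete, here is a minimal Python sketch that resolves the part after the `random:` or `sequence:` prefix into full parameters. The helper names `random_limits` and `sequence_params` are hypothetical, and the parsing is an assumption based only on the forms listed in the table; midPoint's real parser may behave differently on malformed input.

```python
DEFAULT_LOWER, DEFAULT_UPPER = 0, 999_999_999  # defaults for ${random:...}
DEFAULT_START, DEFAULT_END, DEFAULT_FORMAT = 0, 100, "%d"  # defaults for ${sequence:...}


def random_limits(body):
    """Resolve the text after 'random:' (e.g. '', 'number', 'number:99',
    'number:10:99') into inclusive (lower, upper) limits."""
    parts = [p for p in body.split(":") if p]
    if parts and parts[0] != "number":
        raise ValueError("only the 'number' variant is described in the text")
    numbers = [int(p) for p in parts[1:]]
    if len(numbers) == 0:
        return DEFAULT_LOWER, DEFAULT_UPPER
    if len(numbers) == 1:
        return DEFAULT_LOWER, numbers[0]  # a single number is the upper limit
    if len(numbers) == 2:
        return numbers[0], numbers[1]
    raise ValueError("too many parameters: " + body)


def sequence_params(body):
    """Resolve the text after 'sequence:' (e.g. '', '5', '0:99', '0:99:%02d')
    into (start, end, format)."""
    parts = body.split(":", 2) if body else []
    start = int(parts[0]) if len(parts) > 0 else DEFAULT_START
    end = int(parts[1]) if len(parts) > 1 else DEFAULT_END
    fmt = parts[2] if len(parts) > 2 else DEFAULT_FORMAT
    return start, end, fmt


start, end, fmt = sequence_params("0:99:%02d")
print(fmt % 7)  # 07
```

The format string is a printf-style pattern, so `%02d` pads the counter with zeros to two digits.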

The sequence expression works like this:

A counter starts at the start value and is incremented by 1 up to (and including) the end value.
At each step, the node name is determined using the formatting string and the other parts of the expression, and is checked for availability.
If no such node exists in the repository, the name is used. Technically speaking, the node name is allocated by creating the node in the repository. If the operation succeeds, the node is acquired. This avoids race conditions: only the first midPoint instance that successfully creates a node object can use this name.
If a node with the given name exists but is permanently down (this is determined by its running property being set to false), the name is used. This is implemented by removing the node object and then retrying the allocation attempt.
Names of nodes that are not marked as down but are not alive are not used here. This avoids taking the names of nodes that are, for example, currently booting or temporarily unavailable. Please see the Node state management section below.
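
The allocation procedure above can be sketched as follows. This is a simplified simulation, not midPoint's implementation: the repository is modelled as a plain Python dict, the function `allocate_node_name` and its `prefix` parameter are hypothetical, and the dict insert stands in for the repository's create operation, which in the real system is what guarantees that only one instance gets the name.

```python
def allocate_node_name(repository, start=0, end=100, fmt="%d", prefix=""):
    """Return the first available node name in the sequence and register it.

    `repository` maps node names to {"running": bool}; it is a stand-in
    for node objects stored in the midPoint repository.
    """
    for counter in range(start, end + 1):  # end value is inclusive
        name = prefix + (fmt % counter)
        existing = repository.get(name)
        if existing is not None and existing["running"] is False:
            # Node is permanently down: remove it and retry the allocation.
            del repository[name]
            existing = None
        if existing is None:
            # "Creating the node" allocates the name.
            repository[name] = {"running": True}
            return name
        # Node exists and is not marked as down: skip this name.
    raise RuntimeError("no free node name in the sequence")


repo = {
    "Test-00": {"running": True},   # in use (or booting): skipped
    "Test-01": {"running": False},  # permanently down: reused
}
print(allocate_node_name(repo, 0, 99, "%02d", prefix="Test-"))  # Test-01
print(allocate_node_name(repo, 0, 99, "%02d", prefix="Test-"))  # Test-02
```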

Note that the sequence expression can be combined with other expressions. For example, you can specify nodeId as ${env:ENVIRONMENT}-${sequence:0:99:%02d}, yielding names like Test-01, Test-02, …, QA-01, QA-02, …, Prod-01, Prod-02, …

Node state management

A midPoint node is typically in one of the following states:

State	Characterization
up and alive	Node regularly checks into the repository. Its `operationalState` property is `UP` and its `lastCheckInTime` is regularly updated (it is less than `nodeTimeout` old).
up, but not checking in	There’s an issue with this node. Its `operationalState` property is `UP` but its `lastCheckInTime` is older than `nodeTimeout` seconds. Nodes in this state are excluded from some operations, e.g. status querying or cache invalidation calls.
down	Node’s `operationalState` property is `DOWN`. This typically occurs when the node is going down cleanly: it marks itself as down. If a node goes down abruptly (without having a chance to make this modification), other nodes watch its `lastCheckInTime` and, once it is older than `nodeAlivenessTimeout`, mark the respective node as down by setting its `operationalState` property to `DOWN`. This check occurs every `nodeAlivenessCheckInterval` seconds. Nodes in this state are excluded from almost all operations.
starting	Node’s `operationalState` property is `STARTING`. Nodes in this state are excluded from some operations, e.g. status querying or cache invalidation calls.
deleted	Node object no longer exists in the repository. The deletion can occur either manually or by the Cleanup task. The task deletes nodes whose `lastCheckInTime` is older than `deadNodes/maxAge`.
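
As a rough illustration of how these states relate to `operationalState`, `lastCheckInTime`, and the timeouts, here is a small Python sketch. The node is represented as a plain dict and the functions `classify` and `should_be_marked_down` are hypothetical; the timeout constants use the default values given in this document. That only nodes still marked `UP` are candidates for being marked down is an assumption of this sketch.

```python
from datetime import datetime, timedelta

NODE_TIMEOUT = timedelta(seconds=30)             # default nodeTimeout
NODE_ALIVENESS_TIMEOUT = timedelta(seconds=900)  # default nodeAlivenessTimeout


def classify(node, now):
    """Map a node (dict, or None if the object does not exist) to a state name."""
    if node is None:
        return "deleted"
    state = node["operationalState"]
    if state == "DOWN":
        return "down"
    if state == "STARTING":
        return "starting"
    if now - node["lastCheckInTime"] < NODE_TIMEOUT:
        return "up and alive"
    return "up, but not checking in"


def should_be_marked_down(node, now):
    """Aliveness check run by other nodes (assumed to apply to UP nodes only)."""
    return (node["operationalState"] == "UP"
            and now - node["lastCheckInTime"] > NODE_ALIVENESS_TIMEOUT)


now = datetime(2023, 1, 11, 15, 0, 0)
stale = {"operationalState": "UP", "lastCheckInTime": now - timedelta(seconds=60)}
print(classify(stale, now))               # up, but not checking in
print(should_be_marked_down(stale, now))  # False
```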

Default parameter values:

Parameter	Location	Description	Default value
`nodeTimeout`	`taskManager` section of `config.xml`	Time since the last check-in after which the node is considered not checked in.	30 seconds
`nodeAlivenessTimeout`	`taskManager` section of `config.xml`	Time since the last check-in after which the node is considered down.	900 seconds
`nodeAlivenessCheckInterval`	`taskManager` section of `config.xml`	How often the node aliveness check is carried out.	120 seconds
`nodeStartupTimeout`	`taskManager` section of `config.xml`	Time after which the node is reported as starting for too long.	900 seconds
`deadNodes/maxAge`	cleanup policy, e.g. in the system configuration object	Time without a check-in after which the node is deleted.	none

Nodes are not cleaned up by default. To enable this feature, set deadNodes/maxAge to, for example, 1 day. Note that the cleanup task runs once per day by default. You can change this interval, or you can schedule another cleanup task devoted specifically to cleaning up dead nodes.
