Bucket Size Analysis

Last modified 27 Oct 2021 12:21 +02:00
EXPERIMENTAL
This feature is experimental and is not intended for production use. It is unfinished and unstable: the implementation may contain bugs, the configuration may change at any moment without warning, and the feature may not work at all. Use at your own risk. This feature is not covered by midPoint support. If you are interested in supporting the development of this feature, please consider purchasing a midPoint Platform subscription.

Introduction

Specifying the correct bucketing is the key to successful activity execution in large-scale deployments. If the buckets are too large (and too few), the distribution to worker tasks is inefficient, progress reporting is coarse, and a restarted execution has to re-process too many items. On the other hand, if the buckets are too small (and too many), the overhead of managing them is unnecessarily large. And if the items are unevenly distributed among the buckets, some worker tasks have to process a large amount of data, while others have literally nothing to do.
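The effect of uneven buckets can be illustrated with a minimal sketch. It is not midPoint code: the prefix-based bucketing, the function names `bucket_sizes` and `imbalance`, and the sample names are all assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean


def bucket_sizes(names, prefix_length=1):
    """Count how many items fall into each prefix-based bucket (hypothetical scheme)."""
    return Counter(name[:prefix_length].lower() for name in names)


def imbalance(sizes):
    """Ratio of the largest bucket to the mean bucket size (1.0 means perfectly even)."""
    counts = list(sizes.values())
    return max(counts) / mean(counts)


# Made-up sample data: most names start with "a", so one bucket dominates.
names = ["adams", "allen", "anderson", "baker", "avery", "clark", "abbott", "arnold"]
sizes = bucket_sizes(names)

for bucket, size in sorted(sizes.items()):
    print(bucket, size)
print("imbalance:", round(imbalance(sizes), 2))
```

In this sample, the bucket for "a" holds six of the eight items, so a worker assigned to it does most of the work while the others finish almost immediately.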

To aid with this, midPoint provides the ability to:

  1. report on bucket processing, so that the bucket distribution can be analyzed ex-post;

  2. analyze the bucket distribution ex-ante, i.e. without actually executing the activity.

The former can be set up simply by enabling bucket-level reports. The latter is the topic of this document.

Set-Up

The process has two required elements and one optional element:

  1. Setting the bucketAnalysis execution mode. It turns off item fetching and processing.

  2. Enabling the buckets execution report. Otherwise, the data would be collected but not written anywhere.

  3. (Optional) Enabling sampling. This selects only a (hopefully) representative subset of all buckets, in order to speed up the processing.
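The three elements above can be sketched conceptually as follows. This is only an illustration of the idea, not midPoint's implementation or configuration syntax: `analyze_buckets`, `count_in_range`, `sample_interval`, the numeric-range buckets, and the data are all hypothetical.

```python
def analyze_buckets(buckets, count_items, sample_interval=1):
    """Return (bucket, size) pairs for every sample_interval-th bucket.

    Only item counts are collected; no item is fetched or processed.
    """
    report = []
    for index, bucket in enumerate(buckets):
        if index % sample_interval != 0:
            continue  # bucket skipped by sampling
        report.append((bucket, count_items(bucket)))
    return report


# Made-up data: item identifiers and ten numeric-range buckets [low, high).
item_ids = [3, 7, 12, 15, 18, 19, 42, 57, 58, 91]
buckets = [(low, low + 10) for low in range(0, 100, 10)]


def count_in_range(bucket):
    """Count the items whose identifier falls into the given range."""
    low, high = bucket
    return sum(1 for item_id in item_ids if low <= item_id < high)


# Analyze every second bucket and write the result out as a simple report.
report = analyze_buckets(buckets, count_in_range, sample_interval=2)
for bucket, size in report:
    print(bucket, size)
```

With this data the sampled report skips the range (10, 20), which happens to hold four of the ten items. This is why the sampled subset is only hopefully representative: a sparse sample is faster to produce but can miss exactly the buckets that matter.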

TODO