MidScale: MidPoint Scalability

Last modified 19 Oct 2021 08:15 +02:00

MidPoint is an open source identity management and governance platform. It is an established solution for mid-size organizations, providing transparency and accountability for personal data processing. Our midPrivacy initiative aims to implement unique data protection capabilities in midPoint, thus creating a privacy-enhancing identity management platform. Overall, midPoint is a leading open source solution in several identity and access management areas.

However, midPoint has one significant limitation. MidPoint was originally built to address the needs of mid-size enterprises, agencies and universities. The initial design of midPoint data store components favored flexibility and time to market. As midPoint was targeting mid-size organizations, scalability was not high on the list of implementation priorities. But now midPoint is being deployed to handle scenarios with large numbers of identities. Deployments that manage students, subscribers and consumers are becoming more and more common, which makes sense, as these types of users can especially benefit from the data protection capabilities of midPoint. However, such deployments are hitting the scalability limitations of the current data storage components of midPoint.

Project Goals

Future scalability issues were foreseen in the original midPoint design. MidPoint is not bound to any particular data store or database. Thanks to this foresight, midPoint has flexible and replaceable data storage components. We would like to take advantage of this design feature and re-implement the data storage components in a scalable way. Our plan is to redesign the database schema with scalability in mind. We also plan to take advantage of innovations in open source databases that were not readily available when midPoint development started almost a decade ago. Therefore we plan to tailor our implementation specifically to the PostgreSQL database to take full advantage of its capabilities.

We also plan to improve midPoint clustering mechanisms. Our goal is to support the autoscaling capabilities used in cloud platforms, thus enhancing the on-demand character of midPoint deployments. Higher scale also implies stricter requirements on the stability and robustness of the product. Therefore we plan to invest part of the effort in improving our quality assurance environment, focusing especially on scalability, performance and stability testing. Finally, midPoint would greatly benefit from user experience improvements that make administration of millions of identities easier.

Major Achievements

Major achievements of the midScale project:

  • Native PostgreSQL repository implementation (internal code name "sqale"). The new implementation is based directly on the PostgreSQL database, enabling use of PostgreSQL-specific features and avoiding the inefficiencies introduced by the excessive abstractions of the previous database-agnostic implementation.

  • Improved database schema (data model), focusing on efficient data storage and retrieval (a.k.a. "queries"). Queries that were not feasible in previous midPoint versions are now feasible, even with a considerably larger data set.

  • Axiom query language: a human-readable query language, used in large or complex reports and advanced system administration, which are usually needed in large-scale deployments.

  • The task management system was significantly reworked, focusing especially on distributed (clustered) tasks: introducing the concept of a flexible "activity", significantly improving error detection and handling, major visibility improvements, and faster management of distributed tasks. These improvements were a necessary enabler for autoscaling and similar cluster-wide functionality. Improved visibility is essential for management of large, multi-node deployments.

  • Autoscaling: the ability to dynamically adapt midPoint performance (tasks) to a changed number of nodes in the midPoint cluster.

  • Numerous improvements to visibility and diagnostics, making it significantly easier to diagnose issues in large deployments.

  • Improved user experience of the graphical user interface (GUI), allowing more convenient administration of large and complex deployments.

  • Quality assurance: a dedicated performance testing environment is used to execute complex test scenarios, including GUI testing with the Schrodinger testing framework.

This is a short list of the major achievements of the project. The project has other achievements and results. Please see the complete project report for additional details.
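The autoscaling item in the list above can be illustrated with a minimal sketch. It assumes that a distributed task is split into independent work buckets and that the buckets are simply reassigned round-robin whenever the set of cluster nodes changes. The function name `distribute_buckets`, the node names and the round-robin strategy are all hypothetical and chosen for illustration; they do not describe the actual midPoint implementation.

```python
def distribute_buckets(bucket_ids, node_ids):
    """Assign work buckets to nodes round-robin.

    Hypothetical illustration only, not midPoint's real algorithm.
    """
    if not node_ids:
        raise ValueError("at least one node is required")
    assignment = {node: [] for node in node_ids}
    for index, bucket in enumerate(bucket_ids):
        # Bucket i goes to node (i mod number_of_nodes).
        node = node_ids[index % len(node_ids)]
        assignment[node].append(bucket)
    return assignment


buckets = list(range(6))

# Two nodes in the cluster: each node processes three buckets.
before = distribute_buckets(buckets, ["node-a", "node-b"])

# A third node joins: recomputing the assignment gives each node two buckets.
after = distribute_buckets(buckets, ["node-a", "node-b", "node-c"])

print(before)
print(after)
```

Recomputing the assignment from scratch is the simplest possible strategy. A real cluster would also have to take into account buckets that are already being processed, which this sketch ignores.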

Project Results

MidPoint deployments before the midScale project were able to routinely handle tens or hundreds of thousands of identities. We had some success with deployments involving millions of identities, but such deployments usually required special treatment. The goal of the midScale project was to increase midPoint scalability by at least one order of magnitude. We have every reason to believe that midPoint deployments after the midScale project can routinely handle environments with millions of identities, and that deployments with tens or even hundreds of millions of identities are possible.

The key results of the midScale project are:

  • Key result 1: Improved scalability of midPoint by at least one order of magnitude, enabling deployments beyond millions of managed identities.

  • Key result 2: Improved visibility, diagnostics and reliability of midPoint, making long-term maintenance of large-scale deployments feasible.

  • Key result 3: Improved performance and user experience of the midPoint user interface, making management of massive user bases efficient.

Please see the complete project report for additional details.

Deliverables

No   Title                                        Links
D1   Architecture and design documentation        MidScale Solution Architecture; MidScale Design Notes
D2   Axiom query language documentation           Axiom Query Language
D3   MidPoint 4.3 "Faraday" release               MidPoint 4.3 "Faraday"
D4   Results of midScale survey                   MidScale Survey
D5   Technology workshops, slides and recordings  Talks
D6   Overview of performance testing environment  MidScale Infrastructure Documentation
D7   MidPoint 4.4 "Tesla" release candidate       MidPoint 4.4-RC1 source code on GitHub

Timeline

Milestone                      Goal                                                           Planned date  Status
START                          Project start                                                  15 Oct 2020   DONE
M1 (MidPoint 4.3 M1)           Performance environment                                        26 Nov 2020   DONE (see M1 Outcomes)
M2 (MidPoint 4.3 M2)           Performance evaluation and repository analysis/design          15 Jan 2021   DONE (see M2 Outcomes)
M3 (MidPoint 4.3 M3)           Multithreading, Schrodinger and Query language                 26 Feb 2021   DONE (see M3 Outcomes)
M4 (MidPoint 4.3 RELEASE)      Performance Repo prototype, UI basic tests, Multinode Tasks    12 Apr 2021   DONE (see M4 Outcomes)
M5 (MidPoint 4.4 M1)           PostgreSQL, Performance environment (2), UX Analysis & Design  31 May 2021   DONE (see M5 Outcomes)
M6 (MidPoint 4.4 M2)           Repository optimization, performance evaluation                9 Jul 2021    DONE (see M6 Outcomes)
M7 (MidPoint 4.4 M3)           Migration Procedure, GUI Improvements, Auto-scaling            31 Aug 2021   DONE (see M7 Outcomes)
FINISH (MidPoint 4.4 RELEASE)  Project finish                                                 14 Oct 2021   DONE (see M8 Outcomes)

Funding

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the NGI_TRUST grant agreement No 825618.