MidScale: MidPoint Scalability
MidPoint is an open source identity management and governance platform. It is an established solution for mid-size organizations, providing transparency and accountability for personal data processing. Our midPrivacy initiative aims to implement unique data protection capabilities in midPoint, thus creating a privacy-enhancing identity management platform. Overall, midPoint is a leading open source solution in several identity and access management areas.
However, midPoint has one significant limitation. MidPoint was originally built to address the needs of mid-size enterprises, agencies and universities. The initial design of midPoint data store components favored flexibility and time to market. As midPoint was targeting mid-size organizations, scalability was not high on the list of implementation priorities. But now midPoint is being deployed to handle scenarios with large numbers of identities. Deployments that manage students, subscribers and consumers are becoming more and more common. This makes sense, as these types of users can especially benefit from the data protection capabilities of midPoint. However, such deployments are hitting the scalability limitations of the current data storage components of midPoint.
Project Goals
Future scalability issues were foreseen in the original midPoint design. MidPoint is not bound to any particular data store or database. Thanks to such foresight, midPoint has flexible and replaceable data storage components. We would like to take advantage of this design feature and re-implement the data storage components in a scalable way. Our plan is to redesign the database schema with scalability in mind. We also plan to take advantage of innovations in open source databases that were not readily available when midPoint development started almost a decade ago. Therefore we plan to tailor our implementation specifically to the PostgreSQL database to take full advantage of its capabilities.
We also plan to improve midPoint clustering mechanisms. Our goal is to support the autoscaling capabilities used in cloud platforms, thus enhancing the on-demand character of midPoint deployments. Higher scale also implies stricter requirements on stability and robustness of the product. Therefore we plan to invest part of the effort in improving our quality assurance environment, especially focusing on scalability, performance and stability testing. MidPoint would also greatly benefit from user experience improvements that can make administration of millions of identities easier.
Major Achievements
Major achievements of the midScale project:
- Native PostgreSQL repository implementation (internal code name "sqale"). The new implementation is based directly on the PostgreSQL database, enabling the use of PostgreSQL-specific features and avoiding inefficiencies introduced by the excessive abstractions of the previous database-agnostic implementation.
- Improved database schema (data model), focusing on efficient data storage and retrieval (a.k.a. "queries"). Queries that were not feasible in previous midPoint versions are now feasible, even with a considerably larger data set.
- Axiom query language: a human-readable query language, used in large or complex reports and advanced system administration, which are usually needed in large-scale deployments.
- The task management system was significantly reworked, especially focusing on distributed (clustered) tasks: introducing the concept of a flexible "activity", significantly improving error detection and handling, major visibility improvements, and faster management of distributed tasks. These improvements were a necessary enabler for autoscaling and similar cluster-wide functionality. Improved visibility is essential for management of large, multi-node deployments.
- Autoscaling: the ability to dynamically adapt midPoint performance (tasks) to a changed number of nodes in the midPoint cluster.
- Numerous improvements of visibility and diagnostics, significantly improving the ability to diagnose issues in large deployments.
- Improved user experience of the graphical user interface (GUI), allowing more convenient administration of large and complex deployments.
- Quality assurance: a dedicated performance testing environment is used to execute complex test scenarios, including GUI testing featuring the Schrodinger testing framework.
This is a short list of the major achievements of the project. The project has other achievements and results. Please see the complete project report for additional details.
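To make the autoscaling idea more concrete, here is a minimal sketch of how units of task work might be redistributed when the number of cluster nodes changes. This is an illustration only, not midPoint's actual algorithm or API: the function `assign_buckets`, the notion of numbered "buckets", the node names, and the round-robin strategy are all assumptions made for this example.

```python
from collections import defaultdict


def assign_buckets(bucket_ids, nodes):
    """Spread work buckets over the currently available nodes, round-robin.

    Hypothetical helper: midPoint's real distribution logic may differ.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    assignment = defaultdict(list)
    for index, bucket in enumerate(bucket_ids):
        # Pick the node by position so the load stays as even as possible.
        assignment[nodes[index % len(nodes)]].append(bucket)
    return dict(assignment)


# Eight hypothetical buckets of work, e.g. slices of the identity data set.
buckets = list(range(8))

# Assignment for a two-node cluster.
before = assign_buckets(buckets, ["node-a", "node-b"])

# Two more nodes join; recomputing the assignment spreads the same work wider.
after = assign_buckets(buckets, ["node-a", "node-b", "node-c", "node-d"])
```

In this sketch each node handles four buckets before scaling out and two buckets afterwards, while every bucket remains assigned to exactly one node. A real implementation would also need to deal with buckets already in progress and with nodes leaving the cluster.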
Project Results
Before the midScale project, midPoint deployments were able to routinely handle tens or hundreds of thousands of identities. We had some success with deployments involving millions of identities, but such deployments usually required special treatment. The goal of the midScale project was to increase midPoint scalability by at least one order of magnitude. We have every reason to believe that midPoint deployments after the midScale project could routinely handle environments with millions of identities, and that deployments with tens or even hundreds of millions of identities are possible.
The key results of the midScale project are:
- Key result 1: Improved scalability of midPoint by at least one order of magnitude, enabling deployments beyond millions of managed identities.
- Key result 2: Improved visibility, diagnostics and reliability of midPoint, making long-term maintenance of large-scale deployments feasible.
- Key result 3: Improved performance and user experience of the midPoint user interface, making management of massive user bases efficient.
Please see the complete project report for additional details.
Documents
- Design and Architecture
- Documentation
Blog, Articles And Other Media
- Evolveum Blog
- On-line Workshops / Webinars
  - What’s New In MidPoint 4.3 (slides) (video) (summary blog)
  - Axiom Query Language (slides) (video) (summary blog)
- NGI_TRUST 9th Results Workshop (slides)
- Project Management Documents
Deliverables
No | Title | Links |
---|---|---|
D1 | Architecture and design documentation | |
D2 | Axiom query language documentation | |
D3 | MidPoint 4.3 "Faraday" release | |
D4 | Results of midScale survey | |
D5 | Technology workshops, slides and recordings | |
D6 | Overview of performance testing environment | |
D7 | MidPoint 4.4 "Tesla" release candidate | |
Timeline
Milestone | Goal | Planned date | Status |
---|---|---|---|
START | Project start | 15 Oct 2020 | DONE |
M1 | Performance environment | 26 Nov 2020 | DONE |
M2 | Performance evaluation and repository analysis/design | 15 Jan 2021 | DONE |
M3 | Multithreading, Schrodinger and Query language | 26 Feb 2021 | DONE |
M4 | Performance Repo prototype, UI basic tests, Multinode Tasks | 12 Apr 2021 | DONE |
M5 | PostgreSQL, Performance environment (2), UX Analysis & Design | 31 May 2021 | DONE |
M6 | Repository optimization, performance evaluation | 9 Jul 2021 | DONE |
M7 | Migration Procedure, GUI Improvements, Auto-scaling | 31 Aug 2021 | DONE |
FINISH | Project finish | 14 Oct 2021 | DONE |
Funding
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the NGI_TRUST grant agreement no 825618.