Repository Database Support

Last modified 22 Apr 2021 17:31 +02:00

MidPoint was designed with a flexible implementation of its repository subsystem, which allows support for several types of storage technologies. The only storage technology that midPoint supports currently supports is relational database. Relational database always was, and it still is, the reliable workhorse of many identity management solutions, including midPoint.

TL;DR

MIdPoint supports several database engines. However, you should choose PostgreSQL.

Generic vs Native Repository Implementation

Since its very beginning, midPoint was using a very generic approach to relational databases. This approach enabled support for several databases using the same code base. This worked very well for many years. However, there is also a drawback. Generic database code must strictly adhere to standardized SQL and it cannot take advantage of any database-specific features. This means limitations in performance and scalability. Also, supporting and testing several database engines is not an easy task.

As midPoint deployment grow in complexity and size, we are reaching the boundaries of generic database support approach. Therefore we have to specialize our implementation and choose our champion among the database engine. We have chosen PostgreSQL database. PostgreSQL is undoubtedly one of the best open source database engines. As midPoint is open source project, it was crucial to support an open source database. As we have doubts about the open source character of MySQL and the capabilities and community strength of MariaDB, PostgreSQL was a clear choice. That is also the reason that we have decided to focus on PostgreSQL and PostgreSQL was a recommended database engine since midPoint 4.0 release.

We had plans to create a native repository implementation, an implementation that could take full advantage of underlying database, an implementation that can avoid the complications and overhead of generic database data mapping. This has finally materialized in midScale project. MidScale project brings a native, efficient, high-performance, scalable repository implementation for PostgreSQL database.

Therefore, as of midPoint 4.4 there will be two repository implementations:

  • Native PostgreSQL repository implementation, efficient and scalable repository implementation. This should be the default choice for all midPoint deployments. It will be available in midPoint 4.4.

  • Generic repository implementation, supporting PostgreSQL, Microsoft SQL and Oracle databases. This is the implementation that was used since midPoint 3.x. As of midPoint 4.4, this is considered to be a "legacy" implementation. It is still fully supported, however it can have performance and scalability limitations. It is not formally deprecated yet. However, we do not plan any new development or improvement of this implementation.

The Future Of Database Engine Support

Few important facts for the future of database engine support in midPoint:

  • PostgreSQL will be supported and recommended for all new midPoint versions. Newest PostgreSQL releases will be supported by new midPoint releases, given enough time to test new PostgreSQL releases.

  • Support for MySQL and MariaDB was dropped in midPoint 4.3. Need for MySQL/MariaDB in midPoint community seems to be negligible (see the survey), which contributed to a decision to drop MySQL/MariaDB support quickly. These database engines will not be supported any longer.

  • Other commercial database engines (Oracle, Microsoft SQL) remain in the gray zone. We have not made any decision about these. They are currently supported. However, it is quite likely that these databases will not be included in standard midPoint support service. Support services for these databases will need to be purchased separately - to compensate for extra effort supporting them. It is quite clear, that these database engines are not our priority. We currently do not plan to support any newer database versions for these commercial databases. Users that require clear database support plans should choose PostgreSQL database instead.

  • Support for H2 is likely to remain for development and demonstration purposes.

The bottom line is, that if you start a new midPoint deployment, the best choice is to go with PostgreSQL. If you have existing midPoint deployment then you are probably OK to continue operation with current setup, but plan to migrate to PostgreSQL in the future.

Migration to Native PostgreSQL Repository

Firstly and most importantly, there is no need for panic or any other quick action right now. PostgreSQL is the recommended database, but other databases are still supported (except for MariaDB/MySQL, which were deprecated some time ago). The other databases will be supported until there is a clear migration path to PostgreSQL. There is no need to migrate your existing deployment right now. Please wait until midPoint 4.4. MidPoint 4.4 would be an ideal point for you to migrate your database. However, you can keep your existing database even a little bit longer. There is still time to prepare: both for you and midPoint. However, please keep in mind that you will probably need to migrate to PostgreSQL eventually.

We have the following plan for database support:

  1. We are developing native repository implementation for PostgreSQL. This implementation will work only for PostgreSQL and it will take advantage of all the database features that we need. This will come with a new and more efficient database schema. For now this is planned to be released in production-ready quality in midPoint 4.4.

  2. We will create a migration plan (and a guide) from all of the supported databases to the new native PostgreSQL repository. The migration path will most likely involve XML export/import of the data.

  3. Both the new native PostgreSQL repository and the old generic implementations will be supported in midPoint 4.4 LTS, planned for release in late 2021. Expected lifetime of midPoint 4.4 LTS is until late 2024. Therefore you will have at least three-year migration period to PostgreSQL if you stick with midPoint 4.4 LTS.

  4. The plans for database support beyond midPoint 4.4 LTS are still quite fuzzy. We have already dropped database support for MySQL and MariaDB, as there was almost no interest for this support in midPoint community. There are no specific plans for dropping support for Oracle and Microsoft SQL, but it may happen eventually.

The new native PostgreSQL repository has a completely new database schema, designed to take full advantage of PostgreSQL capabilities. This database schema is not compatible with the database schema for the old generic repository. Therefore full data migration is needed for any conversion from (old) generic repository implementation to (new) native repository implementation. The migration is done by exporting all the data to XML, setting up a new empty database and importing the XML data. This migration is needed even for migrating data from existing PostgreSQL deployments to the new native PostgreSQL repository, as the database schemas are not compatible.

Support For Database Engines

MidPoint is using database for its data store. MidPoint needs database engine, but it does not include database engine. Database is installed, configured and managed separately from midPoint. The database engine is not considered to be part of midPoint. Therefore the database engine itself is not included in Evolveum support for midPoint.

Evolveum will make reasonable effort to make sure that midPoint works well with the databases. We are testing midPoint with several databases, using various versions and configurations. We are trying to make sure that midPoint works with the common database engines in common configurations. We will fix the bugs in midPoint database-related code as part of our support problems. In some cases we will also make work-arounds for some frequent and annoying database issues, although such decisions are made on case-by-case basis. Our responsibility is to make sure that midPoint side of midPoint-database interface works well.

However, deployment, configuration and maintenance of the database engine itself is not our responsibility. You have to install the database engine yourself. You have to configure it. We may have some recommendations for you, but database configuration is your responsibility. There are lots of variables here, single-node databases, clustered databases, many performance and availability trade-offs. Those depends on your environment, your requirements and workloads. We cannot possibly make such decisions for you. We provide database schema that midPoint needs. But fine-tuning of the database is up to you. There are small deployments and big deployments, each of them may have different database tuning requirements. For example, audit table may work with default setting in a small deployment that cleans up audit records after few months. But a large deployments that keeps audit records for years will need a very specific solution for storing audit data. Such solution is your responsibility.

Also, we will not fix the bugs in database engine. We will recommend specific version of database engine, but we are not distributing or maintaining the database engine itself. If midPoint does not work and the problem is a bug in the database, the solution will be to fix the bug in the database. Making sure that the bug is fixed is your responsibility. You should have a means to do it. This is usually a support contract with the database vendor or in-house capability.

From a practical perspective, midPoint usually works well with recent versions of PostgreSQL in usual configurations. Small midPoint deployment that are not performance-sensitive can probably operate without any problems even without any special support coverage for the database engine, just making sure that the database engine is properly maintained. Larger midPoint deployments will surely benefit from a dedicated PostgreSQL support. Such support contract can surely be consolidated with other applications in your organizations that are using PostgreSQL and it is also a great way how to support PostgreSQL project. Therefore we always recommend to secure a support for PostgreSQL database if you can afford it.

As for commercial and semi-commercial database (Oracle, MS SQL) we always strongly recommend to purchase a support contract for database engine if you insist on using such database. However, perhaps the best strategy would be to migrate to PostgreSQL eventually.

Clusters and Cloud

Generally speaking, midPoint is supported in clustered database environments. Simply speaking: if midPoint works for you with a single-node database, then it will be most likely work for you also in when deployed with database cluster. However, there are limitations:

  • Only environments that support full consistency guarantees are supported. Which means, that midPoint can only work for clustered configurations that can provide full ACID consistency and that are also configured to provide such guarantees. MidPoint will not work in environments with read-only replicas, environments that provide eventual consistency or any weaker consistency guarantees.

  • Proper configuration of database clusters is a complex task that often involves trade-offs. For example clusters built for high availability and robustness may increase data maintenance overhead and it may result in lower overall performance. Analysis, design, proper configuration and maintenance of database clusters is your responsibility. Evolveum support will not resolve issues that are caused by inappropriate design or configuration of database clusters. It is unrealistic to expect that midPoint will be highly available or more performant just because it runs on a database cluster. The cluster has to be properly designed and configured to satisfy specific needs of each deployment.

  • Database clusters may be configured in a variety of ways. Even a small configuration or tuning changes may cause issues. Even though midPoint is tested in a variety of database configurations during development, it is unrealistic to expect that it can be tested for every combination of database engine, versions, configurations and clustering topologies. If you happen to experience an issue with midPoint operation, we have to reproduce the issue in order to have any realistic chance to fix it. Some issues can be reproduced in our testing environment. However, presence of database clustering makes reproduction of issues much harder. Therefore please be prepared that Evolveum team may request an access to your testing environment where the issue can be reproduced in order to diagnose and fix an issue.