Repository Implementation

Last modified 02 Jun 2021 19:53 +02:00

Repository responsibilities

These are based on RepositoryService interface and there should be no difference compared to the legacy repository. Each point contains design/implementation notes.

  • Repository stores midPoint objects (subtypes of ObjectType), assigning their OID if missing. For this m_object table with oid (UUID) and fullObject (BYTEA) columns would be enough. However, this is not acceptable for searchObjects performance see note there.

  • Repository assigns any missing container IDs (CID) for object’s containers. These IDs are unique inside a single object and assigned loosely sequentially. For this m_object.cid_seq is stored in the database and it contains the next available CID. IDs are assigned only to multi-valued containers, e.g. object’s metadata does not need any CID.

  • Repository allows for searching for objects and selected containers with reasonable performance. This is why objects stored as documents (fullObject) is not enough. Various types are stored in specific sub-tables instead with values extracted to separate columns. There are other related tables, typically for containers and other multi-value properties. Not all values are by default searchable/indexable in this way and it must be possible to specify what other properties are indexed. These can be properties from the default schema but also from schema extensions.

  • Each object is a separate "aggregate" and any inter-object references are soft, not constraints on the database level. For example, multi-value container roleMembershipRef is stored in m_ref_role_membership table which has a foreign key to the common OID pool m_object_oid (this is necessary for any FK pointing to object OID if multiple types are possible, FK to m_object does not work this way).

TODO (not-implemented yet)

  • Modify object with reindex

  • Exists filter

  • Type filter

  • Filter with multi-value items like subtypes and policySituations

  • Right-hand path support (e.g. column = other column, including .. support)

  • Better OperationResult treatment/review

  • FullText filter (textInfo)

  • Extension properties

  • Retry mechanism

  • Audit service

Adding object

TODO

Collation (ordering)

Depending on the chosen collation ordering of text fields can be case-insensitive or not. To see it quickly, one can use select like this:

select * from (values ('a'), ('B'), ('Z'), ('x'), ('Č'),
    ('@'), ('_'), ('%'), ('^'), ('5'), ('47'), ('持')) as t(text)
  order by text -- collation "C"

With additional collation "x" one can see various results for other collations. If en_US.UTF-8 is used during database creation (recommended by midPoint docs) the ordering of the select above should be case-insensitive. Collation used by default for each database can be determined like so:

select datname, datcollate from pg_database;
-- returns among others: midpoint,en_US.UTF-8

Funny enough, this default collation name may not be a valid name for COLLATION "collation-name" clause but for en_US.UTF-8 it seems to act the same as collate "en_US.utf8" (or en_US). The list of valid collation names can be obtained from pg_collation table:

select * from pg_collation;

As explained in the collation documentation, there is a default collation (collprovider = 'd'), libc based collations (d) and ICU collations (i) if compiled with it (PG 13 on Ubuntu contains these too).

It is possible to create tables or single columns with non-default collation. To list columns with different collation one can use this select (with or without table_schema filter, select based on this answer):

select table_schema, table_name, column_name, collation_name
  from information_schema.columns
  where collation_name is not null and table_schema = 'public'
order by table_schema, table_name, ordinal_position;
If query is used with custom COLLATE regularly the table should be indexed accordingly.

If case-sensitive locale is used when case-insensitive behavior is expected it may result in unexpected results. The old repository uses VARCHAR type for OIDs and bucket defining queries for extreme boundaries accidentally caught more than was intended. E.g. oid > 'FF' can also include all the OIDs starting with lower-case A through F, if interpreted case-sensitively - which it is with collate "C" for example (but not with en_US at least not for PG 12 or 13). See MID-6468 for possible problem manifestation.

Creating DB with other collation

As described in the notes for CREATE DATABASE one may need to specify template0 as a template for database creation with different collation. Adding collation support for other languages to the operating system and then adding it to PG is beyond this page, but is described in the docs.