Repository Implementation

Last modified 16 Sep 2021 11:26 +02:00

Repository responsibilities

These are based on RepositoryService interface and there should be no difference compared to the legacy repository. Each point contains design/implementation notes.

  • Repository stores midPoint objects (subtypes of ObjectType), assigning their OID if missing. For this m_object table with oid (UUID) and fullObject (BYTEA) columns would be enough. However, this is not acceptable for searchObjects performance see note there.

  • Repository assigns any missing container IDs (CID) for object’s containers. These IDs are unique inside a single object and assigned loosely sequentially. For this m_object.cid_seq is stored in the database and it contains the next available CID. IDs are assigned only to multi-valued containers, e.g. object’s metadata does not need any CID.

  • Repository allows for searching for objects and selected containers with reasonable performance. This is why objects stored as documents (fullObject) is not enough. Various types are stored in specific sub-tables instead with values extracted to separate columns. There are other related tables, typically for containers and other multi-value properties. Not all values are by default searchable/indexable in this way and it must be possible to specify what other properties are indexed. These can be properties from the default schema but also from schema extensions.

  • Each object is a separate "aggregate" and any inter-object references are soft, not constraints on the database level. For example, multi-value container roleMembershipRef is stored in m_ref_role_membership table which has a foreign key to the common OID pool m_object_oid (this is necessary for any FK pointing to object OID if multiple types are possible, FK to m_object does not work this way).

TODO

Open, Not-implemented yet

  • FullText filter (textInfo), including storage of the info (Virgo, WIP)

  • Query playground: executeQueryDiagnostics()

Notes, non-issues

Retry mechanism (if/where necessary) may not be necessary if tests don’t show that. SELECT …​ FOR UPDATE seems to do its work just fine.

Minor, tentative, questinable, don’t implement while it’s in this list:

  • Dereferencing (@) for extension refs stored in JSONB.

  • Grouping is not supported and the semantics for SQL without ORM is unclear. This is not used by midPoint itself and there are no tests for it.

  • Arbitrary ordering for iterative search - currently only one path is supported. Fixes for NULL values are needed - TODO in the code. Multi-path ordering throws - this can probably wait a bit.

Closed TODOs

  • Modify object with reindex; implemented as full delete/add cycle, very reliable and potentially more efficient too.

  • Right-hand path support, e.g. column = otherColumn, including other comparison operations. Minimal needed support implemented, .. not supported, but polystrings are.

  • Support for various get/search/modify options WIP

    • RepoModifyOptions.useNoFetchExtensionValuesInsertion and useNoFetchExtensionValuesDeletion may be obsolete and ignored by the new repository.

    • GetOperationOptions.iterationMethod is ignored by the new repository, don’t implement.

  • JPEG out of full-object (kinda "index-only") - Done (Tony)

  • Index-only extension properties (indexOnly=true) - these are probably stored correctly and can be filtered on, but they are not added to the obtained object (getObject).

    • They should be included only when asked for specifically, e.g. for shadow attribute ObjectSelector.path=c:attributes and GetOperationOptions.retrieve=INCLUDE. - Done (Tony)

  • Support for various get/search/modify options

  • Support for

    • resolveNames - Done

    • raw - Done

  • Proper modifyObjectDynamically() implementation - Transactional support introduced

  • Audit service (original functionality without new iterative search)

  • Audit service - iterative search, single order supported, just like in main repo.

  • Multi-node safety for UriCache and ExtItemCache

Modify Efficiency issues

Modify efficiency for distributed full-object (AccessCertificationCampaign vs its case):

  • ADD modification for Case container of AccCertCampaign should not fetch other cases. Update of full object (but without cases), version, etc. works as usual.

  • DELETE modification for Case container of AccCertCampaign should not fetch any cases, only delete the case row (and update main full object).

  • REPLACE modification for case does not need to read any cases, just delete all the rows.

  • Modification going inside any Case should fetch only the needed case before applying the modification.

This mechanism can be reused if full-object for assignments is split from object’s full-object in the future (not sure about the correctness of the column name fullObject).

Index-only attributes/extensions modify efficiency:

  • REPLACE modification does not require any fetch, it simply obliterates the previous value in the DB.

  • ADD/DELETE modification currently requires the fetch of the index-only property, which can be achieved by selective fetch of ext→'id-of-index-only-property'. Modification is then applied to prism object as usual. Value in the DB is overwritten by the new complete value for the property (like for REPLACE).

    • In the future, ideally, this would not need any fetch access (except for reindex) and issue fine-tuned SQL operations. This may also require more flexible storage options for extensions/attributes, e.g. promoting the index-only attribute to a separate array column or separate table even.

Adding object

TODO docs

How to add a new field/column?

Example - let’s add operationalState column for m_node table.

  • Let’s start in the SQL schema, locate the table m_node and add operationalState column to it. Consider logical order of the columns - even though it’s not relevant for upgraded DB, it is relevant for future readability. Notice, how columns for items from subcontainers are grouped in existing tables. Also, we name the columns "Java style", although Postgres doesn’t care aboute casing. The same name will be used in so called "M-class" (MNode in our example). Sometimes the name contains container name prefix, this is not used for metadata, activation and similar common containers - even these are still pure camel-case names.

  • What type is it? Simple things are easy (TEXT, INTEGER), some items require multiple columns (references, poly-strings), follow the patterns used previously for such cases. Our operationalState is enum type - great! Read the next bullet how to do that. Some multi-value types can be represented by arrays like TEXT[] or JSONB columns. If we’re talking about whole new multi-value container, whoa-whoa…​ that’s out of scope of this humble instruction sheet!

  • If the type is previously unused enum, we’ve got just a bit more work to do. First we need to add CREATE TYPE for it to the schema. Find the section with custom types, read its intro comment and add the enum specification. Very likely it will be "schema enum", not specialized repository internal enum - that’s typical. Use the class name for the custom type to make things as obvious as possible, so it’s NodeOperationalStateType for our case. Copy/paste the values to avoid any mistake…​ other than copy/paste error, that is. Don’t forget to mention this enum in the SqaleRepoContext constructor. Alphabetic order, please!

  • Let’s change the "M-class", MNode in our example. Simply add public field for the column, like public NodeOperationalStateType operationalState. Keep the order consistent with the table. BTW, "M" does not stand for "mapping", we will see mapping class later.

  • Now it’s great time to update SqaleRepoAddDeleteObjectTest! Find the method testing mapping for this class and add value and assert for the new item. Feel free to add the method if this entity/object type is not yet tested - just like I did. Run the test, it should fail for the new attribute, which is a good sign.

  • We need to declare the new column in a "Q-class" which extends from Querydsl type hierarchy. Technically it’s mapping for Querydsl, but it’s still not "our" mapping for midPoint (soon, I promise!). For our example, it’s QNode - and there are two sections:

    • First, static column metadata, find good example from other class if necessary. In our case, I’ll add:

      public static final ColumnMetadata OPERATIONAL_STATE =
        ColumnMetadata.named("operationalState").ofType(Types.OTHER);

      For enum types we use Types.OTHER, again, see existing examples from other classes for your type.

    • Next, we add non-static column (or attribute) path:

      public final EnumPath<NodeOperationalStateType> operationalState =
        createEnum("operationalState", NodeOperationalStateType.class, OPERATIONAL_STATE);

      The name of the column (the same like the name of the field in M-class) appears twice here, because we want the same name again for Q-class paths. Previously specified column metadata are used. As before, see examples from other Q-classes to use the right create* method and path type.

    Keep the order consistent with SQL and M-class in both sections. Good, now Querydsl knows what to do with our field in the M-class.

  • Now it’s time to add the insert code. Finally, we’re getting to the "mapping class" - QNodeMapping in our case. Locate toRowObjectWithoutFullObject and add the following line there:

    row.operationalState = node.getOperationalState();

    As always, follow the order from SQL and M-class. The code for enum and many other types is as trivial as shown above, but there is great support for refs, poly-strings and many more too - just find the examples in other Q-Mapping classes.

  • Nearly there! We still need one more thing to support searching and modifications too. Go to the constructor of the mapping class (QNodeMapping for us) and add (respecting the right order again, of course!):

    addItemMapping(F_OPERATIONAL_STATE, enumMapper(q -> q.operationalState));

    I mean, seriously, can it be any more auto-magical than this? It is possible to write test to this as well, but honestly, we don’t bother when adding a new mapping for well-working type. Just be sure to use the right item name (F_OPERATIONAL_STATE imported statically from NodeType), proper mapper method (enumMapper) and proper path (q.operationalState, which is that final non-static field we added on the Q-class).

  • And SQL alter script, of course! OK, the alter script is not yet in place (this is written before 4.4 is out), but find it and add the modification there using the newest fashionable method available. Hopefully, it’s something utilizing apply_change and - for your sake - same examples are already available.

To see the whole success story, check this commit (add-test was committed separately and linked previously).

Collation (ordering)

Depending on the chosen collation ordering of text fields can be case-insensitive or not. To see it quickly, one can use select like this:

select * from (values ('a'), ('B'), ('Z'), ('x'), ('Č'),
    ('@'), ('_'), ('%'), ('^'), ('5'), ('47'), ('持')) as t(text)
  order by text -- collation "C"

With additional collation "x" one can see various results for other collations. If en_US.UTF-8 is used during database creation (recommended by midPoint docs) the ordering of the select above should be case-insensitive. Collation used by default for each database can be determined like so:

select datname, datcollate from pg_database;
-- returns among others: midpoint,en_US.UTF-8

Funny enough, this default collation name may not be a valid name for COLLATION "collation-name" clause but for en_US.UTF-8 it seems to act the same as collate "en_US.utf8" (or en_US). The list of valid collation names can be obtained from pg_collation table:

select * from pg_collation;

As explained in the collation documentation, there is a default collation (collprovider = 'd'), libc based collations (d) and ICU collations (i) if compiled with it (PG 13 on Ubuntu contains these too).

It is possible to create tables or single columns with non-default collation. To list columns with different collation one can use this select (with or without table_schema filter, select based on this answer):

select table_schema, table_name, column_name, collation_name
  from information_schema.columns
  where collation_name is not null and table_schema = 'public'
order by table_schema, table_name, ordinal_position;
If query is used with custom COLLATE regularly the table should be indexed accordingly.

If case-sensitive locale is used when case-insensitive behavior is expected it may result in unexpected results. The old repository uses VARCHAR type for OIDs and bucket defining queries for extreme boundaries accidentally caught more than was intended. E.g. oid > 'FF' can also include all the OIDs starting with lower-case A through F, if interpreted case-sensitively - which it is with collate "C" for example (but not with en_US at least not for PG 12 or 13). See MID-6468 for possible problem manifestation.

Creating DB with other collation

As described in the notes for CREATE DATABASE one may need to specify template0 as a template for database creation with different collation. Adding collation support for other languages to the operating system and then adding it to PG is beyond this page, but is described in the docs.