public static final ColumnMetadata OPERATIONAL_STATE =
ColumnMetadata.named("operationalState").ofType(Types.OTHER);
Repository Implementation
This page is part of MidScale project documentation and was freezed at the end of the project. See Design and Implementation and DB maintenance for current pages. |
Repository responsibilities
These are based on RepositoryService interface and there should be no difference compared to the legacy repository. Each point contains design/implementation notes.
-
Repository stores midPoint objects (subtypes of
ObjectType
), assigning their OID if missing. For thism_object
table withoid
(UUID) andfullObject
(BYTEA) columns would be enough. However, this is not acceptable forsearchObjects
performance see note there. -
Repository assigns any missing container IDs (CID) for object’s containers. These IDs are unique inside a single object and assigned loosely sequentially. For this
m_object.cid_seq
is stored in the database and it contains the next available CID. IDs are assigned only to multi-valued containers, e.g. object’s metadata does not need any CID. -
Repository allows searching for objects and selected containers with reasonable performance. This is why objects stored as documents (
fullObject
) is not enough. Various types are stored in specific sub-tables instead with values extracted to separate columns. There are other related tables, typically for containers and other multi-value properties. Not all values are by default searchable/indexable in this way and it must be possible to specify what other properties are indexed. These can be properties from the default schema but also from schema extensions. -
Each object is a separate "aggregate" and any inter-object references are soft, not constraints on the database level. For example, multi-value container
roleMembershipRef
is stored inm_ref_role_membership
table which has a foreign key to the common OID poolm_object_oid
(this is necessary for any FK pointing to object OID if multiple types are possible, FK tom_object
does not work this way). But thetargetOid
is not a FK as it points to another object.
TODO
Notes, non-issues
Retry mechanism (if/where necessary) may not be necessary if tests don’t show that.
SELECT … FOR UPDATE
seems to do its work just fine.
Minor, tentative, questinable, don’t implement while it’s in this list:
-
Dereferencing (
@
) for extension refs stored in JSONB. -
Grouping is not supported and the semantics for SQL without ORM is unclear. This is not used by midPoint itself and there are no tests for it.
-
Arbitrary ordering for iterative search - currently only one path is supported. Fixes for
NULL
values are needed - TODO in the code. Multi-path ordering throws - this can probably wait a bit.
Closed TODOs
-
FullText filter (textInfo), including storage of the info
-
Modify object with reindex; implemented as full delete/add cycle, very reliable and potentially more efficient too.
-
Right-hand path support, e.g.
column = otherColumn
, including other comparison operations. Minimal needed support implemented,..
not supported, but polystrings are. -
Support for various get/search/modify options
-
RepoModifyOptions.useNoFetchExtensionValuesInsertion
anduseNoFetchExtensionValuesDeletion
may be obsolete and ignored by the new repository. -
GetOperationOptions.iterationMethod
is ignored by the new repository, don’t implement.
-
-
JPEG out of full-object (kinda "index-only") - Done
-
Index-only extension properties (
indexOnly=true
) - these are probably stored correctly and can be filtered on, but they are not added to the obtained object (getObject
).-
They should be included only when asked for specifically, e.g. for shadow attribute
ObjectSelector.path
=c:attributes
andGetOperationOptions.retrieve
=INCLUDE
. - Done
-
-
Support for various get/search/modify options
-
Support for
resolveNames
andraw
done -
Proper
modifyObjectDynamically()
implementation - Transactional support introduced -
Audit service (original functionality without new iterative search)
-
Audit service - iterative search, single order supported, just like in main repo.
-
Multi-node safety for
UriCache
andExtItemCache
-
Query playground:
executeQueryDiagnostics()
Modify Efficiency issues
Modify efficiency for distributed full-object (AccessCertificationCampaign
vs its case):
-
ADD modification for Case container of AccCertCampaign should not fetch other cases. Update of full object (but without cases), version, etc. works as usual.
-
DELETE modification for Case container of AccCertCampaign should not fetch any cases, only delete the case row (and update main full object).
-
REPLACE modification for
case
does not need to read any cases, just delete all the rows. -
Modification going inside any Case should fetch only the needed case before applying the modification.
This mechanism can be reused if full-object for assignments is split from object’s full-object
in the future (not sure about the correctness of the column name fullObject
).
Index-only attributes/extensions modify efficiency:
-
REPLACE modification does not require any fetch, it simply obliterates the previous value in the DB.
-
ADD/DELETE modification currently requires the fetch of the index-only property, which can be achieved by selective fetch of
ext→'id-of-index-only-property'
. Modification is then applied to prism object as usual. Value in the DB is overwritten by the new complete value for the property (like for REPLACE).-
In the future, ideally, this would not need any fetch access (except for
reindex
) and issue fine-tuned SQL operations. This may also require more flexible storage options for extensions/attributes, e.g. promoting the index-only attribute to a separate array column or separate table even.
-
Developer notes
How to add a new field/column?
Example - let’s add operationalState
column for m_node
table.
-
Let’s start in the SQL schema, locate the table
m_node
and addoperationalState
column to it. Consider logical order of the columns - even though it’s not relevant for upgraded DB, it is relevant for future readability. Notice, how columns for items from subcontainers are grouped in existing tables. Also, we name the columns "Java style", although Postgres doesn’t care aboute casing. The same name will be used in so called "M-class" (MNode
in our example). Sometimes the name contains container name prefix, this is not used for metadata, activation and similar common containers - even these are still pure camel-case names. -
What type is it? Simple things are easy (
TEXT
,INTEGER
), some items require multiple columns (references, poly-strings), follow the patterns used previously for such cases. OuroperationalState
is enum type - great! Read the next bullet how to do that. Some multi-value types can be represented by arrays likeTEXT[]
orJSONB
columns. If we’re talking about whole new multi-value container, whoa-whoa… that’s out of scope of this humble instruction sheet! -
If the type is previously unused enum, we’ve got just a bit more work to do. First we need to add
CREATE TYPE
for it to the schema. Find the section with custom types, read its intro comment and add the enum specification. Very likely it will be "schema enum", not specialized repository internal enum - that’s typical. Use the class name for the custom type to make things as obvious as possible, so it’sNodeOperationalStateType
for our case. Copy/paste the values to avoid any mistake… other than copy/paste error, that is. Don’t forget to mention this enum in theSqaleRepoContext
constructor. Alphabetic order, please! -
Let’s change the "M-class",
MNode
in our example. Simply add public field for the column, likepublic NodeOperationalStateType operationalState
. Keep the order consistent with the table. BTW, "M" does not stand for "mapping", we will see mapping class later. -
Now it’s great time to update
SqaleRepoAddDeleteObjectTest
! Find the method testing mapping for this class and add value and assert for the new item. Feel free to add the method if this entity/object type is not yet tested - just like I did. Run the test, it should fail for the new attribute, which is a good sign. -
We need to declare the new column in a "Q-class" which extends from Querydsl type hierarchy. Technically it’s mapping for Querydsl, but it’s still not "our" mapping for midPoint (soon, I promise!). For our example, it’s
QNode
- and there are two sections:-
First, static column metadata, find good example from other class if necessary. In our case, I’ll add:
For enum types we use
Types.OTHER
, again, see existing examples from other classes for your type. -
Next, we add non-static column (or attribute) path:
public final EnumPath<NodeOperationalStateType> operationalState = createEnum("operationalState", NodeOperationalStateType.class, OPERATIONAL_STATE);
The name of the column (the same like the name of the field in M-class) appears twice here, because we want the same name again for Q-class paths. Previously specified column metadata are used. As before, see examples from other Q-classes to use the right
create*
method and path type.
Keep the order consistent with SQL and M-class in both sections. Good, now Querydsl knows what to do with our field in the M-class.
-
-
Now it’s time to add the insert code. Finally, we’re getting to the "mapping class" -
QNodeMapping
in our case. LocatetoRowObjectWithoutFullObject
and add the following line there:row.operationalState = node.getOperationalState();
As always, follow the order from SQL and M-class. The code for enum and many other types is as trivial as shown above, but there is great support for refs, poly-strings and many more too - just find the examples in other Q-Mapping classes.
-
Nearly there! We still need one more thing to support searching and modifications too. Go to the constructor of the mapping class (
QNodeMapping
for us) and add (respecting the right order again, of course!):addItemMapping(F_OPERATIONAL_STATE, enumMapper(q -> q.operationalState));
I mean, seriously, can it be any more auto-magical than this? It is possible to write test to this as well, but honestly, we don’t bother when adding a new mapping for well-working type. Just be sure to use the right item name (
F_OPERATIONAL_STATE
imported statically fromNodeType
), proper mapper method (enumMapper
) and proper path (q.operationalState
, which is that final non-static field we added on the Q-class). -
And SQL alter script, of course! OK, the alter script is not yet in place (this is written before 4.4 is out), but find it and add the modification there using the newest fashionable method available. Hopefully, it’s something utilizing
apply_change
and - for your sake - same examples are already available.
To see the whole success story, check this commit (add-test was committed separately and linked previously).
Collation (ordering)
Depending on the chosen collation ordering of text fields can be case-insensitive or not. To see it quickly, one can use select like this:
select * from (values ('a'), ('B'), ('Z'), ('x'), ('Č'),
('@'), ('_'), ('%'), ('^'), ('5'), ('47'), ('持')) as t(text)
order by text -- collation "C"
With additional collation "x"
one can see various results for other collations.
If en_US.UTF-8
is used during database creation (recommended by midPoint docs) the ordering
of the select above should be case-insensitive.
Collation used by default for each database can be determined like so:
select datname, datcollate from pg_database;
-- returns among others: midpoint,en_US.UTF-8
Funny enough, this default collation name may not be a valid name for COLLATION "collation-name"
clause but for en_US.UTF-8
it seems to act the same as collate "en_US.utf8"
(or en_US
).
The list of valid collation names can be obtained from pg_collation
table:
select * from pg_collation;
As explained in the collation documentation,
there is a default collation (collprovider = 'd'
), libc based collations (d
) and ICU collations
(i
) if compiled with it (PG 13 on Ubuntu contains these too).
ICU collations can’t be used as database defaults. |
It is possible to create tables or single columns with non-default collation.
To list columns with different collation one can use this select (with or without table_schema
filter, select based on this answer):
select table_schema, table_name, column_name, collation_name
from information_schema.columns
where collation_name is not null and table_schema = 'public'
order by table_schema, table_name, ordinal_position;
If query is used with custom COLLATE regularly the table should be indexed accordingly.
|
If case-sensitive locale is used when case-insensitive behavior is expected it may result in
unexpected results.
The old repository uses |
Creating DB with other collation
As described in the notes for CREATE DATABASE
one may need to specify template0
as a template for database creation with different collation.
Adding collation support for other languages to the operating system and then adding it to PG
is beyond this page, but is described in the docs.