ALTER TYPE ObjectType ADD VALUE IF NOT EXISTS 'SIMULATION_RESULT' AFTER 'SHADOW';

CREATE TABLE m_simulation_result (
    oid UUID NOT NULL PRIMARY KEY REFERENCES m_object_oid(oid),
    objectType ObjectType GENERATED ALWAYS AS ('SIMULATION_RESULT') STORED
        CHECK (objectType = 'SIMULATION_RESULT')
    ...
)
    INHERITS (m_object);
Simulations - Repository and Model
This functionality is available since version 4.7.
- Base Objects and Containers
- Repository API
- Storage Strategy
Base Objects and Containers
The Simulation Result object stores base information about a simulation. It is a full object, except that it does not carry deltas and detailed results.
The object is stored in the primary midPoint database and also in the simulations database. It has full-object fields and all properties of a normal midPoint object.
Simulation Result objects contain multiple containers for:
Metric - named statistics / counters: precomputed counters based on custom queries, holding user-defined metrics captured about simulations (accounts added, accounts disabled, etc.)
Processed Objects - simulation-related information about the objects processed during the simulation
Simulation Result - Metrics
Metrics are user-defined (or system-defined) counters that use the filter API and/or other mechanisms (outside the scope of repository storage) to capture properties of a simulation (e.g. objects scanned, accounts disabled).
A basic metric has the following properties:
Name (tag) used to identify the metric. It is a logical name, which is also used for tagging processed objects if they match the query associated with the metric.
- Display Name
Human-readable name of the metric, such as "Accounts Disabled", which gives users insight into the meaning of the metric.
- Query / Policy
Query or policy (if applicable) which resulted to
- Count Objects "scanned"?
Count of all objects processed during evaluation of the metric
- Objects matched count
Count of all objects that matched the query or policy associated with the metric
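To illustrate how the two counts relate to tagged processed objects, here is a minimal sketch in plain PostgreSQL. The table demo_processed_object, its columns and the metric names are made up for this example; the actual metric evaluation is outside the scope of repository storage and may work differently.

```sql
BEGIN;

-- Hypothetical, simplified stand-in for a processed objects table.
CREATE TABLE demo_processed_object (
    ownerOid   UUID   NOT NULL,              -- OID of the owning simulation result
    cid        BIGINT NOT NULL,              -- container id inside the simulation result
    objectName TEXT,
    metricTags TEXT[] NOT NULL DEFAULT '{}', -- names of the metrics the object matched
    PRIMARY KEY (ownerOid, cid)
);

INSERT INTO demo_processed_object (ownerOid, cid, objectName, metricTags) VALUES
    ('00000000-0000-0000-0000-0000000000a1', 1, 'jack', ARRAY['account-added']),
    ('00000000-0000-0000-0000-0000000000a1', 2, 'will', ARRAY['account-added', 'account-disabled']),
    ('00000000-0000-0000-0000-0000000000a1', 3, 'anne', ARRAY[]::TEXT[]);

-- Count of all objects processed in this simulation result.
SELECT count(*) AS objectsProcessed
FROM demo_processed_object
WHERE ownerOid = '00000000-0000-0000-0000-0000000000a1';

-- Count of matched objects per metric (one row per tag).
SELECT t.tag AS metric, count(*) AS objectsMatched
FROM demo_processed_object po
     CROSS JOIN LATERAL unnest(po.metricTags) AS t(tag)
WHERE po.ownerOid = '00000000-0000-0000-0000-0000000000a1'
GROUP BY t.tag
ORDER BY t.tag;

-- Demo only: leave no objects behind.
ROLLBACK;
```

With the sample rows, the first query returns 3, and the second returns 2 for account-added and 1 for account-disabled.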
Simulation Result - Processed Objects
Container used to store simulation-related information about processed objects.
Object identifier of the processed object
Name of the processed object
State of object during simulation:
- Tags / Metrics
List of metrics that the object matches
Details are stored in the simulations-only database; details such as before / after state are stored only for added/modified objects
Deltas - store
before state - full object representation of object
after state (simulations-only, optional)
ALTER TYPE ContainerType ADD VALUE IF NOT EXISTS 'OBJECT_RESULT' AFTER 'FOCUS_IDENTITY';

CREATE TABLE m_simulation_processed_object (
    ownerOid UUID NOT NULL REFERENCES m_object_oid(oid) ON DELETE CASCADE,
    ...
)
    PARTITION BY LIST(ownerOid);
This table represents Prism
Repository API
Storing the simulation result in one final call is not reasonable; support for partial additions of object-specific results is needed.
1. Storing simulations information and counters
2. Retrieving simulation information and counters
3. Storing simulation results - deltas and objects
4. Browsing precomputed query results (added objects, modified objects)
5. Ad-hoc search and analysis on simulated data
Current repository APIs
Use cases 1 (storing information and counters) and 2 (retrieving) are easily supported by the current native repository implementation by adding support for storage of a new object type for SimulationRun.
Use case 4 (deltas / object results) is easily and effectively achievable using searchContainers, if these containers are stored in a separate table (see SimulationResult).
Use case 3 (adding deltas / containers) will probably require a special implementation for Simulation Result: we want to avoid generating the whole object when a new container is added (the additional result should only be inserted into the table).
Use case 5. - This would require
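For use case 3, the idea of "only insert into the table" can be sketched as follows. This is a simplified illustration, not the actual midPoint schema: the demo_ tables, their columns and the state value are assumptions made for the example.

```sql
BEGIN;

-- Hypothetical parent: one row per simulation result, holding the serialized object.
CREATE TABLE demo_simulation_result (
    oid        UUID PRIMARY KEY,
    fullObject BYTEA,
    version    INTEGER NOT NULL DEFAULT 1
);

-- Hypothetical child: one row per processed object, owned by a simulation result.
CREATE TABLE demo_processed_object (
    ownerOid UUID   NOT NULL REFERENCES demo_simulation_result (oid) ON DELETE CASCADE,
    cid      BIGINT NOT NULL,
    oid      UUID,
    state    TEXT,
    PRIMARY KEY (ownerOid, cid)
);

INSERT INTO demo_simulation_result (oid, fullObject)
VALUES ('00000000-0000-0000-0000-0000000000a1', 'serialized object'::bytea);

-- Adding a partial result is a single INSERT into the child table.
-- The parent row (fullObject, version) is neither read nor rewritten.
INSERT INTO demo_processed_object (ownerOid, cid, oid, state)
VALUES ('00000000-0000-0000-0000-0000000000a1', 1,
        '00000000-0000-0000-0000-0000000000b1', 'MODIFIED');

-- The parent is untouched: version is still 1.
SELECT version FROM demo_simulation_result
WHERE oid = '00000000-0000-0000-0000-0000000000a1';

-- Demo only: leave no objects behind.
ROLLBACK;
```

The cost of adding one result then depends only on the size of that result, not on how many containers the simulation result already has.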
Behaviour in normal repository
The SimulationResult object is always returned from the repository with:
list of processed objects - no values, always
this data could be purged to save space; the dataset could be too large to be useful in memory (10k+ containers)
object deltas - no values, always
Because the actual data may be spread over two databases (unmodified in production, created and modified in the simulations database), it is necessary to have a specialized component for simulations, which will orchestrate writing to and reading from these two logical databases.
The API for Simulations would need the following methods:
A variant of addObject, which mostly falls back to an implementation similar to the one in the native repository.
A new method, a bit similar to audit, which adds a container to an existing simulation result. The implementation does not need to recompute or fetch the full SimulationResult object. It stores a light-weight version in the production database and the full version in the simulations database.
A method that performs search on SimulationResult objects. This search does not support searching on processed objects.
A method that performs search on ProcessedObject containers. This could also support search on before/after state, if present. The search is first performed on the simulations-only database for modified objects and is "joined" with results from the production database.
Storage Strategy
There are 2 possible storage strategies used during deployment:
Option A: Use same database for production data and simulation results
Option B: Use separate database for simulation results
Option A: Only in midPoint database (optional)
Simpler deployment, testing
Slowdown of production database, simulations partitioning will make midPoint database less readable for administrators.
Option B: Mixed (part in midPoint database, part in Simulations-only database)
One of the discussed solutions was to store basic simulation results and counter tags in the production database, while storing deltas and after states in a separate database.
Smaller slowdown of production database, since minimal
Simulations API facade - when listing all objects from a simulation, unmodified objects may be fetched from the midPoint database, while modified ones come from the Simulations database
Simulations partitioning will make the midPoint database less readable for administrators.
Option C: Simulations-only database (not implemented)
All simulation-related data are stored in a separate database.
Object results are expected to be dropped quickly, so the proposal is to have a partition for each simulation result.
The partition needs to be created before any results are stored; this can be done by issuing the CREATE TABLE … PARTITION OF command. This requires midPoint to have privileges to edit the schema.
CREATE TABLE m_simulation_result_processed_object_4e485c35_0f6a_4d95_a4b6_c87530fe
    PARTITION OF m_simulation_result_processed_object
    FOR VALUES IN ('4e485c35-0f6a-4d95-a4b6-c87530fe');
This type of partitioning allows us to purge detailed data really fast by dropping the partition table (DROP TABLE).
Unfortunately, automatic partition creation is not present in bare PostgreSQL.
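The life cycle of a per-result partition can be sketched end to end as follows. This is an illustration under assumptions: the demo_ table, its columns, the UUID and the state values are made up, and the real table has more columns.

```sql
BEGIN;

-- Hypothetical, simplified parent table, list-partitioned by the owning simulation result.
CREATE TABLE demo_processed_object (
    ownerOid UUID   NOT NULL,
    cid      BIGINT NOT NULL,
    state    TEXT,
    PRIMARY KEY (ownerOid, cid)  -- a key on a partitioned table must include the partition key
) PARTITION BY LIST (ownerOid);

-- 1. Create the partition before the first result is stored.
CREATE TABLE demo_processed_object_00000000_0000_0000_0000_0000000000a1
    PARTITION OF demo_processed_object
    FOR VALUES IN ('00000000-0000-0000-0000-0000000000a1');

-- 2. Rows are routed to the partition by their ownerOid.
INSERT INTO demo_processed_object (ownerOid, cid, state) VALUES
    ('00000000-0000-0000-0000-0000000000a1', 1, 'ADDED'),
    ('00000000-0000-0000-0000-0000000000a1', 2, 'MODIFIED');

-- 3. Purge all detailed data of that result by dropping its partition.
DROP TABLE demo_processed_object_00000000_0000_0000_0000_0000000000a1;

-- No rows are left in the parent table.
SELECT count(*) FROM demo_processed_object;

-- Demo only: leave no objects behind.
ROLLBACK;
```

Without step 1, the INSERT fails because no partition accepts the row. Note also that PostgreSQL truncates identifiers longer than 63 bytes, which limits how long a table-name prefix can be when a full OID is embedded in the partition name.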
If simulations are run often and results are not purged, the database will contain a lot of tables - one table per simulation run.
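To see how many partitions have accumulated, the PostgreSQL catalogs can be queried. The sketch below creates two made-up partitions of a hypothetical demo_ table and lists them; against a real deployment only the final SELECT would be needed, with the real parent table name substituted.

```sql
BEGIN;

-- Hypothetical parent table with two per-result partitions.
CREATE TABLE demo_processed_object (
    ownerOid UUID   NOT NULL,
    cid      BIGINT NOT NULL
) PARTITION BY LIST (ownerOid);

CREATE TABLE demo_processed_object_a1 PARTITION OF demo_processed_object
    FOR VALUES IN ('00000000-0000-0000-0000-0000000000a1');
CREATE TABLE demo_processed_object_a2 PARTITION OF demo_processed_object
    FOR VALUES IN ('00000000-0000-0000-0000-0000000000a2');

-- One row per partition: name, partition bound and total on-disk size.
SELECT c.relname                                     AS partition_name,
       pg_get_expr(c.relpartbound, c.oid)            AS partition_bound,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_inherits i
     JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'demo_processed_object'::regclass
ORDER BY c.relname;

-- Demo only: leave no objects behind.
ROLLBACK;
```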
Configuration of partitioning
Partitioning configuration (whether the simulation uses a partition for processed objects) is stored in the simulation result.
This allows for a catch-all partition for simple simulations / startup.
Based on the partitioning configuration of the particular simulation result, the repository uses either DELETE or CREATE … PARTITION OF / DROP TABLE to purge the Processed Objects table.
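The two purge paths can be sketched side by side. This is an illustration under assumptions: the demo_ names and UUIDs are made up, and the catch-all partition is modelled here as a PostgreSQL DEFAULT partition (available since PostgreSQL 11), which is one possible way to implement it.

```sql
BEGIN;

CREATE TABLE demo_processed_object (
    ownerOid UUID   NOT NULL,
    cid      BIGINT NOT NULL,
    PRIMARY KEY (ownerOid, cid)
) PARTITION BY LIST (ownerOid);

-- Catch-all partition: receives rows of every result without a dedicated partition.
CREATE TABLE demo_processed_object_default
    PARTITION OF demo_processed_object DEFAULT;

-- Dedicated partition for one simulation result configured as partitioned.
CREATE TABLE demo_processed_object_a2
    PARTITION OF demo_processed_object
    FOR VALUES IN ('00000000-0000-0000-0000-0000000000a2');

INSERT INTO demo_processed_object (ownerOid, cid) VALUES
    ('00000000-0000-0000-0000-0000000000a1', 1),  -- lands in the default partition
    ('00000000-0000-0000-0000-0000000000a2', 1);  -- lands in the dedicated partition

-- Shows which physical table each row was routed to.
SELECT tableoid::regclass AS stored_in, ownerOid, cid
FROM demo_processed_object
ORDER BY ownerOid;

-- Non-partitioned result: purge with DELETE.
DELETE FROM demo_processed_object
WHERE ownerOid = '00000000-0000-0000-0000-0000000000a1';

-- Partitioned result: purge by dropping its partition.
DROP TABLE demo_processed_object_a2;

-- Demo only: leave no objects behind.
ROLLBACK;
```

One consequence of this design: the dedicated partition has to exist before the first row of that result is stored. PostgreSQL refuses to create a partition for a value whose rows already sit in the default partition.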
Configuration of whether simulations should be partitioned:
Global partitioning configuration is present in System Configuration
The simulation is responsible for reading the system configuration and using the partitioning configuration when creating new SimulationResult objects.
Operational midPoint tables - these tables are used during normal operations and store configuration, shadow and focus data
Non-operational simulation tables
Make sure PostgreSQL enums are alphanumerically ordered.
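One way to check the ordering is to compare the declared order of the enum labels with their sort order. The sketch below uses a made-up demo type with three values; for a real check, the type name in the SELECT would be replaced with the enum in question. "Alphanumerically" is interpreted here as byte order (COLLATE "C"), which is an assumption.

```sql
-- Hypothetical demo type; the real enums have many more values.
CREATE TYPE demo_object_type AS ENUM ('SHADOW', 'SYSTEM_CONFIGURATION');

-- The new value is placed so that the declared order stays alphabetical.
ALTER TYPE demo_object_type ADD VALUE IF NOT EXISTS 'SIMULATION_RESULT' AFTER 'SHADOW';

-- Labels in declared order; out_of_order is true for any label
-- that sorts before the label declared just ahead of it.
SELECT enumlabel,
       COALESCE(
           (enumlabel::text COLLATE "C")
               < lag(enumlabel::text) OVER (ORDER BY enumsortorder),
           false
       ) AS out_of_order
FROM pg_enum
WHERE enumtypid = 'demo_object_type'::regtype
ORDER BY enumsortorder;

-- Demo only: remove the demo type again.
DROP TYPE demo_object_type;
```

For the demo type the query lists SHADOW, SIMULATION_RESULT and SYSTEM_CONFIGURATION, with out_of_order false for all three.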