MidScale: Prism Design

Last modified 12 Mar 2021 10:22 +01:00
This document is draft of preliminary notes for Prism Design meeting

Participants

  • Radovan Semancik

  • Katarina Bolemant

  • Pavol Mederly

  • Anton Tkacik

  • Igor Farinic

  • Kamil Jires

  • Richard Richter

Motivation and Requirements

Motivation

Prism is use by every component of midPoint and provides basic infrastructure for data representation and manipulation. The performance, scalability and features of Prism directly affects scalability of midPoint as a whole.

Primary Problems

We are trying to solve following problems:

  • Thread safety of prism code. We need the code to work correctly even with high volume of concurrent requests. E.g. systematically avoid race conditions. Allow objects to be safely cached, boosting performance. Potential problems:

    • JAXB code (complex properties).

    • DOM, inside partially-parsed XNodes.

    • GregorianCalendar, PolyString and similar primitive properties implemented by complex Java classes.

    • OperationResult?

  • Performance. Do not waste CPU or memory, avoid high complexity, e.g. we know we have some O(n2) code there. Prism is used everywhere, it is used often. Even small performance problems are likely to significantly amplify. Likely problems:

    • O(n2) code, e.g. checking for value existence before adding new value

    • schema clone & modify, e.g. application of security+obj.template+archetypes for GUI

  • API stability. Prism is used everywhere in midPoint, it is used in midPoint client, it is used by midPoint expressions and customizations. Every incompatible change in midPoint API is likely to cause cascading changes at many places. Lot of customers may be affected. Note: Solution to this problem is likely to appear as a side-effect during Prism improvement.

Use cases

  • Prism provides schema for data, which is used for data (un)marshalling, data processing and validation

  • Prism provides data marshalling / unmarshalling capabilities based on schema

  • Prism serves as primary in-memory data structures for representing processed data

  • Prism provides basic primitives for data processing and trasnformation such as data modification, data equivalency

  • Prism as base for client and midPoint

Assumptions

  • For foreseeable future Prism is in midPoint context (used in midPoint client, not used by any third-party software)

  • Axiom based schema - eventually, but we need to live with XSD for now

Ideal State

Requirements

  • Profiling - open question for now

Testing Environment Requirements

  • 1-2 source systems

    • PostgreSQL

  • 10k-100k records

  • 4x targets systems

    • 2x OpenLDAP

    • 2x Database

  • Large extension schema (counts by Katarina)

  • Archetypes 3x

  • Assigments per user: 10-100 (average 40)

    • 20% expired

    • conditions in aassigments

    • 2 midPoint node: 16GB RAM

    • DB server: 16GB RAM

Performance and scalability goals

Do we have some performance or scalability targets? Do we know how big a system do we want to support? How many users, how many requests per second, special usage patterns (bursts), anything else?

Ideas and Concepts

When thinking about the use cases, do not limit your thoughts to just that one specific use case. Think about generic mechanisms, broader principles. Focus on concepts and ideas, rather than algorithms. Design mechanism that can handle your use case, and thousands of similar use cases as well. Design generic re-usable mechanisms.

Make sure the new mechanisms work well with existing mechanisms. We are looking for synergies. We want to combine mechanisms together into more flexible and more powerful solutions.

Thread Safety

  • Thread safety

    • Thread safety issues & immutability

    • Pure JAXB beans - immutable, replace Gregorian Calendar, ordered lists

    • XNodes with DOM (locking on DOM document)

    • OperationResult

    • ReadOnly flag in midPoint APIs, we want it do be true by default. But how to migrate?

    • Schema

Performance

  • Big Items

    • Item with lot of values

      • Duplication checks -HashSet??? - requires Immutability of nested values

      • Delta application (see Prism Deltas)

  • Application of Authorization & Schema

    • Smart schema builders

      • change label

      • change flags

        • removal of items

        • Object caching

  • Parsing & Serialization

Memory

  • Application of Authorization & Schema

    • Smart schema builders

    • change label

    • change flags

    • removal of items

    • Object caching

  • Parsing & Serialization

Planning

  1. Performance measurements - Baselines

  2. Performance measurements - Weaknesses

  3. Thread safety analysis

    • Prism

    • Util

    • Repository

    • Repository Cache

    • Task

  4. Planning based on performance measurements / thread safety analysis

  5. Thread-safety Improvements

  6. Performance Improvements

Implementation

Testing & Reporting

Selection & integration of of Unit Test benchmark solution which provides measurements & record collection.

Responsible: Richard & Tony

Proposals

Metrics Runtime

Consider to unify metrics collections at various places in MidPoint (if there are differences and it is possible).

Consider introduction of "global" metrics collection mechanisms.

Black-box tests

Blackbox tests should be based on MIDPOINT 4.2 and developed on branch there in order to measure baselines. Tests then will be merged to master to measure actual new performance.

Branch will be perf-test.

TODO: feasibility

Performance and Scalability Considerations

TODO

Testability Considerations

Make sure that the functionality can be tested. Think about the testing process. Can this be teste by the usual mechanisms that we have? Will we need some special environment or setup?

Diagnostics and Visibility

  • Java profiling: for use on developer’s machines

  • MidPoint tracing capabilities: for use in (clustered) testing environment

Security Considerations

TODO

Open Questions

We need to further discuss:

  • Prism complex properties, how the will be seen by Prism schema code, code generation approach.

  • Thread safety approach, design of mechanisms (immutability? freezing?, etc.)

  • Application of schema for GUI/REST (security, obj.template, etc.)

  • Item equivalence

  • Vision for Prism5/midPoint5, future of Axiom; to guide the design of Prism improvements

Existing Measurements & Problems

Prism Deltas

  • Item Delta application with n modified values to target with m original values has following operation counts (worst case, as of 4.2):

    • n copies of new value

    • m*n + n*(n-1)/2 comparisons

schema/TestDeltaPerfComparison

Measured tests resulsts confirms linear time growth of applying single-item delta to multi-valued container.

monitor

assigments

count

total(us)

avg(us)

min(us)

max(us)

note

delta.clone

10

20000

123630

6

3

4295

Cloning of structure

delta.clone

100

20000

438725

21

16

4557

Cloning of structure

delta.clone

1000

20000

3490439

174

157

4480

Cloning of structure

delta.clone

10000

20000

50605384

2530

2187

11569

Cloning of structure

delta.MINUS.apply

10

10000

77894

7

6

157

Application of delta

delta.MINUS.apply

100

10000

507189

50

42

3801

Application of delta

delta.MINUS.apply

1000

10000

5625361

562

457

2676

Application of delta

delta.MINUS.apply

10000

10000

59016484

5901

5275

17396

Application of delta

delta.PLUS.apply

10

10000

225786

22

8

1850

Application of delta

delta.PLUS.apply

100

10000

544184

54

43

225

Application of delta

delta.PLUS.apply

1000

10000

5441953

544

454

4989

Application of delta

delta.PLUS.apply

10000

10000

56833512

5683

5136

15418

Application of delta