midPilot Information Collection

Last modified 05 Nov 2025 12:40 +01:00

As part of our ongoing effort to enhance midPoint, we’re focusing on simplifying configuration and improving the GUI/UX, especially for declarative connector framework development and resource configuration using advanced algorithms and AI support. To optimize these new features, we would like to collect and analyze information from real-world midPoint deployments.

Gathering Information

We would like to collect the following types of information:

Configuration fragments
Statistical characteristics of selected data

Configuration Fragments

What

ResourceType (excluding connector configuration) + names and availability of ConnectorType objects
ObjectTemplateType, PolicyType, ArchetypeType, FunctionLibraryType
Parts of SystemConfigurationType (defaultObjectPolicyConfiguration, modelHooks, correlation)
Extension schemas

Why

To identify missing functionality in midPoint, mainly by analyzing long and/or repeated scripts in configuration objects
To optimize suggestions (e.g. for synchronization reactions), making sure the most common configuration options are suggested
To validate our solution during development and testing phases, to make sure that midPoint is able to seamlessly address real-world scenarios
To use the provided configurations as inspiration for generating synthetic data sets that test our algorithms, heuristics, and LLM prompts
Simply speaking, we would like to use your data to make sure that midPoint algorithms and recommendations will work for you, on your specific data and for your specific cases.

How

Choose one of the following options:
1. Ninja tool only:
  - Use the ninja tool to export midPoint configuration:
    
    command: Export Configuration (includes ResourceType objects with schemas and capabilities)
2. Git copy + Ninja tool:
  - Copy files from your Git repository
  - Use the ninja tool to export only resource schemas and capabilities:
    
    command: Export Resource Schema and Capabilities (only ResourceType objects limited to schemas and capabilities)
  - Also copy extension schema(s) manually
Please anonymize any sensitive data: no company secrets and no personal information
Please contact us at midpilot@evolveum.com and we will provide instructions on how to deliver the exported data securely

Statistical Characteristics of Selected Data

What

Shadow statistics: number of objects per resource, object type, and synchronization situation (linked, unlinked, etc.)
Focus statistics: number of objects per type and statistics related to individual properties
- % of objects where the property value is missing
- % of objects where the property has multiple values
- Distribution of values (without disclosing specific values, only the relative number of their occurrences)
- Ratio of unique values to the number of objects

Why

To generate synthetic data to test our algorithms, heuristics, and LLM prompts

How

Use the ninja tool:
- command: Export Shadow Statistics
- command: Export Focus Statistics
No need for anonymization, as the data carry almost no information that can be misused, except for property names
Even the object counts are blurred: we distinguish only the scale (0, 1-99, 100-999, 1000-9999, etc.)
Please contact us at midpilot@evolveum.com and we will provide instructions on how to deliver the exported data securely
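
The blurring of object counts described above can be sketched as follows. This is a minimal illustration of the stated scale (0, 1-99, 100-999, 1000-9999, etc.), not the code ninja uses; the function name `blur_count` and the choice to return the bucket as a (low, high) pair are assumptions.

```python
def blur_count(count):
    """Return the scale bucket of an exact count as an inclusive (low, high) pair."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        return (0, 0)
    if count < 100:
        # The first non-zero bucket spans two orders of magnitude.
        return (1, 99)
    digits = len(str(count))
    return (10 ** (digits - 1), 10 ** digits - 1)


for exact in (0, 42, 100, 4321):
    print(exact, "->", blur_count(exact))
```

The counts 99, 999, and 9999 in the export example on this page would be consistent with reporting the upper bound of each bucket, although that is an inference from the example, not a documented rule.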

Export example (fragments)

<focus>
  <type>org</type>
  <count>99</count>
  <property>
    <path>identifier</path>
    <multiValuedRatio>0.0</multiValuedRatio>
    <missingRatio>1.0</missingRatio>
    <cardinality>0.0</cardinality>
  </property>
  <property>
    <path>extension/building</path>
    <multiValuedRatio>0.0</multiValuedRatio>
    <missingRatio>1.0</missingRatio>
    <cardinality>0.0</cardinality>
  </property>
  <property>
    <path>displayOrder</path>
    <multiValuedRatio>0.0</multiValuedRatio>
    <missingRatio>0.0</missingRatio>
    <distribution>
      <value>0.33333334</value>
      <value>0.33333334</value>
      <value>0.33333334</value>
    </distribution>
  </property>
</focus>

<shadowStatistics>
  <resourceRef oid="4809f037-d8a7-4daa-9f96-bcead9b534ef"/>
  <objectClass>ri:inetOrgPerson</objectClass>
  <kind>ACCOUNT</kind>
  <intent>external</intent>
  <synchronizationSituation>LINKED</synchronizationSituation>
  <count>9999</count>
</shadowStatistics>
<shadowStatistics>
  <resourceRef oid="9c5f9092-910f-437f-9898-ed578d1c2cf3"/>
  <objectClass>ri:User</objectClass>
  <kind>ACCOUNT</kind>
  <intent>default</intent>
  <synchronizationSituation>LINKED</synchronizationSituation>
  <count>999</count>
</shadowStatistics>
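
Because property names are the only part of the statistics that may reveal something, you may want to review them before delivering the export. The following Python sketch lists the property paths per focus type. It assumes the fragments sit under a single root element (the `<export>` wrapper here is hypothetical) and use no XML namespaces; a real export file may differ on both points, in which case the element lookups would need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: <export> is a made-up root element for this sketch.
SAMPLE = """
<export>
  <focus>
    <type>org</type>
    <count>99</count>
    <property>
      <path>identifier</path>
      <missingRatio>1.0</missingRatio>
    </property>
    <property>
      <path>extension/building</path>
      <missingRatio>1.0</missingRatio>
    </property>
  </focus>
</export>
"""


def property_paths(xml_text):
    """Return (focus type, property path) pairs found in the export text."""
    root = ET.fromstring(xml_text)
    pairs = []
    for focus in root.iter("focus"):
        focus_type = focus.findtext("type", default="")
        for prop in focus.findall("property"):
            pairs.append((focus_type, prop.findtext("path", default="")))
    return pairs


for focus_type, path in property_paths(SAMPLE):
    print(f"{focus_type}: {path}")
```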

Our Commitment to Information Security, Data Protection and Privacy

Secure handling of the provided information
- Access to the provided information is strictly limited to a few individuals (on a "need-to-know" basis). These individuals will be responsible for analyzing the data, making sure the data are properly protected, and making sure that no specific detail of the data leaks into the generated synthetic data.
No third-party sharing or processing
- We will never share the data with any third party.
- We will never process the data using any cloud system or service. All processing of the provided data will be done on infrastructure completely controlled by Evolveum.
No AI training based on provided information
- The data will never be used to train any LLM or any other AI system.
- No permanent record of any part of your data will remain in midPoint or any associated services.
Limited information use: We will use your data only for the following purposes:
- Identification of missing midPoint features and validation of planned features
- Statistical analysis of configuration elements used in real life to tune suggestions, e.g., for sync reactions
- Internal validation of our solution during development and testing phases
- Drawing inspiration for creating synthetic configurations similar to real-life ones
- Creation of synthetic data sets with sizes and characteristics similar to real life
Retention period of the original information is 12 months at most:
- Removed automatically after that time or earlier upon your request

We appreciate your cooperation and support in enhancing midPoint’s capabilities. Your insights are invaluable for improving midPoint.

Explanations and Procedures

Synthetic data sets contain artificially generated data that do not describe any specific customer, deployment, or configuration. Synthetic data sets are automatically generated using an algorithm. As these data sets have no connection with any specific deployment or customer, they can be safely used in automated testing scenarios and pipelines, making sure that midPoint works properly now and in any future version.

The algorithm that generates synthetic data is inspired by real-world scenarios. However, no specific part of a real-world scenario is directly present in the synthetic data sets. Synthetic data sets may have statistical properties similar to those of the real-world data, such as a similar number of users, a similar ratio of roles to users, a similar statistical distribution of the number of assignments per user, similar probabilities of synchronization situations, and so on. However, the specific details will be different. Synthetic data will usually combine characteristics of several real-world configurations into a single data set, further generalizing the patterns and characteristics.

Synthetic data are never directly derived from the provided real-world data, nor is any kind of LLM or GenAI used for that purpose. The real-world data are always processed by a human expert, who manually creates an algorithm to generate synthetic data sets, making sure that no deployment-specific details can leak into the synthetic data set.
