midPilot Information Collection

Last modified 05 Nov 2025 12:40 +01:00

As part of our ongoing effort to enhance midPoint, we’re focusing on simplifying configuration and improving the GUI/UX, especially for declarative connector framework development and resource configuration using advanced algorithms and AI support. To optimize these new features, we would like to collect and analyze information from real-world midPoint deployments.

Gathering Information

We would like to collect the following types of information:

  • Configuration fragments

  • Statistical characteristics of selected data

Configuration Fragments

What

  • ResourceType (excluding connector configuration) + names and availability of ConnectorType objects

  • ObjectTemplateType, PolicyType, ArchetypeType, FunctionLibraryType

  • Parts of SystemConfigurationType (defaultObjectPolicyConfiguration, modelHooks, correlation)

  • Extension schemas

Why

  • To identify missing functionality in midPoint, mainly by analyzing long and/or repeated scripts in configuration objects

  • To optimize suggestions (e.g. for synchronization reactions), making sure the most common configuration options are suggested

  • To validate our solution during development and testing phases, to make sure that midPoint is able to seamlessly address real-world scenarios

  • Provided configurations can be used as an inspiration to generate synthetic data sets, used to test our algorithms, heuristics, and LLM prompts

  • Simply speaking, we would like to use your data to make sure that midPoint algorithms and recommendations will work for you, on your specific data and for your specific cases.

How

  • Choose one of the following options:

    1. Ninja tool only:

      • Use the ninja tool to export midPoint configuration:

    2. Git copy + Ninja tool:

  • Also copy the extension schema(s) manually

  • Please anonymize any sensitive data: no company secrets and no personal information

  • Please contact us at midpilot@evolveum.com and we will provide instructions on how to deliver the exported data securely

Statistical Characteristics of Selected Data

What

  • Shadow statistics: number of objects per resource, object type, and synchronization situation (linked, unlinked, etc.)

  • Focus statistics: number of objects per type and statistics related to individual properties

    • % of objects where the property value is missing

    • % of objects where the property has multiple values

    • Distribution of values (without disclosing specific values, only the relative number of their occurrences)

    • Ratio of unique values to the number of objects
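As a rough illustration of what these per-property figures mean, the following Python sketch computes them for one property across a set of objects. It is not the ninja tool's implementation: the function name `property_statistics` and the input shape (one list of values per object, empty when the property is missing) are assumptions made for this example, and the result keys merely mirror the element names used in the export example on this page.

```python
from collections import Counter


def property_statistics(values_per_object):
    """Summarize one property over many objects (hypothetical helper).

    values_per_object: one list of values per object; an empty list
    means the property is missing on that object.
    """
    total = len(values_per_object)
    if total == 0:
        return None  # nothing to summarize

    missing = sum(1 for values in values_per_object if not values)
    multi_valued = sum(1 for values in values_per_object if len(values) > 1)

    # Flatten all values and count how often each distinct value occurs.
    all_values = [value for values in values_per_object for value in values]
    occurrences = Counter(all_values)

    # Only relative frequencies are kept; the values themselves are dropped.
    distribution = sorted(
        (count / len(all_values) for count in occurrences.values()),
        reverse=True,
    )

    return {
        "missingRatio": missing / total,
        "multiValuedRatio": multi_valued / total,
        # Ratio of unique values to the number of objects.
        "cardinality": len(occurrences) / total,
        "distribution": distribution,
    }


stats = property_statistics([["a"], ["b"], ["a", "c"], []])
print(stats)
```

Note that the returned structure contains only ratios, so the actual property values never leave the function.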

Why

  • To generate synthetic data to test our algorithms, heuristics, and LLM prompts

How

  • Use the ninja tool:

  • No need for anonymization, as the data carry almost no information that can be misused, except for property names

  • Even the object counts are blurred: we distinguish only the scale (0, 1-99, 100-999, 1000-9999, etc.)

  • Please contact us at midpilot@evolveum.com and we will provide instructions on how to deliver the exported data securely

Export example (fragments)

<focus>
  <type>org</type>
  <count>99</count>
  <property>
    <path>identifier</path>
    <multiValuedRatio>0.0</multiValuedRatio>
    <missingRatio>1.0</missingRatio>
    <cardinality>0.0</cardinality>
  </property>
  <property>
    <path>extension/building</path>
    <multiValuedRatio>0.0</multiValuedRatio>
    <missingRatio>1.0</missingRatio>
    <cardinality>0.0</cardinality>
  </property>
  <property>
    <path>displayOrder</path>
    <multiValuedRatio>0.0</multiValuedRatio>
    <missingRatio>0.0</missingRatio>
    <distribution>
      <value>0.33333334</value>
      <value>0.33333334</value>
      <value>0.33333334</value>
    </distribution>
  </property>
</focus>

<shadowStatistics>
  <resourceRef oid="4809f037-d8a7-4daa-9f96-bcead9b534ef"/>
  <objectClass>ri:inetOrgPerson</objectClass>
  <kind>ACCOUNT</kind>
  <intent>external</intent>
  <synchronizationSituation>LINKED</synchronizationSituation>
  <count>9999</count>
</shadowStatistics>
<shadowStatistics>
  <resourceRef oid="9c5f9092-910f-437f-9898-ed578d1c2cf3"/>
  <objectClass>ri:User</objectClass>
  <kind>ACCOUNT</kind>
  <intent>default</intent>
  <synchronizationSituation>LINKED</synchronizationSituation>
  <count>999</count>
</shadowStatistics>
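The count values in the fragments above (99, 9999, 999) are consistent with each count being replaced by the upper bound of its scale bucket (0, 1-99, 100-999, 1000-9999, etc.). The sketch below shows one way such blurring could work. This is an assumption inferred from the example, not the ninja tool's actual code, and the function name `blur_count` is hypothetical.

```python
def blur_count(n: int) -> int:
    """Map an exact count to the upper bound of its scale bucket.

    Buckets assumed here: 0, 1-99, 100-999, 1000-9999, and so on.
    """
    if n < 0:
        raise ValueError("count must not be negative")
    if n == 0:
        return 0
    upper = 99  # the first non-zero bucket covers 1-99
    while n > upper:
        upper = upper * 10 + 9  # 99 -> 999 -> 9999 -> ...
    return upper


for exact in (0, 42, 350, 7214):
    print(exact, "->", blur_count(exact))
```

With this scheme, a resource with 350 accounts and one with 870 accounts both report 999, so only the order of magnitude is disclosed.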

Our Commitment to Information Security, Data Protection and Privacy

  • Secure handling of the provided information

    • Access to the provided information is strictly limited to a few individuals (on a "need to know" basis). These individuals will be responsible for analyzing the data, making sure the data are properly protected, and ensuring that no specific detail of the data leaks into generated synthetic data.

  • No third-party sharing or processing

    • We will never share the data with any third party.

    • We will never process the data using any cloud system or service. All processing of the provided data will be done on infrastructure completely controlled by Evolveum.

  • No AI training based on provided information

    • The data will never be used to train any LLM or any other AI system.

    • No permanent record of any part of your data will remain in midPoint or any associated services.

  • Limited information use: We will use your data only for the following purposes:

    • Identification of missing midPoint features and validation of planned features

    • Statistical analysis of configuration elements used in real life to tune suggestions, e.g., for sync reactions

    • Internal validation of our solution during development and testing phases

    • Drawing inspiration for creating synthetic configurations similar to real-life ones

    • Creation of synthetic data sets with sizes and characteristics similar to real life

  • Retention period of the original information is 12 months at most:

    • Removed automatically after that time or earlier upon your request

We appreciate your cooperation and support in enhancing midPoint’s capabilities. Your insights are invaluable for improving midPoint.

Explanations and Procedures

Synthetic data sets contain artificially generated data that do not describe any specific customer, deployment, or configuration. Synthetic data sets are automatically generated using an algorithm. As these data sets have no connection with any specific deployment or customer, they can be safely used in automated testing scenarios and pipelines, making sure that midPoint works properly now as well as in any future version.

The algorithm that generates synthetic data is inspired by real-world scenarios. However, no specific part of any real-world scenario is directly present in the synthetic data sets. Synthetic data sets may have statistical properties similar to the real-world data, such as a similar number of users, a similar ratio of roles to users, a similar statistical distribution of the number of assignments per user, similar probabilities of synchronization situations, and so on. However, the specific details will be different. Synthetic data will usually combine characteristics of several real-world configurations into a single data set, further generalizing the patterns and characteristics.

Synthetic data are never directly derived from the provided real-world data, nor is any kind of LLM or GenAI used for that purpose. The real-world data are always processed by a human expert, who manually creates an algorithm to generate synthetic data sets, making sure that no deployment-specific details can leak into the synthetic data set.
