Role Mining

Last modified 22 Sep 2025 10:56 +02:00

This functionality is available since version 4.8.

Table of Contents

Purpose
Benefits
Process Overview
Key Concepts
Data Model
- Session (Role Analysis Session)
- Cluster (Role Analysis Cluster)
Starting a Session
- Role Mining Presets
- Advanced Options
Technical Details
See Also

Role mining is a process that "mines" business roles from application roles. The mining process analyzes application roles and their assignments to users, detects data clusters and patterns, and suggests candidates for business roles.

Purpose

The primary motivation behind role mining is the simplification of role-based access control (RBAC). Users in an organization are involved in many roles, organizational units and projects, requiring numerous permissions to carry out their activities. In an ideal RBAC system, each work responsibility, organizational unit assignment or project engagement would have its own role, a business role. A business role would group all the permissions necessary to carry out a specific business activity.

However, modeling business roles is no easy task. Business processes are often under-specified or not specified at all. It is very difficult to find anyone in an organization who can tell which permissions are required for a specific activity. As business roles are very difficult to define from the bottom up, the common practice is to use application roles instead. Theoretically speaking, application roles represent user permissions in a specific application. Application roles are often the smallest pieces of permission data that an identity management system works with. They are not supposed to be assigned to users directly. Unfortunately, direct assignment is a very common practice, and it leads to numerous problems. The number of assignments keeps increasing, and the access control model turns into a huge ball of mud in quite a short time. This makes the roles difficult to review and unassign, which leads to an accumulation of security risk.

However, the processes are there and business activities are carried out every day. This means that the business-related permission information is there. It is just hidden, buried in the heap of access control data, waiting to be discovered. Role mining is a method to uncover such information. It uses algorithms and Artificial Intelligence (AI) techniques to sift through this maze of permissions, grouping them into cohesive business roles based on patterns of access.

Role mining detects candidates for new business roles based on data clustering, pattern recognition and similar methods. It suggests business roles that allow assigning similar permissions to groups of users. By conducting role mining, organizations can move from a chaotic model of individually assigned permissions to a more organized model based on business roles. This transition not only enhances security by reducing the risk of access escalation but also makes the entire IAM process more understandable and manageable.

Benefits

Simplified Access Management: Managing user permissions becomes easier when they are grouped into clear, well-defined business roles. Assigning or revoking access for new or departing employees becomes a matter of allocating or removing a single business role, rather than multiple individual roles (permissions).
Reduced Administrative Overhead: Fewer individual permissions to manage means less time spent on administration and fewer errors. Automation and structured roles reduce the need for manual interventions in access management. Instead of managing individual roles (permissions) for each user, administrators can manage business roles, considerably reducing the number of entities they need to handle. With well-defined business roles in place, when a user requests access related to a specific job function or project, administrators can simply assign a role rather than piecing together multiple role permissions. This speeds up the provisioning process.
Transparent Access: Clearly defined business roles make it easier to understand who has access to what, simplifying compliance reporting. In environments where users can request access, business roles simplify the user experience. Users can request access based on business roles they understand, rather than having to navigate an intricate network of individual roles.
Efficient Audit Trails: With business roles, it becomes simpler to track changes in role permissions, ensuring an effective and quick audit process. In traditional systems without role mining, audit logs might be flooded with granular access changes. With business roles as aggregate units of permissions, logs can capture more significant, meaningful changes, reducing the noise and making anomalies more evident.
Enhanced Business Agility: As businesses evolve, role mining can help restructure access rights to match changing business needs (adaptable role structures), ensuring that employees always have the tools and information they need. New projects or business units can be quickly equipped with the necessary access by leveraging existing business roles.

Process Overview

Role mining in midPoint is a flexible process that usually includes several steps:

Create a role mining session. The session is an envelope object that holds the role mining settings. It will also contain the results of the analysis and the overall statistics.
Start the role mining process to find clusters. Role mining algorithms are applied to the data selected for the session to detect clusters. Each cluster contains similar user-permission combinations. The clusters are likely to contain business role suggestions.
Analyze promising clusters, looking for business role suggestions (pattern detection). Each cluster is likely to have one or more suggestions for business roles. Take a closer look at the data in the cluster and select a business role candidate.
Refine the business role suggestion. Specify a name for the business role, adjust permissions (induced application roles) and role members, and fill in other details as necessary.
Apply the business role suggestion. A new business role is created, and a specialized task replaces application role membership with business role membership for all affected users.
Analyze other promising clusters in the session and create further business roles. Once all promising clusters are analyzed, the session is done and can be retired, as it is no longer needed.
Repeat the process as necessary. Start new sessions to find progressively finer business roles by adjusting session parameters. Repeat the process as long as it gives practical results.

Key Concepts

Collecting Data

Before starting role mining, it is important to collect all relevant data. This basic step sets the stage for the entire process. Extract the current access rights, detailing which users have which permissions. Collect information that can provide an overview of access needs and patterns.

Clustering

Once data is collected, it needs to be analyzed to identify patterns and inform the creation of potential business roles. Identify recurring access patterns across different users. Use algorithms to group users with similar access patterns. These clusters hint at potential roles. Detect any unusual or outlier access patterns that might signify misconfigurations or excessive permissions.

We use a density-based clustering approach that groups users or roles that behave similarly. Density-based clustering is a category of clustering algorithms that partition data points into clusters based on the density of data points in the feature space. The core idea behind these algorithms is that a cluster is a dense region of points that is separated from other dense regions by areas with a low density of points.

In our case, we cluster the data with respect to the chosen mode of analysis:

Role Based Analysis - cluster roles according to the similarity of their user members.
User Based Analysis - cluster users according to the similarity of their role access assignment.

The result of clustering is a list of clusters, groups of similar objects with respect to the specified criteria (clustering options). Objects that do not meet the criteria fall into a special kind of cluster that we call Noise:

Access noise - objects that do not meet the user-role similarity criteria.
Rule noise - objects that do not meet the group by criteria.
Access or rule noise - a combination of the two above.
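
The two analysis modes can be pictured as two views of the same user-role assignment data. The following minimal Python sketch illustrates the idea. The sample users and roles and the function names are made up for this example and are not part of midPoint.

```python
from collections import defaultdict

# Hypothetical sample data: user name -> set of assigned application roles.
assignments = {
    "alice": {"crm-read", "crm-write", "mail"},
    "bob": {"crm-read", "crm-write", "mail"},
    "carol": {"hr-read", "mail"},
}


def user_based_points(user_roles):
    """User Based Analysis: each user is a data point, its roles are the properties."""
    return {user: set(roles) for user, roles in user_roles.items()}


def role_based_points(user_roles):
    """Role Based Analysis: each role is a data point, its user members are the properties."""
    members = defaultdict(set)
    for user, roles in user_roles.items():
        for role in roles:
            members[role].add(user)
    return dict(members)


by_user = user_based_points(assignments)
by_role = role_based_points(assignments)
print(sorted(by_role["mail"]))  # users that are members of the "mail" role
```

In both views a data point is described by a set of properties, so the same similarity measure can be applied regardless of the selected mode.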

Pattern Detection

After successful clustering, we have groups of similar objects, or clusters. These clusters are ready for further analysis aimed at finding patterns (groups of objects with exact overlap). To maintain the hierarchy and security of access rights, we focus on exact overlap matches. With this approach, we avoid noise and similar inconsistencies when designing a pattern, and we ensure that no new unwanted rights are added.

The clustered data is further analyzed using a pattern detection algorithm. The results of pattern detection can be used as business role candidates.
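
To illustrate what "exact overlap" means, here is a small Python sketch. It is only an illustration under assumed sample data, not midPoint code. A pattern is treated as a set of users together with a set of roles, and it is exact when every user in the pattern already holds every role in the pattern.

```python
# Hypothetical cluster content: user name -> set of assigned roles.
cluster = {
    "alice": {"crm-read", "crm-write", "mail"},
    "bob": {"crm-read", "crm-write", "mail", "vpn"},
    "carol": {"crm-read", "mail"},
}


def common_roles(users, user_roles):
    """Roles held by all of the given users (their exact overlap)."""
    role_sets = [user_roles[user] for user in users]
    return set.intersection(*role_sets) if role_sets else set()


def is_exact_pattern(pattern_users, pattern_roles, user_roles):
    """True if every user in the pattern already holds every role in the pattern."""
    return all(pattern_roles <= user_roles[user] for user in pattern_users)


shared = common_roles(["alice", "bob"], cluster)
print(sorted(shared))
print(is_exact_pattern({"alice", "bob"}, shared, cluster))
print(is_exact_pattern({"alice", "bob", "carol"}, shared, cluster))
```

In this example the last check fails because carol does not hold crm-write. Turning that combination into a business role would grant her a new right, which is exactly what the exact-overlap requirement prevents.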

Session Statistics

After the data clustering and pattern detection steps, almost everything has been generated, except for the statistical data. Statistics are important for managing the role mining process. They help in selecting suitable clusters in which to search for a business role or on which to build one. We process the statistical data for the whole session as well as for each cluster.

Business Role Candidates

After these steps we have a complete session. Based on the analysis, begin to define potential business roles that represent common access patterns. Go through the list of detected patterns and select the most suitable one. If necessary, fine-tune the structure of the proposed pattern by adding further rights and users that belong to the cluster. Note that defining a new business role candidate through the midPoint GUI requires an object collection view with the Business role archetype assigned. MidPoint's initial Business role archetype is used for this purpose. A custom Business role archetype can also be used, provided that it has a super-archetype reference pointing to midPoint's initial Business role archetype.

After roles are defined and refined, it is time to integrate them into the access management structure. The last step of role mining is migration to a business role. In this step, we can set additional parameters of the business role, such as further access rights and the candidates who should receive the newly created business role, and then run a task that applies (migrates) the new business role to the selected candidates.

Data Model

Role mining is carried out in sessions. Each role analysis session is a data structure that contains the settings for a particular case of role mining. When a role mining session is set up and launched, the role mining process starts. Role mining detects clusters, groups of similar permissions (application roles) and users. Each cluster is likely to contain a business role suggestion. Both sessions and clusters are represented by midPoint objects.

Session (Role Analysis Session)

Represents the settings for a particular role mining case, such as the role mining mode, the scope of objects to include in the analysis, the similarity settings and so on. A session is created by a midPoint administrator. Once started, the role mining process creates clusters within the session. When the process is complete, the session contains statistical information about the role mining results:

Processed objects count - the number of objects processed, according to the configured mode of analysis.
Clusters count - the number of clusters in the session.
Mean density - the average membership density of the clusters.

Cluster (Role Analysis Cluster)

Represents a detected group of objects with similar characteristics. Clusters are created automatically by the role mining process. Each cluster is likely to contain data used for business role suggestions. Clusters also include statistical data about cluster members:

User objects count - the number of users in the cluster.
Roles objects count - the number of roles in the cluster.
Membership density - the percentage overlap of properties between cluster objects.
Membership mean - the average number of:
- Role Based Analysis - user members
- User Based Analysis - role assignments
Membership range - the minimum and maximum number of:
- Role Based Analysis - user members of the cluster objects
- User Based Analysis - role assignments of the cluster objects
Reduction metric - the reduction in the number of relationships when the largest pattern found is applied.
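
As a rough illustration of the membership mean and membership range statistics, the following Python sketch computes them for a user-based cluster. The sample data and the function name are assumptions made for this example. The exact formulas midPoint uses, in particular for membership density and the reduction metric, are not covered here.

```python
from statistics import mean

# Hypothetical user-based cluster: user name -> set of assigned roles.
cluster = {
    "alice": {"crm-read", "crm-write", "mail"},
    "bob": {"crm-read", "crm-write", "mail", "vpn"},
    "carol": {"crm-read", "mail"},
}


def membership_stats(points):
    """Return (mean, minimum, maximum) of the number of properties per data point."""
    sizes = [len(properties) for properties in points.values()]
    return mean(sizes), min(sizes), max(sizes)


membership_mean, membership_min, membership_max = membership_stats(cluster)
print(membership_mean, membership_min, membership_max)
```

For a role-based cluster the same function applies, with roles as the keys and their user members as the property sets.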

Starting a Session

To get started, use one of the pre-configured role mining presets or explore advanced options.

Role Mining Presets

Presets are pre-configured role mining options that target specific use cases and simplify the process of setting up a role mining session. Some of the presets allow for additional configuration. See the configuration options in the Advanced Options section.

Birthright Role - Focus on common access rights
Attribute Based - Focus on attribute based access rights
Balanced Coverage - Focus on optimal access distribution
Exact Access Similarity - Focus on identifying exact access similarities
Department - Discover potential organizational access rights

Advanced Options

Advanced options give you full control over the role mining algorithm. Use the options below to configure a role mining session tailored to your specific use case:

Session process mode

ROLE_MODE (Role Based Analysis) - the role (RoleType) becomes the main session object. Roles are analyzed according to the similarity of their user (UserType) members. In this mode, users act as the properties of a role.
USER_MODE (User Based Analysis) - the user (UserType) becomes the main session object. Users are analyzed according to the similarity of their role (RoleType) assignments. In this mode, roles act as the properties of a user.

Data Collection Options

User filter - defines a condition that selects the users included in data collection.
Role filter - defines a condition that selects the roles included in data collection.
Assignment filter - defines a condition that selects the assignments included in data collection. The role mining administrator can thus specify, for example, a relationship with a certain organizational unit, which increases the security and accuracy of the analysis.
Indirect access (experimental) - allows clustering based on indirect access rights.

Clustering Options

Access similarity (Jaccard Similarity) - the minimum value of similarity between two objects that must be met for inclusion in the cluster group.
Required overlap
- Required Roles overlap (User Based Analysis) - the minimum number of role assignments that two users must have in common.
- Required Users overlap (Role Based Analysis) - the minimum number of user members that two roles must have in common.
Required object count
- Roles count (Role Based Analysis) - the minimum necessary number of roles to create a cluster that meets the condition of overlap and similarity.
- Users count (User Based Analysis) - the minimum necessary number of users to create a cluster that meets the condition of overlap and similarity.
Group by attribute - adds an additional clustering rule that ensures that all objects within a cluster have identical values of the selected attributes.
Analyze attribute - defines additional attributes that are further analyzed in each cluster and role suggestion. It has no effect on the clustering algorithm. It provides additional information for the role mining administrator when evaluating subsequent actions.
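
The access similarity and required overlap options can be illustrated with a short Python sketch. It assumes that the similarity threshold is expressed as a fraction between 0 and 1 and that two empty sets have a similarity of 0. The function names and sample data are hypothetical.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_enough(a, b, min_similarity, min_overlap):
    """Check both the required overlap and the access similarity threshold."""
    return len(a & b) >= min_overlap and jaccard_similarity(a, b) >= min_similarity


# Hypothetical role assignments of two users (user-based analysis).
alice = {"crm-read", "crm-write", "mail"}
bob = {"crm-read", "crm-write", "mail", "vpn"}

print(jaccard_similarity(alice, bob))          # 3 shared roles out of 4 distinct roles: 0.75
print(similar_enough(alice, bob, 0.70, 2))     # True
print(similar_enough(alice, bob, 0.80, 2))     # False, similarity is below the threshold
```

Raising the access similarity makes clusters tighter but leaves more objects as noise, while the required overlap keeps objects with only a few properties from appearing similar just because they share one of them.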

Pattern Detection Options

Minimum roles - defines the minimum number of roles that a detected pattern must contain.
Minimum users - defines the minimum number of users that a detected pattern must contain.

Technical Details

The clustering algorithm is a fundamental piece of our role mining (and outlier detection) process. We apply the DBSCAN algorithm to users and roles. The algorithm parameters map to session options as follows:

eps - derived from Access similarity
minPts - Required object count
distance function - custom
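
The following is a minimal, self-contained Python sketch of DBSCAN with a pluggable distance function, intended only to show how the parameters fit together. It is not the midPoint implementation. It assumes that eps is computed as 1 minus the access similarity (the text above only says that eps is derived from it), that minPts counts the point itself, and it uses a plain Jaccard distance as a stand-in for the custom distance. All names and data are hypothetical.

```python
def dbscan(points, distance, eps, min_pts):
    """Minimal DBSCAN. Returns (clusters, noise) as lists of point identifiers."""
    def neighbours(p):
        # The neighbourhood includes the point itself.
        return [q for q in points if distance(p, q) <= eps]

    labels = {}      # point -> cluster index, or None for noise
    clusters = []
    for p in points:
        if p in labels:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = None          # provisional noise, may become a border point
            continue
        cluster_id = len(clusters)
        clusters.append([p])
        labels[p] = cluster_id
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if q in labels and labels[q] is not None:
                continue              # already assigned to a cluster
            was_noise = q in labels
            labels[q] = cluster_id
            clusters[cluster_id].append(q)
            if was_noise:
                continue              # border point, do not expand from it
            q_neighbours = neighbours(q)
            if len(q_neighbours) >= min_pts:
                queue.extend(q_neighbours)   # q is a core point, keep expanding
    noise = [p for p in points if labels[p] is None]
    return clusters, noise


# Hypothetical user-based data: user name -> set of assigned roles.
data = {
    "alice": {"crm-read", "crm-write", "mail"},
    "bob": {"crm-read", "crm-write", "mail", "vpn"},
    "carol": {"crm-read", "crm-write", "mail"},
    "dave": {"hr-read", "payroll"},
}


def jaccard_distance(p, q):
    a, b = data[p], data[q]
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


access_similarity = 0.7                      # assumed to be a fraction between 0 and 1
clusters, noise = dbscan(list(data), jaccard_distance,
                         eps=1 - access_similarity, min_pts=2)
print(clusters, noise)
```

With these settings alice, bob and carol form one cluster, while dave shares no roles with anyone and ends up as noise.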

We have implemented a custom distance measure to cover common use cases from the role mining domain. The distance between two data points is computed according to the following rules:

if the Group by attributes do not match, return 1
if the intersection of properties is smaller than the Required overlap, return 1
otherwise, return the Jaccard distance over the user-role assignments

NOTE: Returning 1 means that the points are far away and won’t be part of the same cluster.

NOTE: A data point represents a user and a property represents a role assignment in the user mode. In the role mode, it works vice versa.
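
The three rules above can be sketched in Python as follows. The data layout (a dictionary with a "group" tuple and a "props" set) and the function name are assumptions made for this illustration. Treating two empty property sets as distance 0 is also an assumption.

```python
def role_mining_distance(a, b, required_overlap):
    """Distance between two data points, following the three rules described above.

    a, b: dicts with "group" (tuple of Group by attribute values)
          and "props" (set of properties, e.g. role assignments in user mode).
    """
    if a["group"] != b["group"]:
        return 1.0                                  # rule 1: Group by attributes differ
    shared = a["props"] & b["props"]
    if len(shared) < required_overlap:
        return 1.0                                  # rule 2: overlap too small
    union = a["props"] | b["props"]
    if not union:
        return 0.0                                  # assumption: two empty sets are identical
    return 1.0 - len(shared) / len(union)           # rule 3: Jaccard distance


# Hypothetical users, grouped by a "department" attribute.
alice = {"group": ("sales",), "props": {"crm-read", "crm-write", "mail"}}
bob = {"group": ("sales",), "props": {"crm-read", "crm-write", "mail", "vpn"}}
erin = {"group": ("hr",), "props": {"crm-read", "crm-write", "mail"}}
frank = {"group": ("sales",), "props": {"mail"}}

print(role_mining_distance(alice, bob, required_overlap=2))    # 0.25
print(role_mining_distance(alice, erin, required_overlap=2))   # 1.0, different group
print(role_mining_distance(alice, frank, required_overlap=2))  # 1.0, only one shared role
```

Because the maximum distance of 1 is returned whenever a rule is violated, such pairs are never neighbors for any eps below 1, so they are never linked directly. Two such objects can still end up in the same cluster if other objects connect them.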

After clustering, the pattern detection algorithm is run within each cluster to suggest potential business roles. It aims to find the largest intersections over the role-user assignments, which is, in theory, an NP-hard problem. We have implemented a greedy algorithm that works well in practice and suggests solid candidates.
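
To give an idea of how a greedy search for large intersections can work, here is a simple Python sketch. It is one possible heuristic written for illustration and should not be read as the algorithm implemented in midPoint. The scoring (number of users multiplied by number of roles), the function name and the sample data are assumptions.

```python
def greedy_pattern(user_roles, min_users=2, min_roles=2):
    """Return (score, users, roles) of the best exact-overlap pattern found greedily."""
    best = (0, frozenset(), frozenset())
    for seed in sorted(user_roles):
        users = {seed}
        roles = set(user_roles[seed])
        remaining = set(user_roles) - users
        while remaining:
            # Greedy step: add the user that keeps the shared role set largest.
            candidate = max(sorted(remaining),
                            key=lambda u: len(roles & user_roles[u]))
            shared = roles & user_roles[candidate]
            if len(shared) < min_roles:
                break
            users.add(candidate)
            roles = shared
            remaining.discard(candidate)
            score = len(users) * len(roles)
            if len(users) >= min_users and score > best[0]:
                best = (score, frozenset(users), frozenset(roles))
    return best


# Hypothetical cluster content: user name -> set of assigned roles.
cluster = {
    "alice": {"crm-read", "crm-write", "mail"},
    "bob": {"crm-read", "crm-write", "mail", "vpn"},
    "carol": {"crm-read", "crm-write", "mail"},
    "dave": {"mail", "vpn"},
}

score, users, roles = greedy_pattern(cluster)
print(score, sorted(users), sorted(roles))
```

The roles of a pattern are always the intersection of the role sets of its users, so every suggested pattern is an exact overlap. The min_users and min_roles parameters play the part of the Minimum users and Minimum roles pattern detection options.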