Outlier detection

Last modified 23 Jan 2025 11:29 +01:00

Table of Contents

Motivation
How it works
Outlier Detection Options
Technical Details
Glossary
See Also

Outlier detection in role-based access control (RBAC) system refers to the process of identifying and handling roles or permissions that deviate from the expected or normal behavior. This can include roles that have excessive or unnecessary access privileges, or roles that have been assigned to users who do not typically require them.

Motivation

Outlier detection helps to ensure the security and integrity of the system by identifying and addressing potential vulnerabilities or misconfigurations. It also helps to optimize the system by removing unnecessary or redundant roles, making it more efficient and easier to manage.

How it works

In Midpoint is the outlier detection built on top of Role Mining algorithm. After the successful clustering and pattern detection, the outlier detection algorithm is executed. This algorithm identifies anomalous data and decides whether to mark a user as an outlier based on multiple statistical measures.

Analyzed data is divided into two mutually exclusive categories:

Classified data - includes outliers and users not identified as outliers.
Unclassified data - includes users and roles the algorithm cannot analyze (e.g., due to low popularity or lack of data structure).

The algorithm only provides suggestions for outliers and does not take any further action. It is up to the user to analyze, assess, and decide the next step.

System requirements

Outlier detection and role mining are resource-intensive tasks, often requiring more memory (RAM) allocation, as typically recommended in Midpoint.

Outlier Detection Options

In the Anomaly detection step you specify options below:

Outlier confidence - level of certainty used to classify a user as an outlier
Standard deviation confidence interval - controls z-score filter in anomaly detection
Frequency threshold - minimum frequency of a role within a cluster to be considered normal and not an anomaly

By default, the algorithm doesn’t analyze Unclassified outliers. Turn it on with Include noise analysis option in the Clustering Options

Technical Details

The outlier detection algorithm can be summarized as follows:

For each user find a peer group
Compute role frequencies within each group
Utilize weighted z-score on the role frequency data to identify anomalies
Apply min frequency threshold filter to the anomalies
For each user that contain anomaly, compute overall outlier score
Mark user as an outlier if it’s overall score is greater than Outlier confidence threshold

Overall score represents the level of certainty that a user is an outlier. It is derived from various statistical measures. The pseudocode below outlines how the overall score is calculated:

anomaly_scores = []

for role in user.anomalies:
    anomaly_score = mean(
        distribution_score, # 1 - cummulative_prob(zscore)
        pattern_score, # 1 - assignment_count(top_pattern) / assignment_count(peer group)
        item_factor_score, # 1 - mean(weighted attribute density for each user attribute)
        role_member_score, # 1 - size(role members) / size(all users)
        coverage_score, # 1 - size(role members in peer group) / size(users of peer group)
    )
    anomaly_scores.append(anomaly_score)

mean_anomaly_score = mean(anomaly_scores)

peer_group_score = mean(
    assignment_freq, # size(user roles) / size(all peer group roles)
    outlier_pattern_score, # assignment_count(top_pattern) / assignment_count(peer group)
    average_item_factor, # weighted user attribute density over joint attributes
    peer_group_density, # peer group density
)

overall_score = mean(mean_anomaly_score, peer_group_score)

Glossary

List of terms in context of outlier detection in Midpoint.

Outlier - user that deviates from normal
Anomaly - unusual user-role assignment
Cluster - group of similar users with respect to the clustering options
Noise - users that do not fall into any cluster
Peer group - group of users similar to a given user
- if the user is part of Cluster - cluster members
- if the user is part of Noise - close users computed solely on Jaccard distance, ignoring Group by attributes clustering option

See Also

Was this page helpful?

YES NO

Thanks for your feedback