Outlier detection

Last modified 15 Jan 2025 16:50 +01:00

Outlier detection in role-based access control (RBAC) system refers to the process of identifying and handling roles or permissions that deviate from the expected or normal behavior. This can include roles that have excessive or unnecessary access privileges, or roles that have been assigned to users who do not typically require them.

Motivation

Outlier detection helps to ensure the security and integrity of the system by identifying and addressing potential vulnerabilities or misconfigurations. It also helps to optimize the system by removing unnecessary or redundant roles, making it more efficient and easier to manage.

How it works

In Midpoint is the outlier detection built on top of Role Mining algorithm. After the successful clustering and pattern detection, the outlier detection algorithm is executed. This algorithm identifies anomalous data and decides whether to mark a user as an outlier based on multiple statistical measures.

Analyzed data is divided into two mutually exclusive categories:

  1. Classified data - includes outliers and users not identified as outliers.

  2. Unclassified data - includes users and roles the algorithm cannot analyze (e.g., due to low popularity or lack of data structure).

The algorithm only provides suggestions for outliers and does not take any further action. It is up to the user to analyze, assess, and decide the next step.

Outlier Detection Options

In the Anomaly detection step you specify options below:

  • Outlier confidence - level of certainty used to classify a user as an outlier

  • Standard deviation confidence interval - controls z-score filter in anomaly detection

  • Frequency threshold - minimum frequency of a role within a cluster to be considered normal and not an anomaly

By default, the algorithm doesn’t analyze Unclassified outliers. Turn it on with Include noise analysis option in the Clustering Options

Technical Details

The outlier detection algorithm can be summarized as follows:

  1. For each user find a peer group

  2. Compute role frequencies within each group

  3. Utilize weighted z-score on the role frequency data to identify anomalies

  4. Apply min frequency threshold filter to the anomalies

  5. For each user that contain anomaly, compute overall outlier score

  6. Mark user as an outlier if it’s overall score is greater than Outlier confidence threshold

Overall score represents the level of certainty that a user is an outlier. It is derived from various statistical measures. The pseudocode below outlines how the overall score is calculated:

anomaly_scores = []

for role in user.anomalies:
    anomaly_score = mean(
        distribution_score, # 1 - cummulative_prob(zscore)
        pattern_score, # 1 - assignment_count(top_pattern) / assignment_count(peer group)
        item_factor_score, # 1 - mean(weighted attribute density for each user attribute)
        role_member_score, # 1 - size(role members) / size(all users)
        coverage_score, # 1 - size(role members in peer group) / size(users of peer group)
    )
    anomaly_scores.append(anomaly_score)

mean_anomaly_score = mean(anomaly_scores)

peer_group_score = mean(
    assignment_freq, # size(user roles) / size(all peer group roles)
    outlier_pattern_score, # assignment_count(top_pattern) / assignment_count(peer group)
    average_item_factor, # weighted user attribute density over joint attributes
    peer_group_density, # peer group density
)

overall_score = mean(mean_anomaly_score, peer_group_score)

Glossary

List of terms in context of outlier detection in Midpoint.

  • Outlier - user that deviates from normal

  • Anomaly - unusual user-role assignment

  • Cluster - group of similar users with respect to the clustering options

  • Noise - users that do not fall into any cluster

  • Peer group - group of users similar to a given user

    • if the user is part of Cluster - cluster members

    • if the user is part of Noise - close users computed solely on Jaccard distance, ignoring Group by attributes clustering option

Was this page helpful?
YES NO
Thanks for your feedback