Custom URL parsing and overlapping patterns
The Web Security rules engine uses a category-based system for controlling access to specific web sites, called the "Custom URL" module. Each category contains patterns that determine whether the rules engine will use that category to match the requested URL. Categories are a flexible way of controlling web access or overriding the base categorisation of a URL.
The rules engine optimises the categories and their patterns before any rules are executed in the account. It is important to understand how this may affect rule matching if there are overlapping patterns within multiple categories in the account.
Firstly, it is important to understand how patterns are parsed.
Given the example pattern account.acme.com:
- The pattern is split into parts from left to right using the period "." character. The parts in the example are: account, acme and com.
- There is an implicit wildcard to the left of the leftmost part in the pattern, i.e. account.acme.com is evaluated as *.account.acme.com, meaning it matches account.acme.com and any sub-domain of it, e.g. my.account.acme.com.
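The two parsing rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the rules engine's actual code; the function names split_pattern and pattern_covers are hypothetical, and the sketch handles domain-only patterns (no paths).

```python
def split_pattern(pattern: str) -> list[str]:
    """Split a domain pattern into its parts on the period character."""
    return pattern.split(".")


def pattern_covers(pattern: str, domain: str) -> bool:
    """Return True if the domain equals the pattern or is a sub-domain of it.

    This models the implicit wildcard: account.acme.com behaves like
    *.account.acme.com and also matches the bare account.acme.com.
    """
    pattern_parts = split_pattern(pattern)
    domain_parts = split_pattern(domain)
    # Compare whole parts, so "myaccount.acme.com" is NOT covered by
    # "account.acme.com" even though the strings share a suffix.
    return domain_parts[-len(pattern_parts):] == pattern_parts


print(split_pattern("account.acme.com"))
print(pattern_covers("account.acme.com", "my.account.acme.com"))
print(pattern_covers("account.acme.com", "acme.com"))
```

Comparing part by part rather than with a plain string suffix check avoids accidentally treating an unrelated host name as a sub-domain.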
The next step is for the rules engine to determine whether a pattern matches the requested URL, e.g. https://my.account.acme.com.
The rules engine first tries an exact match on the URL domain. If one or more URL categories contain the exact pattern my.account.acme.com, then those categories are selected for use with any filter rules that may apply. Any other sub-domain (or wildcard) matching is skipped and the associated categories will not be selected for rule processing.
If there is no exact match on the URL domain, the rules engine removes sub-domain parts from the URL domain one at a time, from left to right, and tries to match again after each removal. For example, it will remove my and search for account.acme.com; if there is no match, it will remove account and search for acme.com, and so on until there is nothing left to match. This is important because it means the most specific pattern will match first, and any less specific patterns will be ignored, even if they are in a different category.
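The matching order described above (exact match first, then progressively shorter domains) can be modelled as follows. This is a minimal sketch under stated assumptions: categories are represented as a plain dictionary of pattern sets, select_categories is a hypothetical name, and paths are ignored. It is not the engine's real implementation.

```python
from typing import Optional
from urllib.parse import urlsplit


def select_categories(
    url: str, categories: dict[str, set[str]]
) -> tuple[Optional[str], list[str]]:
    """Return the first (most specific) matching pattern and its categories."""
    host = urlsplit(url).hostname or ""
    parts = host.split(".")
    # start=0 is the exact match; each later step drops the leftmost part.
    for start in range(len(parts)):
        candidate = ".".join(parts[start:])
        matched = sorted(
            name for name, patterns in categories.items() if candidate in patterns
        )
        if matched:
            # Stop at the first hit: less specific patterns are never tried.
            return candidate, matched
    return None, []


example_categories = {
    "My Blocked Sites": {"account.acme.com"},
    "My Allowed Sites": {"acme.com"},
}
print(select_categories("https://my.account.acme.com", example_categories))
# -> ('account.acme.com', ['My Blocked Sites'])
```

Note that the loop returns as soon as any category contains the candidate, which is why a category holding only acme.com is not selected for this URL.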
If there is a path involved in the pattern it will be evaluated only once the domain part has been matched.
Overlapping pattern scenario
It is possible that two or more patterns could exist in different categories and the patterns could overlap each other.
For example, the category "My Blocked Sites" may contain account.acme.com and the category "My Allowed Sites" may contain acme.com.
When the rules engine begins to process the requested URL, e.g. https://my.account.acme.com, it determines that account.acme.com exists in the URL category "My Blocked Sites" and that it is more specific than any other entry. It therefore discards the more generic pattern acme.com, even though it is in a different category, i.e. "My Allowed Sites".
Now consider that these categories are used in two rules, attempting to control the same site.
For example, a rule with priority 10 blocks "My Blocked Sites" for all users, and a rule with priority 1 allows "My Allowed Sites" for a specific group of users, thus trying to override the block for certain users. Remember that rules are processed in priority order (ascending) and the first rule that matches will win and end further rule processing.
In this case, the rules engine will not match "My Allowed Sites" because acme.com was disregarded in favour of the more specific account.acme.com, which only exists in the "My Blocked Sites" category. The allow rule therefore never matches, and the block rule applies to everyone. One solution would be to change account.acme.com to acme.com, so that acme.com becomes the most specific pattern and both categories can match if required by the rules.
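The scenario can be reproduced with a small simulation. Everything here is a hedged sketch: the rule dictionaries, the group name "Finance", and the function names are hypothetical stand-ins chosen for illustration, and the category selection simply follows the most-specific-first behaviour described in this article.

```python
from urllib.parse import urlsplit


def select_categories(url, categories):
    """Return the categories holding the most specific matching domain pattern."""
    parts = (urlsplit(url).hostname or "").split(".")
    for start in range(len(parts)):
        candidate = ".".join(parts[start:])
        matched = {name for name, pats in categories.items() if candidate in pats}
        if matched:
            return matched  # less specific patterns are never considered
    return set()


def first_matching_action(url, categories, rules, user_groups):
    """Walk rules in ascending priority; the first rule that matches wins."""
    selected = select_categories(url, categories)
    for rule in sorted(rules, key=lambda r: r["priority"]):
        applies_to_user = rule["group"] is None or rule["group"] in user_groups
        if applies_to_user and rule["category"] in selected:
            return rule["action"]
    return None  # no rule in this sketch matched


RULES = [
    {"priority": 10, "action": "block", "category": "My Blocked Sites", "group": None},
    {"priority": 1, "action": "allow", "category": "My Allowed Sites", "group": "Finance"},
]
OVERLAPPING = {
    "My Blocked Sites": {"account.acme.com"},
    "My Allowed Sites": {"acme.com"},
}
FIXED = {
    "My Blocked Sites": {"acme.com"},  # changed from account.acme.com
    "My Allowed Sites": {"acme.com"},
}
URL = "https://my.account.acme.com"

print(first_matching_action(URL, OVERLAPPING, RULES, {"Finance"}))  # block
print(first_matching_action(URL, FIXED, RULES, {"Finance"}))        # allow
print(first_matching_action(URL, FIXED, RULES, set()))              # block
```

With the overlapping patterns, the intended override for the "Finance" group never takes effect; once both categories use acme.com, the priority 1 allow rule matches first for that group while everyone else is still blocked.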
Strategies to avoid URL pattern overlap:
- If you wish to control an entire web site (domain), then use the base domain, e.g. acme.com, in a URL category. Avoid sub-domains like www.acme.com.
- Don't add more specific patterns to a category if the base domain is already in the category, e.g. avoid a category containing patterns like www.acme.com and acme.com. Just use acme.com.
- Use a single category to control a specific site, if it warrants it. For example, create a new category called "Acme Site" and reference that in a rule, rather than using multiple patterns in multiple categories for the same site.
- Use the Search options in the Custom URL section to find all patterns that match acme.com, and remove any duplicates or more specific patterns that could override the base domain match you require.
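If you keep your own copy of the category patterns outside the product, a short script can flag overlaps before they cause surprises. The sketch below is an offline aid only, not a product feature: find_overlaps is a hypothetical helper, the dictionary layout is an assumption, and only domain patterns (no paths) are compared.

```python
def find_overlaps(categories):
    """List pattern pairs where one pattern is a sub-domain of another.

    Returns sorted tuples of
    (specific_pattern, its_category, general_pattern, its_category).
    """
    entries = [
        (pattern, name)
        for name, patterns in categories.items()
        for pattern in patterns
    ]
    overlaps = []
    for specific, specific_cat in entries:
        for general, general_cat in entries:
            # "." + general ensures we compare whole domain parts only.
            if specific.endswith("." + general):
                overlaps.append((specific, specific_cat, general, general_cat))
    return sorted(overlaps)


my_categories = {
    "My Blocked Sites": {"account.acme.com"},
    "My Allowed Sites": {"acme.com", "example.org"},
}
for specific, s_cat, general, g_cat in find_overlaps(my_categories):
    print(f"{specific} ({s_cat}) overrides {general} ({g_cat})")
```

Each reported pair is a place where the more specific pattern would be selected and the base domain pattern ignored for matching sub-domains.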