Redirect rules

As a programmer you sometimes get thrown into a subject you don't know anything about, and you have to start from scratch. Redirect rules were that kind of subject for me. But you get to learn new things, which is awesome!
I want to share what I learned, and I hope you can use it.

I had only seen the most basic example rules, and I found it hard to find any good documentation on how to write more advanced rules: what the different attributes and options do and how they work together. But after a lot of trial and error, I started to understand how most of it fits together.

When I finished the project at work, I looked back and realized that I could turn what I had built into some fairly generic rules. That gave me an idea.

I set out to make a set of generic redirect rules that most people could use without having to know the deep, dark secrets of redirect rules.
The end result was a NuGet package: RedirectRules

If anyone is interested in a more thorough guide to the anatomy of a rule and its different options, write a comment and I might write a blog post about it. For now, I will stick to this package.

NuGet Package

What does it do?

In short, it ensures that your URLs are neat, clean and user-friendly.
It bundles together the most common redirect rules I have found, plus some extra goodies. I'll get into the details later.

Put simply, it turns this ugly URL:
http://mysite.com/SamplePage.aspx/

Into this clean URL:
http://www.mysite.com/samplepage

Both URLs, and variations of them, would in most cases serve the same page, especially in Umbraco.
Your page can actually be reached through more different URLs than you might be aware of.
I believe cleaning this up helps SEO at least a little, as it removes some unwanted duplicate-content issues.

The package implements a number of common redirect rules:

  • Remove trailing slash
  • Lowercase URL
  • Remove default.aspx
  • Trim .aspx
  • Enforce the www prefix on root domains (mysite.com)
  • Enforce no www prefix on subdomains

And some extra goodies that are rarely seen elsewhere:

  • No 301 chaining
  • URL whitelisting
  • No browser caching on 301

For any of these rules to work, your server needs the IIS URL Rewrite 2.0 module installed. You can download it here.
Depending on your host it may already be installed, but on a standard IIS setup it is a separate add-on module.

How does it work?

The rules are split into a few sections. I'll explain what each one does and the structure behind it.
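For orientation, here is roughly where these sections live inside a web.config. This is a sketch of the standard IIS URL Rewrite placement, not necessarily the exact layout the package produces. The comments stand in for the rules shown below.

```xml
<configuration>
	<system.webServer>
		<rewrite>
			<rules>
				<!-- 1. Whitelist: stop early for URLs that must never be redirected -->
				<!-- 2. Rewrite rules: clean up the path, prefix it with "_" and keep going -->
				<!-- 3. Redirects: fix the host name, or redirect if the path was rewritten -->
			</rules>
			<outboundRules>
				<!-- Cache-Control rewrite for 301 responses -->
			</outboundRules>
			<rewriteMaps>
				<!-- MapSSL: keeps the original http/https scheme -->
			</rewriteMaps>
		</rewrite>
	</system.webServer>
</configuration>
```

Inbound rules are evaluated in the order they appear, which is why the whitelist has to come first.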

Whitelist

<rule name="WhiteList" stopProcessing="true">
	<match url="(.*)" />
	<conditions logicalGrouping="MatchAny" trackAllCaptures="false">
		<add input="{URL}" pattern="^.*/(base|webshop|umbraco|umbraco_client|client|install|api|bundles)/" ignoreCase="true" />
		<add input="{HTTP_HOST}" pattern=".*localhost.*" ignoreCase="true" />
	</conditions>
	<action type="None" />
</rule>

If any of the conditions in this rule is met (logicalGrouping="MatchAny"), rule processing stops and the request passes through untouched. This is intended for URLs we don't want to redirect: the Umbraco back office, APIs, bundles, local development on localhost and similar cases. You can add more conditions if required.
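As an example of adding conditions, the sketch below also skips a hypothetical /healthcheck monitoring endpoint and a hypothetical staging host. Both entries are assumptions for illustration only, so substitute whatever your own site needs left alone. The {URL} server variable includes the leading slash, so the new path pattern anchors on it.

```xml
<rule name="WhiteList" stopProcessing="true">
	<match url="(.*)" />
	<conditions logicalGrouping="MatchAny" trackAllCaptures="false">
		<add input="{URL}" pattern="^.*/(base|webshop|umbraco|umbraco_client|client|install|api|bundles)/" ignoreCase="true" />
		<add input="{HTTP_HOST}" pattern=".*localhost.*" ignoreCase="true" />
		<!-- Hypothetical: a monitoring endpoint that must not be redirected -->
		<add input="{URL}" pattern="^/healthcheck$" ignoreCase="true" />
		<!-- Hypothetical: leave a staging host name completely alone -->
		<add input="{HTTP_HOST}" pattern="^staging\." ignoreCase="true" />
	</conditions>
	<action type="None" />
</rule>
```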

Rewrite rules

<rule name="SEO - Remove trailing slash" stopProcessing="false">
	<match url="^_*(.*)/+$" />
	<conditions>
		<add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
		<add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
	</conditions>
	<action type="Rewrite" url="_{R:1}" />
</rule>
<rule name="SEO - ToLower" stopProcessing="false">
	<match url="^_*(.*)" ignoreCase="false" />
	<conditions logicalGrouping="MatchAll" trackAllCaptures="false">
		<add input="{R:1}" pattern="[A-Z]" ignoreCase="false" />
		<add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
		<add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
		<add input="{R:1}" pattern="^.*?\.(axd|css|js|jpg|jpeg|png|gif|ashx|asmx|svc).*?$" negate="true" ignoreCase="true" />
	</conditions>
	<action type="Rewrite" url="_{ToLower:{R:1}}" />
</rule>
<rule name="SEO - remove default.aspx" stopProcessing="false">
	<match url="^_*(.*?)/?default\.aspx$" />
	<action type="Rewrite" url="_{R:1}" />
</rule>
<rule name="SEO - Trim aspx" stopProcessing="false">
	<match url="^_*(.*)\.aspx$" />
	<action type="Rewrite" url="_{R:1}" />
</rule>

These are the common redirect rules, but they differ in a few ways from what you normally see in a redirect rule. First, they don't actually redirect; they only rewrite the URL. Second, they don't stop processing (stopProcessing="false"), so the request passes through every rule and each matching rule modifies the URL in turn.
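For contrast, here is what more conventional standalone versions of two of these rules typically look like. This sketch is modeled on the usual IIS rule templates and is not part of the package. Each rule redirects immediately and stops. A URL with both an uppercase letter and a trailing slash would therefore cost the visitor two 301 round trips.

```xml
<!-- Conventional style, shown only for comparison -->
<rule name="Lowercase - redirect immediately" stopProcessing="true">
	<match url="[A-Z]" ignoreCase="false" />
	<action type="Redirect" url="{ToLower:{URL}}" redirectType="Permanent" />
</rule>
<rule name="Trailing slash - redirect immediately" stopProcessing="true">
	<match url="(.*)/$" />
	<conditions>
		<add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
		<add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
	</conditions>
	<action type="Redirect" url="{R:1}" redirectType="Permanent" />
</rule>
```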

The redirect

<rule name="Redirect - Subdomains with www to non-www" stopProcessing="true">
	<match url="^_*(.*)" />
	<conditions logicalGrouping="MatchAll" trackAllCaptures="false">
		<add input="{HTTP_HOST}" pattern="^www\.(.*)\.([^\.]+)\.([^\.]+?)$" />
	</conditions>
	<action type="Redirect" url="{MapSSL:{HTTPS}}{C:1}.{C:2}.{C:3}/{R:1}" redirectType="Permanent" />
</rule>
<rule name="Redirect - Top domains with non-www to www" stopProcessing="true">
	<match url="^_*(.*)" />
	<conditions logicalGrouping="MatchAll" trackAllCaptures="false">
		<add input="{HTTP_HOST}" pattern="^([^\.]+)\.([^\.]+?)$" />
	</conditions>
	<action type="Redirect" url="{MapSSL:{HTTPS}}www.{HTTP_HOST}/{R:1}" redirectType="Permanent" />
</rule>
<rule name="Redirect - Non-canonical redirect" stopProcessing="true">
	<match url="^_+(.*)" />
	<action type="Redirect" url="{R:1}" redirectType="Permanent" />
</rule>

A rewrite can only change the URL path that IIS processes; it cannot change the domain name the visitor is on. So here the rules have to redirect.

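The first two rules fix any www issues. They do so by matching the host name with a regex that looks at how many dots it contains. A host with exactly one dot, such as mysite.com, is treated as a root domain and gets the www prefix. A host that starts with www. followed by at least two more dots, such as www.shop.mysite.com, is treated as a subdomain and loses the www.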

This breaks co.uk domains, because they contain one more dot than the regex expects. www.mysite.co.uk is treated as a subdomain and has its www removed, and mysite.co.uk never gets one. It is possible to modify the rules to fix that, but this has not been implemented yet. If you need it and want my help, write to me; I'll be happy to help.
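One possible direction is sketched below. It is untested and assumes .co.uk is the only two-part suffix you need to handle. The first rule replaces the existing subdomain rule and adds one negated condition so that www.mysite.co.uk is left alone. The negated condition comes first because, with trackAllCaptures="false", {C:1} to {C:3} should come from the last matched condition. The second rule is a hypothetical addition that gives mysite.co.uk its www prefix. Both would sit before the non-canonical fallback rule.

```xml
<rule name="Redirect - Subdomains with www to non-www" stopProcessing="true">
	<match url="^_*(.*)" />
	<conditions logicalGrouping="MatchAll" trackAllCaptures="false">
		<!-- Assumption: www.<name>.co.uk is a root site, not a subdomain -->
		<add input="{HTTP_HOST}" pattern="^www\.[^\.]+\.co\.uk$" negate="true" />
		<!-- Kept last so {C:1}..{C:3} refer to this pattern's captures -->
		<add input="{HTTP_HOST}" pattern="^www\.(.*)\.([^\.]+)\.([^\.]+?)$" />
	</conditions>
	<action type="Redirect" url="{MapSSL:{HTTPS}}{C:1}.{C:2}.{C:3}/{R:1}" redirectType="Permanent" />
</rule>
<!-- Hypothetical extra rule: <name>.co.uk gets the www prefix -->
<rule name="Redirect - co.uk domains with non-www to www" stopProcessing="true">
	<match url="^_*(.*)" />
	<conditions logicalGrouping="MatchAll" trackAllCaptures="false">
		<add input="{HTTP_HOST}" pattern="^[^\.]+\.co\.uk$" />
	</conditions>
	<action type="Redirect" url="{MapSSL:{HTTPS}}www.{HTTP_HOST}/{R:1}" redirectType="Permanent" />
</rule>
```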

The last rule is a fallback for when the domain name is already correct but one of the rewrite rules modified the URL.
As you might have noticed, every rewrite rule prefixes the URL with an underscore (_). That underscore is what the last rule tests for: if it is present, something was rewritten, so the browser is redirected to the cleaned-up URL.
Naturally, this also means URL paths that begin with an underscore can't be used, since every rule strips leading underscores. To be fair, I haven't seen many pages whose path starts with one, which is why I chose the underscore as the marker. It could have been almost any character.

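The rules rewrite instead of redirecting so that the visitor is redirected only once, however many things were wrong with the URL. Each redirect rule builds its target from the already cleaned-up path. This should help preserve some link juice for your Google ranking.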
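To make the chain concrete, here is a hand trace of the example URL from the start of the post through the patterns above. It assumes that SamplePage.aspx/ is not a physical file or folder and that the request arrives over plain HTTP at mysite.com. IIS matches rule patterns against the path without its leading slash.

```xml
<!-- Request: http://mysite.com/SamplePage.aspx/
     WhiteList                                  : no match, continue
     SEO - Remove trailing slash                : SamplePage.aspx/  -> _SamplePage.aspx
     SEO - ToLower                              : _SamplePage.aspx  -> _samplepage.aspx
     SEO - remove default.aspx                  : no match
     SEO - Trim aspx                            : _samplepage.aspx  -> _samplepage
     Redirect - Subdomains with www to non-www  : no match (host does not start with www)
     Redirect - Top domains with non-www to www : 301 to http://www.mysite.com/samplepage
     One redirect in total; the follow-up request matches no rule. -->
```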

Permanent Caching

<rewriteMaps>
	<rewriteMap name="MapSSL" defaultValue="http://">
		<add key="ON" value="https://" />
		<add key="OFF" value="http://" />
	</rewriteMap>
</rewriteMaps>
<outboundRules>
	<rule name="RewriteCache-Control" preCondition="old url with 301">
		<match serverVariable="RESPONSE_Cache-Control" pattern="(.*)" />
		<action type="Rewrite" value="NO-CACHE" />
	</rule>
	<preConditions>
		<preCondition name="old url with 301">
			<add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
			<add input="{RESPONSE_STATUS}" pattern="^301$" />
		</preCondition>
	</preConditions>
</outboundRules>

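The rewrite map keeps the scheme intact. In the redirect rules, {MapSSL:{HTTPS}} looks up the value of the HTTPS server variable (on or off) and returns https:// or http://. So if the request came in over HTTPS, the redirect stays on HTTPS, and likewise for HTTP.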

The outbound rule is something special. I haven't seen it anywhere else, but I badly needed it during testing.

Here is why it is important.
Say your website redirects /contact to /support. Not unreasonable.

A browser requests the contact page and receives a 301 (Moved Permanently) response. The browser takes this very literally: a 301 is permanent, so it is cached. Permanently.

But on the web nothing is really permanent, and at some point you might want an actual contact page again. Well, you can't.

Any browser that has already requested the contact page will never reach it again.
The browser has cached the redirect, so it has no "need" to request the page; it looks in its cache and redirects immediately.

This caused me huge headaches while testing these rules, as I had to clear my browser cache every time I wanted to test a redirect. And you can't really ask your visitors to clear their cache because you want to bring back a contact page.

This outbound rule fixes that. On a 301 response with an HTML content type, it sets the Cache-Control header to "NO-CACHE". This tells the browser that it must check with the server before reusing the cached redirect, so in practice it asks again every time.
If you do want some caching, you can change the value, for example to cache the redirect for a week.
Either way, you can now reuse redirected URLs later if you want to.
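If you want browsers to keep the redirect for a limited time, a variant of the outbound rule could set a max-age instead. This is a sketch, not the package's default. The one-week value of 604800 seconds (7 × 24 × 3600) is an assumption, so adjust it to taste. The preCondition is unchanged.

```xml
<outboundRules>
	<rule name="RewriteCache-Control" preCondition="old url with 301">
		<match serverVariable="RESPONSE_Cache-Control" pattern="(.*)" />
		<!-- Let browsers reuse the 301 for one week: 7 * 24 * 3600 = 604800 seconds -->
		<action type="Rewrite" value="max-age=604800" />
	</rule>
	<preConditions>
		<preCondition name="old url with 301">
			<add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
			<add input="{RESPONSE_STATUS}" pattern="^301$" />
		</preCondition>
	</preConditions>
</outboundRules>
```

Once the week has passed, the cached redirect is stale, and the browser has to ask the server again.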

There are most likely still improvements to be made, so if anyone has suggestions or fixes, please share them and I will update the package.

Merry Christmas everyone!

Niels Ellegaard

Niels is on Twitter as