UnReal Web Marketing
1-866-601-1882
301 Redirect
 
Search Engine Friendly URLs
A 301 redirect is the most efficient and search-engine-friendly method of webpage redirection. It is not hard to implement, and it should preserve your search engine rankings for the redirected page. If you have to change file names or move pages around, it is the safest option. The HTTP status code 301 means "Moved Permanently".
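Whatever server technology you use, a 301 redirect comes down to the same two things on the wire: a "301 Moved Permanently" status line and a Location header naming the new URL. As a rough illustration only, here is a minimal Python WSGI sketch of that response. The function name moved_permanently_app and the target host www.new-url.com are placeholders assumed for this example; they are not part of any particular framework.

```python
def moved_permanently_app(environ, start_response):
    """Answer every request with a 301 pointing at the same path on the new host."""
    # new_base is a placeholder; replace it with your real destination.
    new_base = "http://www.new-url.com"
    target = new_base + environ.get("PATH_INFO", "/")
    start_response("301 Moved Permanently", [
        ("Location", target),
        ("Content-Length", "0"),
    ])
    # A redirect needs no body; the browser follows the Location header.
    return [b""]
```

Any WSGI server can host this function; the status string and the Location header are the only parts that matter to search engines.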

Below are a couple of methods to implement URL redirection.

IIS Redirect

  • In Internet Services Manager, right-click the file or folder you wish to redirect
  • Select the radio button titled "A redirection to a URL"
  • Enter the URL of the page to redirect to
  • Check "The exact URL entered above" and "A permanent redirection for this resource"
  • Click 'Apply'

ColdFusion Redirect

<cfheader statuscode="301" statustext="Moved permanently">
<cfheader name="Location" value="http://www.new-url.com">

PHP Redirect

<?php
// Send the permanent-redirect status first, then the new location.
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.new-url.com");
exit;
?>

ASP Redirect

<%@ Language=VBScript %>
<%
Response.Status = "301 Moved Permanently"
Response.AddHeader "Location", "http://www.new-url.com/"
%>

ASP.NET Redirect

<script language="C#" runat="server">
private void Page_Load(object sender, System.EventArgs e)
{
Response.Status = "301 Moved Permanently";
Response.AddHeader("Location","http://www.new-url.com");
}
</script>

JSP (Java) Redirect

<%
response.setStatus(301);
response.setHeader( "Location", "http://www.new-url.com/" );
response.setHeader( "Connection", "close" );
%>

CGI PERL Redirect

use CGI;
my $q = CGI->new;
# redirect() sends a 302 by default, so the 301 status is set explicitly
print $q->redirect(-uri => "http://www.new-url.com/", -status => "301 Moved Permanently");

Ruby on Rails Redirect

def old_action
  redirect_to "http://www.new-url.com/", :status => 301
end

Redirect Old Domain to New Domain (htaccess redirect)

Create a .htaccess file with the code below; it ensures that all directories and pages of your old domain are correctly redirected to your new domain.
The .htaccess file needs to be placed in the root directory of your old website (i.e. the same directory where your index file is placed).

Options +FollowSymLinks
RewriteEngine on
RewriteRule (.*) http://www.newdomain.com/$1 [R=301,L]

Please REPLACE www.newdomain.com in the above code with your actual domain name. In addition to the redirect, I would suggest that you contact every site that links to you and ask them to update their backlinks to point to your new website.

Note: This .htaccess method of redirection works only on servers running Apache with the mod_rewrite module enabled (typically Linux hosts).

Redirect to www (htaccess redirect)

Create a .htaccess file with the code below; it ensures that all requests coming in to domain.com are redirected to www.domain.com.
The .htaccess file needs to be placed in the root directory of your website (i.e. the same directory where your index file is placed).

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

Please REPLACE domain.com and www.domain.com with your actual domain name.

Note: This .htaccess method of redirection works only on servers running Apache with the mod_rewrite module enabled (typically Linux hosts).
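Before deploying a host-based redirect it can help to write down exactly which requests should be redirected and where to. The following Python sketch models the intended outcome of the www rule above; it is not how Apache evaluates the rule. The function name www_redirect and the sample hosts are assumptions made for illustration, and the host comparison is a simple case-insensitive equality check.

```python
def www_redirect(host, path, canonical="www.domain.com"):
    """Return the 301 target for bare-domain requests, or None if no redirect applies."""
    bare = canonical[len("www."):]
    # Host names are case-insensitive, which is what the [NC] flag expresses.
    if host.lower() != bare:
        return None
    # In .htaccess context the leading slash is not part of the matched path.
    relative = path[1:] if path.startswith("/") else path
    return "http://%s/%s" % (canonical, relative)
```

A request that already arrives on the www host returns None, which is what prevents a redirect loop.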

URL Rewriting Guide (Apache Web Server)

Introduction to mod_rewrite

The Apache module mod_rewrite is a killer one, i.e. it is a really sophisticated module which provides a powerful way to do URL manipulations. With it you can do nearly all types of URL manipulations you ever dreamed about. The price you have to pay is to accept complexity, because mod_rewrite's major drawback is that it is not easy for a beginner to understand and use. And even Apache experts sometimes discover new aspects where mod_rewrite can help.

In other words: With mod_rewrite you either shoot yourself in the foot the first time and never use it again or love it for the rest of your life because of its power. This paper tries to give you a few initial success events to avoid the first case by presenting already invented solutions to you.

Practical Solutions

Here come a lot of practical solutions I've either invented myself or collected from other people's solutions in the past. Feel free to learn the black magic of URL rewriting from these examples.

ATTENTION: Depending on your server configuration it can be necessary to slightly change the examples for your situation, e.g. adding the [PT] flag when additionally using mod_alias and mod_userdir, or rewriting a ruleset to fit in .htaccess context instead of per-server context. Always try to understand what a particular ruleset really does before you use it. This avoids problems.

URL Layout

Canonical URLs

Description:

On some webservers there is more than one URL for a resource. Usually there are canonical URLs (which should actually be used and distributed) and those which are just shortcuts, internal ones, etc. Independent of which URL the user supplied with the request, they should finally see only the canonical one.

Solution:

We do an external HTTP redirect for all non-canonical URLs to fix them in the location view of the browser and for all subsequent requests. In the example ruleset below we replace /~user by the canonical /u/user and fix a missing trailing slash for /u/user.

RewriteRule   ^/~([^/]+)/?(.*)    /u/$1/$2  [R]
RewriteRule   ^/([uge])/([^/]+)$  /$1/$2/   [R]
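To see what the two patterns above do to concrete paths, they can be replayed with Python's re module. This is a simplified sketch: the function name canonicalize is an assumption, and it returns after the first matching rule rather than reproducing how mod_rewrite walks through a ruleset.

```python
import re

def canonicalize(path):
    """Return the canonical path to redirect to, or None if the path is already canonical."""
    # /~user/anything  ->  /u/user/anything
    m = re.match(r"^/~([^/]+)/?(.*)", path)
    if m:
        return "/u/%s/%s" % (m.group(1), m.group(2))
    # /u/user, /g/group or /e/entity without trailing slash  ->  add the slash
    m = re.match(r"^/([uge])/([^/]+)$", path)
    if m:
        return "/%s/%s/" % (m.group(1), m.group(2))
    return None
```

For instance, canonicalize("/~alice/docs/a.html") gives "/u/alice/docs/a.html", while "/u/alice/" needs no redirect and yields None.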

Canonical Hostnames

Description:

The goal of this rule is to force the use of a particular hostname, in preference to other hostnames which may be used to reach the same site. For example, if you wish to force the use of www.example.com instead of example.com, you might use a variant of the following recipe.

Solution:

# For sites running on a port other than 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteCond %{SERVER_PORT} !^80$
RewriteRule ^/(.*)         http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R]

# And for a site running on port 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteRule ^/(.*)         http://fully.qualified.domain.name/$1 [L,R]

Moved DocumentRoot

Description:

Usually the DocumentRoot of the webserver directly relates to the URL " / ". But often this data is not really of top-level priority, it is perhaps just one entity of a lot of data pools. For instance at our Intranet sites there are /e/www/ (the homepage for WWW), /e/sww/ (the homepage for the Intranet) etc. Now because the data of the DocumentRoot stays at /e/www/ we had to make sure that all inlined images and other stuff inside this data pool work for subsequent requests.

Solution:

We redirect the URL / to /e/www/:

RewriteEngine on
RewriteRule   ^/$  /e/www/  [R]

Note that this can also be handled using the RedirectMatch directive:

RedirectMatch ^/$ http://example.com/e/www/

Trailing Slash Problem

Description:

Every webmaster can sing a song about the problem of the trailing slash on URLs referencing directories. If it is missing, the server dumps an error, because if you say /~quux/foo instead of /~quux/foo/ then the server searches for a file named foo. And because this file is a directory, it complains. Actually it tries to fix this itself in most cases, but sometimes you need to emulate this mechanism yourself, for instance after you have done a lot of complicated URL rewritings to CGI scripts etc.

Solution:

The solution to this subtle problem is to let the server add the trailing slash automatically. To do this correctly we have to use an external redirect, so the browser correctly requests subsequent images etc. If we only did an internal rewrite, this would only work for the directory page, but would go wrong when any images are included in this page with relative URLs, because the browser would resolve them against the wrong base URL. For instance, a request for image.gif in /~quux/foo/index.html would become /~quux/image.gif without the external redirect!
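The effect on relative URLs can be reproduced with the standard URL resolution rules that browsers follow. Here is a small Python illustration using urllib.parse.urljoin; the host example.com is a placeholder assumed for the demonstration.

```python
from urllib.parse import urljoin

# Base URL as the browser sees it when no external redirect happened.
without_slash = urljoin("http://example.com/~quux/foo", "image.gif")

# Base URL after the external redirect added the trailing slash.
with_slash = urljoin("http://example.com/~quux/foo/", "image.gif")

print(without_slash)  # http://example.com/~quux/image.gif
print(with_slash)     # http://example.com/~quux/foo/image.gif
```

Only the second form points into the foo directory, which is why the redirect has to be visible to the browser.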

So, to do this trick we write:

RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^foo$  foo/  [R]

The crazy and lazy can even do the following in the top-level .htaccess file of their homedir. But notice that this creates some processing overhead.

RewriteEngine  on
RewriteBase    /~quux/
RewriteCond    %{REQUEST_FILENAME}  -d
RewriteRule    ^(.+[^/])$           $1/  [R]

Webcluster through Homogeneous URL Layout

Description:

We want to create a homogeneous and consistent URL layout over all WWW servers on an Intranet webcluster, i.e. all URLs (per definition server local and thus server dependent!) become actually server independent! What we want is to give the WWW namespace a consistent server-independent layout: no URL should have to include any physically correct target server. The cluster itself should drive us automatically to the physical target host.

Solution:

First, the knowledge of the target servers comes from (distributed) external maps which contain information about where our users, groups and entities stay. They have the form

user1  server_of_user1
user2  server_of_user2
:      :

We put them into files map.xxx-to-host. Second, we need to instruct all servers to redirect URLs of the forms

/u/user/anypath
/g/group/anypath
/e/entity/anypath

to

http://physical-host/u/user/anypath
http://physical-host/g/group/anypath
http://physical-host/e/entity/anypath

when the URL is not locally valid to a server. The following ruleset does this for us with the help of the map files (assuming that server0 is a default server which will be used if a user has no entry in the map):

RewriteEngine on

RewriteMap      user-to-host   txt:/path/to/map.user-to-host
RewriteMap     group-to-host   txt:/path/to/map.group-to-host
RewriteMap    entity-to-host   txt:/path/to/map.entity-to-host

RewriteRule   ^/u/([^/]+)/?(.*)   http://${user-to-host:$1|server0}/u/$1/$2
RewriteRule   ^/g/([^/]+)/?(.*)   http://${group-to-host:$1|server0}/g/$1/$2
RewriteRule   ^/e/([^/]+)/?(.*)   http://${entity-to-host:$1|server0}/e/$1/$2

RewriteRule   ^/([uge])/([^/]+)/?$          /$1/$2/.www/
RewriteRule   ^/([uge])/([^/]+)/([^.]+.+)   /$1/$2/.www/$3
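The map lookup with a default, written ${user-to-host:$1|server0} in the ruleset, is easy to model outside Apache. The sketch below parses a map in the "key value" form shown above and builds the redirect target. The function names parse_map and redirect_for, and the idea of keeping the three maps in one dictionary keyed by u, g and e, are assumptions made for this illustration.

```python
import re

def parse_map(text):
    """Parse a txt-style map: one 'key value' pair per line, '#' starts a comment."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, value = line.split(None, 1)
        table[key] = value.strip()
    return table

def redirect_for(path, maps, default="server0"):
    """Return the physical-host URL for /u/, /g/ and /e/ paths, else None."""
    m = re.match(r"^/([uge])/([^/]+)/?(.*)", path)
    if m is None:
        return None
    kind, name, rest = m.groups()
    # Fall back to the default server when the name has no map entry.
    host = maps[kind].get(name, default)
    return "http://%s/%s/%s/%s" % (host, kind, name, rest)
```

With a user map containing "user1 server_of_user1", the path /u/user1/docs/a.html maps to http://server_of_user1/u/user1/docs/a.html, and an unknown user falls back to server0.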

Move Homedirs to Different Webserver

Description:

Many webmasters have asked for a solution to the following situation: They wanted to redirect just all homedirs on a webserver to another webserver. They usually need such things when establishing a newer webserver which will replace the old one over time.

Solution:

The solution is trivial with mod_rewrite. On the old webserver we just redirect all /~user/anypath URLs to http://newserver/~user/anypath.

RewriteEngine on
RewriteRule   ^/~(.+)  http://newserver/~$1  [R,L]

Structured Homedirs

Description:

Some sites with thousands of users usually use a structured homedir layout, i.e. each homedir is in a subdirectory which begins, for instance, with the first character of the username. So, /~foo/anypath is /home/f/foo/.www/anypath while /~bar/anypath is /home/b/bar/.www/anypath.

Solution:

We use the following ruleset to expand the tilde URLs into exactly the above layout.

RewriteEngine on
RewriteRule   ^/~(([a-z])[a-z0-9]+)(.*)  /home/$2/$1/.www$3
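The nested group is the interesting part of this pattern: the outer group captures the whole username and the inner group captures only its first letter. A short Python sketch of the same regular expression shows the mapping; the function name expand_homedir is an assumption made for the example.

```python
import re

def expand_homedir(path):
    """Map /~user/rest to /home/<first letter>/<user>/.www/rest, or None if no match."""
    m = re.match(r"^/~(([a-z])[a-z0-9]+)(.*)", path)
    if m is None:
        return None
    user, first_letter, rest = m.group(1), m.group(2), m.group(3)
    return "/home/%s/%s/.www%s" % (first_letter, user, rest)
```

Note that the pattern requires at least two characters in the username, so a one-letter account would not be rewritten.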

Filesystem Reorganization

Description:

This really is a hardcore example: a killer application which heavily uses per-directory RewriteRules to get a smooth look and feel on the Web while its data structure is never touched or adjusted. Background: net.sw is my archive of freely available Unix software packages, which I started to collect in 1992. It is both my hobby and job to do this, because while I'm studying computer science I have also worked for many years as a system and network administrator in my spare time. Every week I need some sort of software so I created a deep hierarchy of directories where I stored the packages:

drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/

In July 1996 I decided to make this archive public to the world via a nice Web interface. "Nice" means that I wanted to offer an interface where you can browse directly through the archive hierarchy. And "nice" means that I didn't want to change anything inside this hierarchy - not even by putting some CGI scripts at the top of it. Why? Because the above structure should later be accessible via FTP as well, and I didn't want any Web or CGI stuff to be there.

Solution:

The solution has two parts: The first is a set of CGI scripts which create all the pages at all directory levels on-the-fly. I put them under /e/netsw/.www/ as follows:

-rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
-rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
-rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
-rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
-rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
-rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
-rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
-rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
-rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
-rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
-rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst

The DATA/ subdirectory holds the above directory structure, i.e. the real net.sw stuff and gets automatically updated via rdist from time to time. The second part of the problem remains: how to link these two structures together into one smooth-looking URL tree? We want to hide the DATA/ directory from the user while running the appropriate CGI scripts for the various URLs. Here is the solution: first I put the following into the per-directory configuration file in the DocumentRoot of the server to rewrite the announced URL /net.sw/ to the internal path /e/netsw :

RewriteRule  ^net.sw$       net.sw/        [R]
RewriteRule  ^net.sw/(.*)$  e/netsw/$1

The first rule is for requests which miss the trailing slash! The second rule does the real thing. And then comes the killer configuration which stays in the per-directory config file /e/netsw/.www/.wwwacl:

Options       ExecCGI FollowSymLinks Includes MultiViews

RewriteEngine on

#  we are reached via /net.sw/ prefix
RewriteBase   /net.sw/

#  first we rewrite the root dir to
#  the handling cgi script
RewriteRule   ^$                       netsw-home.cgi     [L]
RewriteRule   ^index\.html$            netsw-home.cgi     [L]

#  strip out the subdirs when
#  the browser requests us from perdir pages
RewriteRule   ^.+/(netsw-[^/]+/.+)$    $1                 [L]

#  and now break the rewriting for local files
RewriteRule   ^netsw-home\.cgi.*       -                  [L]
RewriteRule   ^netsw-changes\.cgi.*    -                  [L]
RewriteRule   ^netsw-search\.cgi.*     -                  [L]
RewriteRule   ^netsw-tree\.cgi$        -                  [L]
RewriteRule   ^netsw-about\.html$      -                  [L]
RewriteRule   ^netsw-img/.*$           -                  [L]

#  anything else is a subdir which gets handled
#  by another cgi script
RewriteRule   !^netsw-lsdir\.cgi.*     -                  [C]
RewriteRule   (.*)                     netsw-lsdir.cgi/$1

Some hints for interpretation:

  1. Notice the L (last) flag and no substitution field ('-') in the fourth part
  2. Notice the ! (not) character and the C (chain) flag at the first rule in the last part
  3. Notice the catch-all pattern in the last rule

NCSA imagemap to Apache mod_imap

Description:

When switching from the NCSA webserver to the more modern Apache webserver a lot of people want a smooth transition. So they want pages which use their old NCSA imagemap program to work under Apache with the modern mod_imap . The problem is that there are a lot of hyperlinks around which reference the imagemap program via /cgi-bin/imagemap/path/to/page.map . Under Apache this has to read just /path/to/page.map .

Solution:

We use a global rule to remove the prefix on-the-fly for all requests:

RewriteEngine  on
RewriteRule    ^/cgi-bin/imagemap(.*)  $1  [PT]

Search pages in more than one directory

Description:

Sometimes it is necessary to let the webserver search for pages in more than one directory. Here MultiViews or other techniques cannot help.

Solution:

We program an explicit ruleset which searches for the files in the directories.

RewriteEngine on

#   first try to find it in dir1/...
#   ...and if found stop and be happy:
RewriteCond         /your/docroot/dir1/%{REQUEST_FILENAME}  -f
RewriteRule  ^(.+)  /your/docroot/dir1/$1  [L]

#   second try to find it in dir2/...
#   ...and if found stop and be happy:
RewriteCond         /your/docroot/dir2/%{REQUEST_FILENAME}  -f
RewriteRule  ^(.+)  /your/docroot/dir2/$1  [L]

#   else go on for other Alias or ScriptAlias directives,
#   etc.
RewriteRule   ^(.+)  -  [PT]
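The logic of the ruleset is a plain "first directory that has the file wins" search. As a rough sketch of that idea in Python (the function name find_in_dirs is an assumption, and nothing here involves Apache itself):

```python
import os

def find_in_dirs(relative, directories):
    """Return the first existing file for a relative path, searching directories in order."""
    # Drop a leading slash so os.path.join does not discard the directory part.
    relative = relative.lstrip("/")
    for directory in directories:
        candidate = os.path.join(directory, relative)
        if os.path.isfile(candidate):
            return candidate
    # Nothing found: the request falls through to whatever handles it next.
    return None
```

The order of the list plays the same role as the order of the rule blocks: a file present in both directories is served from the first one.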

Set Environment Variables According To URL Parts

Description:

Perhaps you want to keep status information between requests and use the URL to encode it. But you don't want to use a CGI wrapper for all pages just to strip out this information.

Solution:

We use a rewrite rule to strip out the status information and remember it via an environment variable which can later be dereferenced from within XSSI or CGI. This way a URL /foo/S=java/bar/ gets translated to /foo/bar/ and the environment variable named STATUS is set to the value "java".

RewriteEngine on
RewriteRule   ^(.*)/S=([^/]+)/(.*)    $1/$3 [E=STATUS:$2]
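The same pattern can be tried out in Python to check which part of the URL ends up where. In this sketch the function name strip_status is an assumption, and the returned tuple stands in for the rewritten URL plus the STATUS environment variable.

```python
import re

def strip_status(path):
    """Return (rewritten_path, status); status is None when no S=... segment is present."""
    m = re.match(r"^(.*)/S=([^/]+)/(.*)", path)
    if m is None:
        return path, None
    # $1/$3 becomes the new URL, $2 becomes the value of STATUS.
    return "%s/%s" % (m.group(1), m.group(3)), m.group(2)
```

Calling strip_status("/foo/S=java/bar/") returns ("/foo/bar/", "java"), matching the example in the text.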

Virtual User Hosts

Description:

Assume that you want to provide www.username.host.domain.com for the homepage of username via just DNS A records to the same machine and without any virtualhosts on this machine.

Solution:

For HTTP/1.0 requests there is no solution, but for HTTP/1.1 requests which contain a Host: HTTP header we can use the following ruleset to rewrite http://www.username.host.com/anypath internally to /home/username/anypath:

RewriteEngine on
RewriteCond   %{HTTP_HOST}                 ^www\.[^.]+\.host\.com$
RewriteRule   ^(.+)                        %{HTTP_HOST}$1          [C]
RewriteRule   ^www\.([^.]+)\.host\.com(.*) /home/$1$2
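The chained rules first glue the Host header in front of the path and then pick the username out of the combined string. The net effect can be sketched in Python as follows; the function name vhost_to_path is an assumption, and the sketch skips the intermediate glued string and matches the host directly.

```python
import re

def vhost_to_path(host, path):
    """Map a request for www.<user>.host.com/<path> to /home/<user>/<path>, else None."""
    m = re.match(r"^www\.([^.]+)\.host\.com$", host)
    if m is None:
        return None
    return "/home/%s%s" % (m.group(1), path)
```

For a request with Host: www.alice.host.com and the path /anypath, this gives /home/alice/anypath.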

Redirect Homedirs For Foreigners

Description:

We want to redirect homedir URLs to another webserver www.somewhere.com when the requesting user does not stay in the local domain ourdomain.com . This is sometimes used in virtual host contexts.

Solution:

Just a rewrite condition:

RewriteEngine on
RewriteCond   %{REMOTE_HOST}  !^.+\.ourdomain\.com$
RewriteRule   ^(/~.+)         http://www.somewhere.com/$1 [R,L]

Redirect Failing URLs To Other Webserver

Description:

A typical FAQ about URL rewriting is how to redirect failing requests on webserver A to webserver B. Usually this is done via ErrorDocument CGI-scripts in Perl, but there is also a mod_rewrite solution. But notice that this performs more poorly than using an ErrorDocument CGI-script!

Solution:

The first solution has the best performance but less flexibility, and is less error safe:

RewriteEngine on
RewriteCond   /your/docroot/%{REQUEST_FILENAME} !-f
RewriteRule   ^(.+)                             http://webserverB.dom/$1

The problem here is that this will only work for pages inside the DocumentRoot. While you can add more conditions (for instance to also handle homedirs, etc.) there is a better variant:

RewriteEngine on
RewriteCond   %{REQUEST_URI} !-U
RewriteRule   ^(.+)          http://webserverB.dom/$1

This uses the URL look-ahead feature of mod_rewrite. The result is that this will work for all types of URLs and is a safe way. But it has a performance impact on the webserver, because for every request there is one more internal subrequest. So, if your webserver runs on a powerful CPU, use this one. If it is a slow machine, use the first approach or, better, an ErrorDocument CGI script.

Extended Redirection

Description:

Sometimes we need more control (concerning the character escaping mechanism) of URLs on redirects. Usually the Apache kernel's URL escape function also escapes anchors, i.e. URLs like "url#anchor". You cannot use this directly on redirects with mod_rewrite because the uri_escape() function of Apache would also escape the hash character. How can we redirect to such a URL?

Solution:

We have to use a kludge: an NPH-CGI script which does the redirect itself, because here no escaping is done (NPH = non-parseable headers). First we introduce a new URL scheme xredirect: by the following per-server config line (should be one of the last rewrite rules):

RewriteRule ^xredirect:(.+) /path/to/nph-xredirect.cgi/$1 \
            [T=application/x-httpd-cgi,L]

This forces all URLs prefixed with xredirect: to be piped through the nph-xredirect.cgi program. And this program just looks like:

#!/path/to/perl
##
##  nph-xredirect.cgi -- NPH/CGI script for extended redirects
##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
##

$| = 1;
$url = $ENV{'PATH_INFO'};

print "HTTP/1.0 302 Moved Temporarily\n";
print "Server: $ENV{'SERVER_SOFTWARE'}\n";
print "Location: $url\n";
print "Content-type: text/html\n";
print "\n";
print "<html>\n";
print "<head>\n";
print "<title>302 Moved Temporarily (EXTENDED)</title>\n";
print "</head>\n";
print "<body>\n";
print "<h1>Moved Temporarily (EXTENDED)</h1>\n";
print "The document has moved <a HREF=\"$url\">here</a>.<p>\n";
print "</body>\n";
print "</html>\n";

##EOF##

This provides you with the functionality to do redirects to all URL schemes, i.e. including the ones which are not directly accepted by mod_rewrite. For instance you can now also redirect to news:newsgroup via

RewriteRule ^anyurl  xredirect:news:newsgroup

Notice: You must not put [R] or [R,L] on the above rule, because the xredirect: needs to be expanded later by our special "pipe through" rule above.

Archive Access Multiplexer

Description:

Do you know the great CPAN (Comprehensive Perl Archive Network) under http://www.perl.com/CPAN ? This does a redirect to one of several FTP servers around the world which carry a CPAN mirror and is approximately near the location of the requesting client. Actually this can be called an FTP access multiplexing service. While CPAN runs via CGI scripts, how can a similar approach be implemented via mod_rewrite?

Solution:

First we notice that from version 3.0.0 mod_rewrite can also use the "ftp:" scheme on redirects. And second, the location approximation can be done by a RewriteMap over the top-level domain of the client. With a tricky chained ruleset we can use this top-level domain as a key to our multiplexing map.

RewriteEngine on
RewriteMap    multiplex                txt:/path/to/map.cxan
RewriteRule   ^/CxAN/(.*)              %{REMOTE_HOST}::$1                 [C]
RewriteRule   ^.+\.([a-zA-Z]+)::(.*)$  ${multiplex:$1|ftp.default.dom}$2  [R,L]

##
##  map.cxan -- Multiplexing Map for CxAN
##

de        ftp://ftp.cxan.de/CxAN/
uk        ftp://ftp.cxan.uk/CxAN/
com       ftp://ftp.cxan.com/CxAN/
 :
##EOF##
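The idea of keying a mirror table on the client's top-level domain can be sketched in a few lines of Python. Several things here are assumptions made for illustration: the function name mirror_url, lowercasing the top-level domain before the lookup, and the default mirror URL (written with a scheme and path so that the result is a complete URL).

```python
import re

# Same entries as the map file above.
MIRRORS = {
    "de": "ftp://ftp.cxan.de/CxAN/",
    "uk": "ftp://ftp.cxan.uk/CxAN/",
    "com": "ftp://ftp.cxan.com/CxAN/",
}

def mirror_url(remote_host, path, default="ftp://ftp.default.dom/CxAN/"):
    """Pick a mirror by the client's top-level domain and append the requested path."""
    m = re.match(r"^.+\.([a-zA-Z]+)$", remote_host)
    # Numeric addresses and unknown domains fall back to the default mirror.
    base = MIRRORS.get(m.group(1).lower(), default) if m else default
    return base + path
```

A client resolving to client.uni.de would be sent to the .de mirror, while a bare IP address such as 192.168.1.10 has no alphabetic top-level domain and gets the default.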

Time-Dependent Rewriting

Description:

When it comes to tricks like time-dependent content, a lot of webmasters still use CGI scripts which, for instance, redirect to specialized pages. How can it be done via mod_rewrite?

Solution:

There are a lot of variables named TIME_xxx for rewrite conditions. In conjunction with the special lexicographic comparison patterns <STRING, >STRING and =STRING we can do time-dependent redirects:

RewriteEngine on
RewriteCond   %{TIME_HOUR}%{TIME_MIN} >0700
RewriteCond   %{TIME_HOUR}%{TIME_MIN} <1900
RewriteRule   ^foo\.html$             foo.day.html
RewriteRule   ^foo\.html$             foo.night.html

This provides the content of foo.day.html under the URL foo.html from 07:00-19:00 and at the remaining time the contents of foo.night.html . Just a nice feature for a homepage...
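The trick works because zero-padded HHMM strings sort lexicographically in the same order as the times they represent. A tiny Python sketch of the comparison makes the boundaries explicit; the function name page_for is an assumption made for the example.

```python
def page_for(hour, minute):
    """Choose the day or night page using string comparison on a zero-padded HHMM value."""
    stamp = "%02d%02d" % (hour, minute)
    # Both comparisons are strict, as with >0700 and <1900.
    if "0700" < stamp < "1900":
        return "foo.day.html"
    return "foo.night.html"
```

Because both comparisons are strict, requests at exactly 07:00 and exactly 19:00 still get the night page.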

Backward Compatibility for YYYY to XXXX migration

Description:

How can we make URLs backward compatible (still existing virtually) after migrating document.YYYY to document.XXXX , e.g. after translating a bunch of .html files to .phtml ?

Solution:

We just rewrite the name to its basename and test for existence of the new extension. If it exists, we take that name, else we rewrite the URL to its original state.

#   backward compatibility ruleset for
#   rewriting document.html to document.phtml
#   when and only when document.phtml exists
#   but no longer document.html
RewriteEngine on
RewriteBase   /~quux/
#   parse out basename, but remember the fact
RewriteRule   ^(.*)\.html$              $1      [C,E=WasHTML:yes]
#   rewrite to document.phtml if exists
RewriteCond   %{REQUEST_FILENAME}.phtml -f
RewriteRule   ^(.*)$ $1.phtml                   [S=1]
#   else reverse the previous basename cutout
RewriteCond   %{ENV:WasHTML}            ^yes$
RewriteRule   ^(.*)$ $1.html
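Stripped of the rewrite mechanics, the ruleset makes one decision per request: if the name ends in .html and a .phtml sibling exists, serve the sibling; otherwise leave the name alone. A Python sketch of that decision follows. The function name resolve and the use of a set of existing file names in place of a real filesystem check are assumptions made for illustration.

```python
def resolve(name, existing):
    """Return name with .html swapped for .phtml when the .phtml file exists."""
    suffix = ".html"
    if name.endswith(suffix):
        base = name[:-len(suffix)]
        if base + ".phtml" in existing:
            return base + ".phtml"
    # Either not an .html request or no migrated file: keep the original name.
    return name
```

With existing = {"doc.phtml"}, a request for doc.html resolves to doc.phtml, while img.gif is returned unchanged.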

Content Handling

From Old to New (intern)

Description:

Assume we have recently renamed the page foo.html to bar.html and now want to provide the old URL for backward compatibility. Actually we want users of the old URL not even to notice that the page was renamed.

Solution:

We rewrite the old URL to the new one internally via the following rule:

RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^foo\.html$  bar.html

From Old to New (extern)

Description:

Assume again that we have recently renamed the page foo.html to bar.html and now want to provide the old URL for backward compatibility. But this time we want the users of the old URL to be pointed to the new one, i.e. their browser's Location field should change, too.

Solution:

We force an HTTP redirect to the new URL, which leads to a change in the browser's and thus the user's view:

RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^foo\.html$  bar.html  [R]

Browser Dependent Content

Description:

At least for important top-level pages it is sometimes necessary to provide the optimum of browser dependent content, i.e. one has to provide a maximum version for the latest Netscape variants, a minimum version for the Lynx browsers and an average feature version for all others.

Solution:

We cannot use content negotiation because the browsers do not provide their type in that form. Instead we have to act on the HTTP header "User-Agent". The following config does the following: If the HTTP header "User-Agent" begins with "Mozilla/3", the page foo.html is rewritten to foo.NS.html and the rewriting stops. If the browser is "Lynx" or "Mozilla" of version 1 or 2 the URL becomes foo.20.html. All other browsers receive page foo.32.html. This is done by the following ruleset:

RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/3.*
RewriteRule ^foo\.html$         foo.NS.html          [L]

RewriteCond %{HTTP_USER_AGENT}  ^Lynx/.*         [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/[12].*
RewriteRule ^foo\.html$         foo.20.html          [L]

RewriteRule ^foo\.html$         foo.32.html          [L]
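To check which variant a given User-Agent string would receive, the three cases can be written out in Python with the same prefixes. The function name variant_for and the sample User-Agent strings are assumptions made for this sketch.

```python
import re

def variant_for(user_agent):
    """Return the page variant the ruleset above would select for a User-Agent string."""
    if re.match(r"^Mozilla/3", user_agent):
        return "foo.NS.html"
    if re.match(r"^Lynx/", user_agent) or re.match(r"^Mozilla/[12]", user_agent):
        return "foo.20.html"
    # Everything else gets the average-feature version.
    return "foo.32.html"
```

Note that a modern browser announcing itself as Mozilla/5.0 falls through to foo.32.html, since only versions 1 to 3 are matched explicitly.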

Dynamic Mirror

Description:

Assume there are nice webpages on remote hosts we want to bring into our namespace. For FTP servers we would use the mirror program, which actually maintains an explicit up-to-date copy of the remote data on the local machine. For a webserver we could use the program webcopy, which acts similarly via HTTP. But both techniques have one major drawback: the local copy is only as up-to-date as our last run of the program. It would be much better if the mirror were not a static one we have to establish explicitly. Instead we want a dynamic mirror with data which gets updated automatically when there is need (updated data on the remote host).

Solution:

To provide this feature we map the remote webpage or even the complete remote webarea to our namespace by the use of the Proxy Throughput feature (flag [P]):

RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^hotsheet/(.*)$  http://www.tstimpreso.com/hotsheet/$1  [P]

RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^usa-news\.html$   http://www.quux-corp.com/news/index.html  [P]

Reverse Dynamic Mirror

Description:

...

Solution:

RewriteEngine on
RewriteCond   /mirror/of/remotesite/$1           -U
RewriteRule   ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1

Retrieve Missing Data from Intranet

Description:

This is a tricky way of virtually running a corporate (external) Internet webserver (www.quux-corp.dom), while actually keeping and maintaining its data on an (internal) Intranet webserver (www2.quux-corp.dom) which is protected by a firewall. The trick is that on the external webserver we retrieve the requested data on-the-fly from the internal one.

Solution:

First, we have to make sure that our firewall still protects the internal webserver and that only the external webserver is allowed to retrieve data from it. For a packet-filtering firewall we could for instance configure a firewall ruleset like the following:

ALLOW Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port 80
DENY  Host *                 Port *     --> Host www2.quux-corp.dom Port 80

Just adjust it to your actual configuration syntax. Now we can establish the mod_rewrite rules which request the missing data in the background through the proxy throughput feature:

RewriteRule ^/~([^/]+)/?(.*)          /home/$1/.www/$2
RewriteCond %{REQUEST_FILENAME}       !-f
RewriteCond %{REQUEST_FILENAME}       !-d
RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom/~$1/pub/$2 [P]

Load Balancing

Description:

Suppose we want to load balance the traffic to www.foo.com over www[0-5].foo.com (a total of 6 servers). How can this be done?

Solution:

There are a lot of possible solutions for this problem. We will discuss first a commonly known DNS-based variant and then the special one with mod_rewrite :

  1. DNS Round-Robin

    The simplest method for load-balancing is to use the DNS round-robin feature of BIND. Here you just configure www[0-5].foo.com as usual in your DNS with A (address) records, e.g.

    www0   IN  A       1.2.3.1
    www1   IN  A       1.2.3.2
    www2   IN  A       1.2.3.3
    www3   IN  A       1.2.3.4
    www4   IN  A       1.2.3.5
    www5   IN  A       1.2.3.6

    Then you additionally add the following entry:

    www    IN  CNAME   www0.foo.com.
           IN  CNAME   www1.foo.com.
           IN  CNAME   www2.foo.com.
           IN  CNAME   www3.foo.com.
           IN  CNAME   www4.foo.com.
           IN  CNAME   www5.foo.com.

    Notice that this seems wrong, but is actually an intended feature of BIND and can be used in this way. Now, when www.foo.com gets resolved, BIND gives out www0-www5 - but in a slightly permutated/rotated order every time. This way the clients are spread over the various servers. But notice that this is not a perfect load balancing scheme, because DNS resolve information gets cached by the other nameservers on the net, so once a client has resolved www.foo.com to a particular wwwN.foo.com, all subsequent requests also go to this particular name wwwN.foo.com. But the final result is ok, because the total sum of the requests is really spread over the various webservers.

  2. DNS Load-Balancing

    A sophisticated DNS-based method for load-balancing is to use the program lbnamed which can be found at http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html . It is a Perl 5 program which, in conjunction with auxiliary tools, provides real load-balancing for DNS.

  3. Proxy Throughput Round-Robin

    In this variant we use mod_rewrite and its proxy throughput feature. First we dedicate www0.foo.com to be actually www.foo.com by using a single

    www IN CNAME www0.foo.com.

    entry in the DNS. Then we convert www0.foo.com to a proxy-only server, i.e. we configure this machine so all arriving URLs are just pushed through the internal proxy to one of the 5 other servers ( www1-www5 ). To accomplish this we first establish a ruleset which contacts a load balancing script lb.pl for all URLs.

    RewriteEngine on
    RewriteMap    lb      prg:/path/to/lb.pl
    RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]

    Then we write lb.pl:

    #!/path/to/perl
    ##
    ##  lb.pl -- load balancing script
    ##

    $| = 1;

    $name   = "www";     # the hostname base
    $first  = 1;         # the first server (not 0 here, because 0 is myself)
    $last   = 5;         # the last server in the round-robin
    $domain = "foo.dom"; # the domainname

    $cnt = 0;
    while (<STDIN>) {
        $cnt = (($cnt+1) % ($last+1-$first));
        $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
        print "http://$server/$_";
    }

    ##EOF##

    A last notice: Why is this useful? Seems like www0.foo.com still is overloaded? The answer is yes, it is overloaded, but with plain proxy throughput requests, only! All SSI, CGI, ePerl, etc. processing is completely done on the other machines. This is the essential point.

  4. Hardware/TCP Round-Robin

    There is a hardware solution available, too. Cisco has a beast called LocalDirector which does load balancing at the TCP/IP level. Actually this is some sort of circuit-level gateway in front of a webcluster. If you have enough money and really need a solution with high performance, use this one.

New MIME-type, New Service

Description:

On the net there are a lot of nifty CGI programs. But their usage is usually boring, so a lot of webmasters don't use them. Even Apache's Action handler feature for MIME-types is only appropriate when the CGI programs don't need special URLs (actually PATH_INFO and QUERY_STRING ) as their input. First, let us configure a new file type with extension .scgi (for secure CGI) which will be processed by the popular cgiwrap program. The problem here is that if, for instance, we use a Homogeneous URL Layout (see above), a file inside the user homedirs has the URL /u/user/foo/bar.scgi . But cgiwrap needs the URL in the form /~user/foo/bar.scgi/ . The following rule solves the problem:

RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
... /internal/cgi/user/cgiwrap/~$1/$2.scgi$3 [NS,T=application/x-httpd-cgi]

Or assume we have some more nifty programs: wwwlog (which displays the access.log for a URL subtree) and wwwidx (which runs Glimpse on a URL subtree). We have to provide the URL area to these programs so they know which area they have to act on. But usually this is ugly, because they are still requested from within those areas all the time, i.e. typically we would run the wwwidx program from within /u/user/foo/ via a hyperlink to

/internal/cgi/user/wwwidx?i=/u/user/foo/

which is ugly, because we have to hard-code both the location of the area and the location of the CGI inside the hyperlink. When we have to reorganize the area, we spend a lot of time changing the various hyperlinks.

Solution:

The solution here is to provide a special new URL format which automatically leads to the proper CGI invocation. We configure the following:

RewriteRule ^/([uge])/([^/]+)(/?.*)/\*  /internal/cgi/user/wwwidx?i=/$1/$2$3/
RewriteRule ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3

Now the hyperlink to search at /u/user/foo/ reads only

HREF="*"

which internally gets automatically transformed to

/internal/cgi/user/wwwidx?i=/u/user/foo/

The same approach leads to an invocation for the access log CGI program when the hyperlink :log gets used.
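To see what the two patterns capture, they can be tried outside the server. The following Python sketch applies the same regular expressions with the re module; it only imitates the substitution step and is not how mod_rewrite processes a request. The function name rewrite is an assumption of this example.

```python
import re

# (pattern, substitution) pairs copied from the two RewriteRule lines;
# $1..$3 become \1..\3 in Python template syntax.
RULES = [
    (re.compile(r"^/([uge])/([^/]+)(/?.*)/\*"), r"/internal/cgi/user/wwwidx?i=/\1/\2\3/"),
    (re.compile(r"^/([uge])/([^/]+)(/?.*):log"), r"/internal/cgi/user/wwwlog?f=/\1/\2\3"),
]


def rewrite(url_path):
    """Return the substituted URL for the first matching rule, else the input."""
    for pattern, template in RULES:
        match = pattern.search(url_path)
        if match:
            return match.expand(template)
    return url_path
```

For instance, rewrite("/u/user/foo/*") yields /internal/cgi/user/wwwidx?i=/u/user/foo/, matching the transformation described above.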

From Static to Dynamic

Description:

How can we transform a static page foo.html into a dynamic variant foo.cgi in a seamless way, i.e. without notice by the browser/user?

Solution:

We just rewrite the URL to the CGI-script and force the correct MIME-type so it gets really run as a CGI-script. This way a request to /~quux/foo.html internally leads to the invocation of /~quux/foo.cgi .

RewriteEngine on
RewriteBase   /~quux/
RewriteRule   ^foo\.html$ foo.cgi [T=application/x-httpd-cgi]

On-the-fly Content-Regeneration

Description:

Here comes a really esoteric feature: Dynamically generated but statically served pages, i.e. pages should be delivered as pure static pages (read from the filesystem and just passed through), but they have to be generated dynamically by the webserver if missing. This way you can have CGI-generated pages which are statically served unless someone (or a cronjob) removes the static contents. Then the contents get refreshed.

Solution:

This is done via the following ruleset:

RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule ^page\.html$ page.cgi [T=application/x-httpd-cgi,L]

Here a request to page.html leads to an internal run of a corresponding page.cgi if page.html is still missing or has a file size of zero. The trick here is that page.cgi is a usual CGI script which (in addition to its STDOUT ) writes its output to the file page.html . Once it has run, the server sends out the data of page.html . When the webmaster wants to force a refresh of the contents, he just removes page.html (usually done by a cronjob).
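The serve-or-regenerate logic can be sketched in a few lines of Python. This is an illustration of the idea only, not of what Apache does internally; the function names serve and generate_page and the page content are invented for the example.

```python
import os
import tempfile


def serve(static_path, generate):
    """Return the page bytes, regenerating the file when it is missing or empty."""
    # Counterpart of the "!-s" test: true when the file is absent or has size zero.
    if not os.path.isfile(static_path) or os.path.getsize(static_path) == 0:
        content = generate()
        with open(static_path, "wb") as handle:
            handle.write(content)  # the CGI also stores its output as page.html
        return content
    with open(static_path, "rb") as handle:
        return handle.read()


calls = []


def generate_page():
    calls.append(1)  # record each run of the stand-in "CGI"
    return b"<html>generated</html>"


with tempfile.TemporaryDirectory() as workdir:
    page = os.path.join(workdir, "page.html")
    first = serve(page, generate_page)   # file missing: generator runs
    second = serve(page, generate_page)  # file present: served statically
    os.remove(page)                      # what the webmaster or cronjob does
    third = serve(page, generate_page)   # generator runs again
```

The generator runs only on the first and third request; the second is answered from the stored file.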

Document With Autorefresh

Description:

Wouldn't it be nice while creating a complex webpage if the webbrowser would automatically refresh the page every time we write a new version from within our editor? Impossible?

Solution:

No! We just combine the MIME multipart feature, the webserver NPH feature and the URL manipulation power of mod_rewrite . First, we establish a new URL feature: Adding just :refresh to any URL causes this to be refreshed every time it gets updated on the filesystem.

RewriteRule ^(/[uge]/[^/]+/?.*):refresh /internal/cgi/apache/nph-refresh?f=$1

Now when we reference the URL

/u/foo/bar/page.html:refresh

this leads to the internal invocation of the URL

/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html

The only missing part is the NPH-CGI script. Although one would usually say "left as an exercise to the reader" ;-) I will provide this, too.

#!/sw/bin/perl
##
##  nph-refresh -- NPH/CGI script for auto refreshing pages
##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
##
$| = 1;

#   split the QUERY_STRING variable
@pairs = split(/&/, $ENV{'QUERY_STRING'});
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    $name =~ tr/A-Z/a-z/;
    $name = 'QS_' . $name;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    eval "\$$name = \"$value\"";
}
$QS_s = 1 if ($QS_s eq '');
$QS_n = 3600 if ($QS_n eq '');
if ($QS_f eq '') {
    print "HTTP/1.0 200 OK\n";
    print "Content-type: text/html\n\n";
    print "<b>ERROR</b>: No file given\n";
    exit(0);
}
if (! -f $QS_f) {
    print "HTTP/1.0 200 OK\n";
    print "Content-type: text/html\n\n";
    print "<b>ERROR</b>: File $QS_f not found\n";
    exit(0);
}

sub print_http_headers_multipart_begin {
    print "HTTP/1.0 200 OK\n";
    $bound = "ThisRandomString12345";
    print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
    &print_http_headers_multipart_next;
}

sub print_http_headers_multipart_next {
    print "\n--$bound\n";
}

sub print_http_headers_multipart_end {
    print "\n--$bound--\n";
}

sub displayhtml {
    local($buffer) = @_;
    $len = length($buffer);
    print "Content-type: text/html\n";
    print "Content-length: $len\n\n";
    print $buffer;
}

sub readfile {
    local($file) = @_;
    local(*FP, $size, $buffer, $bytes);
    ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
    $size = sprintf("%d", $size);
    open(FP, "<$file");
    $bytes = sysread(FP, $buffer, $size);
    close(FP);
    return $buffer;
}

$buffer = &readfile($QS_f);
&print_http_headers_multipart_begin;
&displayhtml($buffer);

sub mystat {
    local($file) = $_[0];
    local($time);

    ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
    return $mtime;
}

$mtimeL = &mystat($QS_f);
$mtime = $mtime;
for ($n = 0; $n < $QS_n; $n++) {
    while (1) {
        $mtime = &mystat($QS_f);
        if ($mtime ne $mtimeL) {
            $mtimeL = $mtime;
            sleep(2);
            $buffer = &readfile($QS_f);
            &print_http_headers_multipart_next;
            &displayhtml($buffer);
            sleep(5);
            $mtimeL = &mystat($QS_f);
            last;
        }
        sleep($QS_s);
    }
}

&print_http_headers_multipart_end;

exit(0);

##EOF##
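The part of the script that matters for the browser is the framing of the multipart/x-mixed-replace stream. As a rough sketch, the Python function below assembles the same sequence of headers and boundaries for a list of page versions. It mirrors the print statements of the script and leaves out the file polling; the function name multipart_stream is invented for this illustration.

```python
def multipart_stream(versions, boundary="ThisRandomString12345"):
    """Build the byte stream the NPH script would emit for successive versions."""
    chunks = [
        b"HTTP/1.0 200 OK\n",
        f"Content-type: multipart/x-mixed-replace;boundary={boundary}\n".encode(),
    ]
    for html in versions:
        body = html.encode()
        chunks.append(f"\n--{boundary}\n".encode())    # start of the next part
        chunks.append(b"Content-type: text/html\n")
        chunks.append(f"Content-length: {len(body)}\n\n".encode())
        chunks.append(body)
    chunks.append(f"\n--{boundary}--\n".encode())      # closing boundary
    return b"".join(chunks)


stream = multipart_stream(["<p>v1</p>", "<p>v2</p>"])
```

Each new part replaces the previously displayed one in browsers that support this MIME type, which is what produces the auto-refresh effect.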

Mass Virtual Hosting

Description:

The <VirtualHost> feature of Apache is nice and works great when you just have a few dozen virtual hosts. But when you are an ISP and have hundreds of virtual hosts to provide, this feature is not the best choice.

Solution:

To provide this feature we look up each requested hostname in a text-based RewriteMap and map the URL to the document root configured for that virtual host:

##
##  vhost.map
##
www.vhost1.dom:80  /path/to/docroot/vhost1
www.vhost2.dom:80  /path/to/docroot/vhost2
     :
www.vhostN.dom:80  /path/to/docroot/vhostN

##
##  httpd.conf
##
    :
#   use the canonical hostname on redirects, etc.
UseCanonicalName on

    :
#   add the virtual host in front of the CLF-format
CustomLog  /path/to/access_log  "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
    :

#   enable the rewriting engine in the main server
RewriteEngine on

#   define two maps: one for fixing the URL and one which defines
#   the available virtual hosts with their corresponding
#   DocumentRoot.
RewriteMap    lowercase    int:tolower
RewriteMap    vhost        txt:/path/to/vhost.map

#   Now do the actual virtual host mapping
#   via a huge and complicated single rule:
#
#   1. make sure we don't map for common locations
RewriteCond   %{REQUEST_URI}  !^/commonurl1/.*
RewriteCond   %{REQUEST_URI}  !^/commonurl2/.*
    :
RewriteCond   %{REQUEST_URI}  !^/commonurlN/.*
#
#   2. make sure we have a Host header, because
#      currently our approach only supports
#      virtual hosting through this header
RewriteCond   %{HTTP_HOST}  !^$
#
#   3. lowercase the hostname
RewriteCond   ${lowercase:%{HTTP_HOST}|NONE}  ^(.+)$
#
#   4. lookup this hostname in vhost.map and
#      remember it only when it is a path
#      (and not "NONE" from above)
RewriteCond   ${vhost:%1}  ^(/.*)$
#
#   5. finally we can map the URL to its docroot location
#      and remember the virtual host for logging purposes
RewriteRule   ^/(.*)$   %1/$1  [E=VHOST:${lowercase:%{HTTP_HOST}}]
    :
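The five numbered steps of the ruleset can be followed more easily in procedural form. The Python sketch below walks through the same decisions on plain strings; it is a simplified model and not a description of mod_rewrite internals. The function names parse_map and map_request are assumptions, and the map is reduced to two entries.

```python
VHOST_MAP_TEXT = """\
www.vhost1.dom:80 /path/to/docroot/vhost1
www.vhost2.dom:80 /path/to/docroot/vhost2
"""


def parse_map(text):
    """Parse a key/value text map, skipping blank lines and comment lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        table[key] = value.strip()
    return table


def map_request(host_header, url_path, vhosts, common_prefixes=("/commonurl1/",)):
    """Return the filesystem path for a request, or the URL unchanged."""
    if url_path.startswith(tuple(common_prefixes)):  # step 1: common locations
        return url_path
    if not host_header:                              # step 2: need a Host header
        return url_path
    # Steps 3 and 4: lowercase the host and look it up. The key has to match
    # the map entry exactly, including the port when the map lists one.
    docroot = vhosts.get(host_header.lower(), "")
    if not docroot.startswith("/"):
        return url_path
    return docroot + url_path                        # step 5: prepend the docroot


vhosts = parse_map(VHOST_MAP_TEXT)
```

A request for /index.html with the Host value WWW.VHOST1.DOM:80 is mapped to /path/to/docroot/vhost1/index.html, while an unknown host leaves the URL untouched.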

Access Restriction

Blocking of Robots

Description:

How can we block a really annoying robot from retrieving pages of a specific webarea? A /robots.txt file containing entries of the "Robot Exclusion Protocol" is typically not enough to get rid of such a robot.

Solution:

We use a ruleset which forbids the URLs of the webarea /~quux/foo/arc/ (perhaps a very deep directory indexed area where the robot traversal would create big server load). We have to make sure that we forbid access only to the particular robot, i.e. just forbidding the host where the robot runs is not enough. This would block users from this host, too. We accomplish this by also matching the User-Agent HTTP header information.

RewriteCond %{HTTP_USER_AGENT}   ^NameOfBadRobot.*
RewriteCond %{REMOTE_ADDR}       ^123\.45\.67\.[8-9]$
RewriteRule ^/~quux/foo/arc/.+   -   [F]
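Since the two conditions are combined with an implicit AND, a request is refused only when user agent, address and path all match. A small Python sketch with the same three patterns makes this easy to check; the function name is_forbidden is made up for the example, and the addresses are the placeholder values from the ruleset.

```python
import re

UA_PATTERN = re.compile(r"^NameOfBadRobot.*")
ADDR_PATTERN = re.compile(r"^123\.45\.67\.[8-9]$")
PATH_PATTERN = re.compile(r"^/~quux/foo/arc/.+")


def is_forbidden(user_agent, remote_addr, url_path):
    """True only when the path, the user agent and the address all match."""
    return bool(
        PATH_PATTERN.search(url_path)
        and UA_PATTERN.search(user_agent)
        and ADDR_PATTERN.search(remote_addr)
    )
```

A normal browser coming from the same address is therefore still served, as is the robot when it requests pages outside the protected area.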

Blocked Inline-Images

Description:

Assume we have under http://www.quux-corp.de/~quux/ some pages with inlined GIF graphics. These graphics are nice, so others directly incorporate them via hyperlinks to their pages. We don't like this practice because it adds useless traffic to our server.

Solution:

While we cannot 100% protect the images from inclusion, we can at least restrict the cases where the browser sends an HTTP Referer header.

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://www.quux-corp.de/~quux/.*$ [NC]
RewriteRule .*\.gif$        -                                    [F]

RewriteCond %{HTTP_REFERER}         !^$
RewriteCond %{HTTP_REFERER}         !.*/foo-with-gif\.html$
RewriteRule ^inlined-in-foo\.gif$   -                        [F]
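The first of the two rule groups can be expressed as a short Python check. It is meant as an illustration of the decision only; the function name blocks_hotlink and the example referers in the usage note are invented.

```python
import re

# Same expressions as in the first rule group; [NC] becomes re.IGNORECASE.
OWN_PAGES = re.compile(r"^http://www.quux-corp.de/~quux/.*$", re.IGNORECASE)
GIF_REQUEST = re.compile(r".*\.gif$")


def blocks_hotlink(referer, url_path):
    """True when a GIF request carries a Referer that is not one of our pages."""
    if not GIF_REQUEST.search(url_path):
        return False
    if referer == "":
        return False  # no Referer header was sent, so the request is allowed
    return OWN_PAGES.search(referer) is None
```

A request with the referer http://other.example/page.html is refused, whereas requests without a Referer header still get the image, which is why the protection cannot be complete.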

Host Deny

Description:

How can we forbid a list of externally configured hosts from using our server?

Solution:

For Apache >= 1.3b6:

RewriteEngine on
RewriteMap    hosts-deny  txt:/path/to/hosts.deny
RewriteCond   ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
RewriteCond   ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
RewriteRule   ^/.*  -  [F]

For Apache <= 1.3b6:

RewriteEngine on
RewriteMap    hosts-deny  txt:/path/to/hosts.deny
RewriteRule   ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
RewriteRule   !^NOT-FOUND/.* - [F]
RewriteRule   ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
RewriteRule   !^NOT-FOUND/.* - [F]
RewriteRule   ^NOT-FOUND/(.*)$ /$1

##
##  hosts.deny
##
##  ATTENTION! This is a map, not a list, even when we treat it as such.
##             mod_rewrite parses it for key/value pairs, so at least a
##             dummy value "-" must be present for each entry.
##

193.102.180.41 -
bsdti1.sdm.de  -
192.76.162.40  -
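The remark that hosts.deny is a map rather than a list is worth a small demonstration. In the Python sketch below the file is parsed into key/value pairs and a request is denied when either the host name or the address is a key. The function names load_map and is_denied are hypothetical, and the entries are the three from the example file.

```python
HOSTS_DENY = """\
## comment lines are ignored
193.102.180.41 -
bsdti1.sdm.de  -
192.76.162.40  -
"""


def load_map(text):
    """Read key/value pairs; every entry needs a value, here the dummy "-"."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        table[key] = value
    return table


def is_denied(remote_host, remote_addr, deny_map):
    # Counterpart of the two conditions joined with [OR]: a lookup that finds
    # any value (instead of the NOT-FOUND default) means the client is listed.
    return remote_host in deny_map or remote_addr in deny_map


deny_map = load_map(HOSTS_DENY)
```

The value itself is never inspected; only the presence of the key matters, which is why a dummy "-" is sufficient.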

Proxy Deny

Description:

How can we forbid a certain host or even a user of a special host from using the Apache proxy?

Solution:

We first have to make sure mod_rewrite is below(!) mod_proxy in the Configuration file when compiling the Apache webserver. This way it gets called before mod_proxy . Then we configure the following for a host-dependent deny...

RewriteCond %{REMOTE_HOST} ^badhost\.mydomain\.com$
RewriteRule !^http://[^/.]\.mydomain.com.*  - [F]

...and this one for a user@host-dependent deny:

RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}  ^badguy@badhost\.mydomain\.com$
RewriteRule !^http://[^/.]\.mydomain.com.*  - [F]

Special Authentication Variant

Description:

Sometimes a very special authentication is needed, for instance an authentication which checks for a set of explicitly configured users. Only these should receive access and without explicit prompting (which would occur when using the Basic Auth via mod_auth ).

Solution:

We use a list of rewrite conditions to exclude all except our friends:

RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend1@client1.quux-corp\.com$
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend2@client2.quux-corp\.com$
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend3@client3.quux-corp\.com$
RewriteRule ^/~quux/only-for-friends/      -                                 [F]

Referer-based Deflector

Description:

How can we program a flexible URL Deflector which acts on the "Referer" HTTP header and can be configured with as many referring pages as we like?

Solution:

Use the following really tricky ruleset...

RewriteMap  deflector txt:/path/to/deflector.map

RewriteCond %{HTTP_REFERER} !=""
RewriteCond ${deflector:%{HTTP_REFERER}} ^-$
RewriteRule ^.* %{HTTP_REFERER} [R,L]

RewriteCond %{HTTP_REFERER} !=""
RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND
RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L]

... in conjunction with a corresponding rewrite map:

##
##  deflector.map
##

http://www.badguys.com/bad/index.html    -
http://www.badguys.com/bad/index2.html   -
http://www.badguys.com/bad/index3.html   http://somewhere.com/

This automatically redirects the request back to the referring page (when " - " is used as the value in the map) or to a specific URL (when a URL is specified in the map as the second argument).
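The decision made by the two rule groups can be summarised in a few lines of Python. This is a sketch of the logic under the map shown above, not of the rewriting engine; the function name deflect is invented.

```python
DEFLECTOR_MAP = {
    "http://www.badguys.com/bad/index.html": "-",
    "http://www.badguys.com/bad/index2.html": "-",
    "http://www.badguys.com/bad/index3.html": "http://somewhere.com/",
}


def deflect(referer, deflector_map=DEFLECTOR_MAP):
    """Return the redirect target for a request, or None to serve it normally."""
    if not referer:
        return None                 # no Referer header: nothing to act on
    target = deflector_map.get(referer)
    if target is None:
        return None                 # referring page is not listed in the map
    # "-" sends the visitor back to the referring page, anything else is a URL
    return referer if target == "-" else target
```

Adding a further referring page then only requires a new line in the map, not a change to the ruleset.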

Other

External Rewriting Engine

Description:

A FAQ: How can we solve the FOO/BAR/QUUX/etc. problem? There seems no solution by the use of mod_rewrite ...

Solution:

Use an external RewriteMap , i.e. a program which acts like a RewriteMap . It is run once on startup of Apache, receives the requested URLs on STDIN and has to put the resulting (usually rewritten) URL on STDOUT (same order!).

RewriteEngine on
RewriteMap    quux-map       prg:/path/to/map.quux.pl
RewriteRule   ^/~quux/(.*)$  /~quux/${quux-map:$1}

#!/path/to/perl

#   disable buffered I/O which would lead
#   to deadloops for the Apache server
$| = 1;

#   read URLs one per line from stdin and
#   generate substitution URL on stdout
while (<>) {
    s|^foo/|bar/|;
    print $_;
}

This is a demonstration-only example and just rewrites all URLs /~quux/foo/... to /~quux/bar/... . Actually you can program whatever you like. But notice that while such maps can also be used by an average user, only the system administrator can define them.
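The line-in, line-out protocol does not depend on Perl. As a sketch, the same loop could look like this in Python; the function name rewrite_lines is made up, and in-memory streams stand in for the standard input and output that the server would connect to the program.

```python
import io
import re


def rewrite_lines(source, sink):
    """Answer one lookup per input line, in the same order, without buffering."""
    for line in source:
        sink.write(re.sub(r"^foo/", "bar/", line))
        sink.flush()  # reply immediately; a buffered answer would stall the server


# Demonstration with in-memory streams instead of sys.stdin and sys.stdout.
demo_in = io.StringIO("foo/index.html\nother/page.html\n")
demo_out = io.StringIO()
rewrite_lines(demo_in, demo_out)
```

Used as a real map program, the function would be called as rewrite_lines(sys.stdin, sys.stdout) after importing sys.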

URL Rewriting Guide (Windows Web Server)

Introduction
Take a moment to look at some of the URLs on your website. Do you find URLs like http://yoursite.com/info/dispEmployeeInfo.aspx?EmpID=459-099&type=summary? Or maybe you have a bunch of Web pages that were moved from one directory or website to another, resulting in broken links for visitors who have bookmarked the old URLs. In this article we'll look at using URL rewriting to shorten those ugly URLs to meaningful, memorable ones, by replacing http://yoursite.com/info/dispEmployeeInfo.aspx?EmpID=459-099&type=summary with something like http://yoursite.com/people/sales/chuck.smith. We'll also see how URL rewriting can be used to create an intelligent 404 error.

URL rewriting is the process of intercepting an incoming Web request and redirecting the request to a different resource. When performing URL rewriting, typically the URL being requested is checked and, based on its value, the request is redirected to a different URL. For example, in the case where a website restructuring caused all of the Web pages in the /people/ directory to be moved to a /info/employees/ directory, you would want to use URL rewriting to check if a Web request was intended for a file in the /people/ directory. If the request was for a file in the /people/ directory, you'd want to automatically redirect the request to the same file, but in the /info/employees/ directory instead.

With classic ASP, the only way to utilize URL rewriting was to write an ISAPI filter or to buy a third-party product that offered URL rewriting capabilities. With Microsoft® ASP.NET, however, you can easily create your own URL rewriting software in a number of ways. In this article we'll examine the techniques available to ASP.NET developers for implementing URL rewriting, and then turn to some real-world uses of URL rewriting. Before we delve into the technological specifics of URL rewriting, let's first take a look at some everyday scenarios where URL rewriting can be employed.

Common Uses of URL Rewriting
Creating data-driven ASP.NET websites often results in a single Web page that displays a subset of the database's data based on querystring parameters. For example, in designing an e-commerce site, one of your tasks would be to allow users to browse through the products for sale. To facilitate this, you might create a page called displayCategory.aspx that would display the products for a given category. The category's products to view would be specified by a querystring parameter. That is, if the user wanted to browse the Widgets for sale, and all Widgets had a CategoryID of 5, the user would visit: http://yousite.com/displayCategory.aspx?CategoryID=5.

There are two downsides to creating a website with such URLs. First, from the end user's perspective, the URL http://yousite.com/displayCategory.aspx?CategoryID=5 is a mess. Usability expert Jakob Nielsen recommends that URLs be chosen so that they:

Are short.
Are easy to type.
Visualize the site structure.
Are "hackable," allowing the user to navigate through the site by hacking off parts of the URL.
I would add to that list that URLs should also be easy to remember. The URL http://yousite.com/displayCategory.aspx?CategoryID=5 meets none of Nielsen's criteria, nor is it easy to remember. Asking users to type in querystring values makes a URL hard to type and makes the URL "hackable" only by experienced Web developers who have an understanding of the purpose of querystring parameters and their name/value pair structure.

A better approach is to allow for a sensible, memorable URL, such as http://yoursite.com/products/Widgets. By just looking at the URL you can infer what will be displayed: information about Widgets. The URL is easy to remember and share, too. I can tell my colleague, "Check out yoursite.com/products/Widgets," and she'll likely be able to bring up the page without needing to ask me again what the URL was. (Try doing that with, say, an Amazon.com page!) The URL also appears, and should behave, "hackable." That is, if the user hacks off the end of the URL, and types in http://yoursite.com/products, they should see a listing of all products, or at least a listing of all categories of products they can view.

Note: For a prime example of a "hackable" URL, consider the URLs generated by many blog engines. To view the posts for January 28, 2004, one visits a URL like http://someblog.com/2004/01/28. If the URL is hacked down to http://someblog.com/2004/01, the user will see all posts for January 2004. Cutting it down further to http://someblog.com/2004 will display all posts for the year 2004.

In addition to simplifying URLs, URL rewriting is also often used to handle website restructuring that would otherwise result in numerous broken links and outdated bookmarks.

What Happens When a Request Reaches IIS
Before we examine exactly how to implement URL rewriting, it's important that we have an understanding of how incoming requests are handled by Microsoft® Internet Information Services (IIS). When a request arrives at an IIS Web server, IIS examines the requested file's extension to determine how to handle the request. Requests can be handled natively by IIS, as are HTML pages, images, and other static content, or IIS can route the request to an ISAPI extension. (An ISAPI extension is an unmanaged, compiled class that handles an incoming Web request. Its task is to generate the content for the requested resource.)

For example, if a request comes in for a Web page named Info.asp, IIS will route the message to the asp.dll ISAPI extension. This ISAPI extension will then load the requested ASP page, execute it, and return its rendered HTML to IIS, which will then send it back to the requesting client. For ASP.NET pages, IIS routes the message to the aspnet_isapi.dll ISAPI extension. The aspnet_isapi.dll ISAPI extension then hands off processing to the managed ASP.NET worker process, which processes the request, returning the ASP.NET Web page's rendered HTML.

You can customize IIS to specify what extensions are mapped to what ISAPI extensions. Figure 1 shows the Application Configuration dialog box from the Internet Information Services Administrative Tool. Note that the ASP.NET-related extensions—.aspx, .ascx, .config, .asmx, .rem, .cs, .vb, and others—are all mapped to the aspnet_isapi.dll ISAPI extension.

A thorough discussion of how IIS manages incoming requests is a bit beyond the scope of this article. A great, in-depth discussion, though, can be found in Michele Leroux Bustamante's article Inside IIS and ASP.NET. It's important to understand that the ASP.NET engine gets its hands only on incoming Web requests whose extensions are explicitly mapped to the aspnet_isapi.dll in IIS.

Examining Requests with ISAPI Filters
In addition to mapping the incoming Web request's file extension to the appropriate ISAPI extension, IIS also performs a number of other tasks. For example, IIS attempts to authenticate the user making the request and determine if the authenticated user has authorization to access the requested file. During the lifetime of handling a request, IIS passes through several states. At each state, IIS raises an event that can be programmatically handled using ISAPI filters.

Like ISAPI extensions, ISAPI filters are blocks of unmanaged code installed on the Web server. ISAPI extensions are designed to generate the response for a request to a particular file type. ISAPI filters, on the other hand, contain code to respond to events raised by IIS. ISAPI filters can intercept and even modify the incoming and outgoing data. ISAPI filters have numerous applications, including:

Authentication and authorization.
Logging and monitoring.
HTTP compression.
URL rewriting.
While ISAPI filters can be used to perform URL rewriting, this article examines implementing URL rewriting using ASP.NET. However, we will discuss the tradeoffs between implementing URL rewriting as an ISAPI filter versus using techniques available in ASP.NET.

What Happens When a Request Enters the ASP.NET Engine
Prior to ASP.NET, URL rewriting on IIS Web servers needed to be implemented using an ISAPI filter. URL rewriting is possible with ASP.NET because the ASP.NET engine is strikingly similar to IIS. The similarities arise because the ASP.NET engine:

Raises events as it processes a request.
Allows an arbitrary number of HTTP modules to handle the events that are raised, akin to IIS's ISAPI filters.
Delegates rendering the requested resource to an HTTP handler, which is akin to IIS's ISAPI extensions.
Like IIS, during the lifetime of a request the ASP.NET engine fires events signaling its change from one state of processing to another. The BeginRequest event, for example, is fired when the ASP.NET engine first responds to a request. The AuthenticateRequest event fires next, which occurs when the identity of the user has been established. (There are numerous other events—AuthorizeRequest, ResolveRequestCache, and EndRequest, among others. These events are events of the System.Web.HttpApplication class; for more information consult the HttpApplication Class Overview technical documentation.)

As we discussed in the previous section, ISAPI filters can be created to respond to the events raised by IIS. In a similar vein, ASP.NET provides HTTP modules that can respond to the events raised by the ASP.NET engine. An ASP.NET Web application can be configured to have multiple HTTP modules. For each request processed by the ASP.NET engine, each configured HTTP module is initialized and allowed to wire up event handlers to the events raised during the processing of the request. Realize that there are a number of built-in HTTP modules utilized on each and every request. One of the built-in HTTP modules is the FormsAuthenticationModule, which first checks to see if forms authentication is being used and, if so, whether the user is authenticated or not. If not, the user is automatically redirected to the specified logon page.

Recall that with IIS, an incoming request is eventually directed to an ISAPI extension, whose job it is to return the data for the particular request. For example, when a request for a classic ASP Web page arrives, IIS hands off the request to the asp.dll ISAPI extension, whose task it is to return the HTML markup for the requested ASP page. The ASP.NET engine utilizes a similar approach. After initializing the HTTP modules, the ASP.NET engine's next task is to determine what HTTP handler should process the request.

All requests that pass through the ASP.NET engine eventually arrive at an HTTP handler or an HTTP handler factory (an HTTP handler factory simply returns an instance of an HTTP handler that is then used to process the request). The final HTTP handler renders the requested resource, returning the response. This response is sent back to IIS, which then returns it to the user that made the request.

ASP.NET includes a number of built-in HTTP handlers. The PageHandlerFactory, for example, is used to render ASP.NET Web pages. The WebServiceHandlerFactory is used to render the response SOAP envelopes for ASP.NET Web services. The TraceHandler renders the HTML markup for requests to trace.axd.

Figure 2 illustrates how a request for an ASP.NET resource is handled. First, IIS receives the request and dispatches it to aspnet_isapi.dll. Next, the ASP.NET engine initializes the configured HTTP modules. Finally, the proper HTTP handler is invoked and the requested resource is rendered, returning the generated markup back to IIS and back to the requesting client.
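The flow described above (events raised in a fixed order, modules reacting to them, and a handler chosen at the end) can be mimicked with a toy model. The Python sketch below is only an analogy and does not use or reproduce any ASP.NET API; the class name Engine, its methods, and the module labels are all invented for the illustration.

```python
class Engine:
    """Toy request pipeline: modules subscribe to events, a handler renders."""

    EVENTS = ("BeginRequest", "AuthenticateRequest", "AuthorizeRequest")

    def __init__(self):
        self.subscribers = {event: [] for event in self.EVENTS}
        self.handlers = {}

    def subscribe(self, event, callback):
        self.subscribers[event].append(callback)

    def register_handler(self, extension, handler):
        self.handlers[extension] = handler

    def process(self, path):
        request = {"path": path, "trace": []}
        # Raise each event in order and let every subscribed module react.
        for event in self.EVENTS:
            for callback in self.subscribers[event]:
                callback(request)
        # Pick the handler by file extension and let it render the response.
        extension = request["path"].rsplit(".", 1)[-1]
        return self.handlers[extension](request)


engine = Engine()
engine.subscribe("BeginRequest", lambda req: req["trace"].append("logging module"))
engine.subscribe("AuthenticateRequest", lambda req: req["trace"].append("auth module"))
engine.register_handler("aspx", lambda req: (req["trace"], f"<html>{req['path']}</html>"))

trace, markup = engine.process("/info/employee.aspx")
```

The modules run in event order before the handler is selected, so a module that changes the request path early influences which resource is eventually rendered.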

Creating and Registering Custom HTTP Modules and HTTP Handlers
Creating custom HTTP modules and HTTP handlers are relatively simple tasks, which involve creating a managed class that implements the correct interface. HTTP modules must implement the System.Web.IHttpModule interface, while HTTP handlers and HTTP handler factories must implement the System.Web.IHttpHandler interface and System.Web.IHttpHandlerFactory interface, respectively. The specifics of creating HTTP handlers and HTTP modules are beyond the scope of this article. For a good background, read Mansoor Ahmed Siddiqui's article, HTTP Handlers and HTTP Modules in ASP.NET.

Once a custom HTTP module or HTTP handler has been created, it must be registered with the Web application. Registering HTTP modules and HTTP handlers for an entire Web server requires only a simple addition to the machine.config file; registering an HTTP module or HTTP handler for a specific Web application involves adding a few lines of XML to the application's Web.config file.

Specifically, to add an HTTP module to a Web application, add the following lines in the Web.config's configuration/system.web section:

<httpModules>
<add type="type" name="name" />
</httpModules>

The type value provides the assembly and class name of the HTTP module, whereas the name value provides a friendly name by which the HTTP module can be referred to in the Global.asax file.

HTTP handlers and HTTP handler factories are configured by the <httpHandlers> tag in the Web.config's configuration/system.web section, like so:

<httpHandlers>
<add verb="verb" path="path" type="type" />
</httpHandlers>

Recall that for each incoming request, the ASP.NET engine determines what HTTP handler should be used to render the request. This decision is made based on the incoming request's verb and path. The verb specifies what type of HTTP request was made (GET or POST), whereas the path specifies the location and filename of the file requested. So, if we wanted to have an HTTP handler handle all requests, either GET or POST, for files with the .scott extension, we'd add the following to the Web.config file:

<httpHandlers>
<add verb="*" path="*.scott" type="type" />
</httpHandlers>

where type is the type of our HTTP handler.

Note: When registering HTTP handlers, it is important to ensure that the extensions used by the HTTP handler are mapped in IIS to the ASP.NET engine. That is, in our .scott example, if the .scott extension is not mapped in IIS to the aspnet_isapi.dll ISAPI extension, a request for the file foo.scott will result in IIS attempting to return the contents of the file foo.scott. In order for the HTTP handler to process this request, the .scott extension must be mapped to the ASP.NET engine. The ASP.NET engine, then, will route the request correctly to the appropriate HTTP handler.
For more information on registering HTTP modules and HTTP handlers, be sure to consult the <httpModules> element documentation along with the <httpHandlers> element documentation.
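How a verb and a wildcard path select a handler can be illustrated with a simplified lookup. The Python sketch below assumes a "first match wins" rule over a small registration list, which is a simplification and not necessarily how ASP.NET resolves handlers; the type names Example.ScottHandler and Example.TraceHandler are made up.

```python
from fnmatch import fnmatchcase

REGISTRATIONS = [
    # (verb, path, type) as in the <add> element; the type names are invented
    ("*", "*.scott", "Example.ScottHandler"),
    ("GET", "trace.axd", "Example.TraceHandler"),
]


def find_handler(verb, path, registrations=REGISTRATIONS):
    """Return the type of the first registration matching verb and file name."""
    filename = path.rsplit("/", 1)[-1]
    for reg_verb, reg_path, handler_type in registrations:
        verbs = [item.strip() for item in reg_verb.split(",")]
        if ("*" in verbs or verb in verbs) and fnmatchcase(filename, reg_path):
            return handler_type
    return None
```

With these registrations both GET and POST requests for foo.scott resolve to the same handler, while a POST to trace.axd finds no match.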

Implementing URL Rewriting
URL rewriting can be implemented either with ISAPI filters at the IIS Web server level, or with either HTTP modules or HTTP handlers at the ASP.NET level. This article focuses on implementing URL rewriting with ASP.NET, so we won't be delving into the specifics of implementing URL rewriting with ISAPI filters. There are, however, numerous third-party ISAPI filters available for URL rewriting, such as:

ISAPI Rewrite
IIS Rewrite
PageXChanger
And many others!
Implementing URL rewriting at the ASP.NET level is possible through the System.Web.HttpContext class's RewritePath() method. The HttpContext class contains HTTP-specific information about a specific HTTP request. With each request received by the ASP.NET engine, an HttpContext instance is created for that request. This class has properties like: Request and Response, which provide access to the incoming request and outgoing response; Application and Session, which provide access to application and session variables; User, which provides information about the authenticated user; and other related properties.

With the Microsoft® .NET Framework Version 1.0, the RewritePath() method accepts a single string, the new path to use. Internally, the HttpContext class's RewritePath(string) method updates the Request object's Path and QueryString properties. In addition to RewritePath(string), the .NET Framework Version 1.1 includes another form of the RewritePath() method, one that accepts three string input parameters. This alternate overloaded form not only sets the Request object's Path and QueryString properties, but also sets internal member variables that are used to compute the Request object's values for its PhysicalPath, PathInfo, and FilePath properties.

To implement URL rewriting in ASP.NET, then, we need to create an HTTP module or HTTP handler that:

Checks the requested path to determine if the URL needs to be rewritten.
Rewrites the path, if needed, by calling the RewritePath() method.
For example, imagine that our website had information about each employee, accessible through /info/employee.aspx?empID=employeeID. To make the URLs more "hackable," we might decide to have employee pages accessible by: /people/EmployeeName.aspx. Here is a case where we'd want to use URL rewriting. That is, when the page /people/ScottMitchell.aspx was requested, we'd want to rewrite the URL so that the page /info/employee.aspx?empID=1001 was used instead.
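The two steps, checking the requested path and rewriting it if needed, can be sketched independently of ASP.NET. The Python function below only computes the new path as a string; it does not call RewritePath(). The lookup table EMPLOYEE_IDS and the function name rewrite_path are assumptions made for this example.

```python
import re

# Hypothetical lookup table; a real site would query its employee database.
EMPLOYEE_IDS = {"ScottMitchell": 1001}

PEOPLE_PAGE = re.compile(r"/people/([^/]+)\.aspx")


def rewrite_path(requested_path):
    """Return the internal path for a friendly URL, or the path unchanged."""
    match = PEOPLE_PAGE.fullmatch(requested_path)      # step 1: check the path
    if match and match.group(1) in EMPLOYEE_IDS:
        employee_id = EMPLOYEE_IDS[match.group(1)]
        return f"/info/employee.aspx?empID={employee_id}"  # step 2: rewrite
    return requested_path
```

Paths that do not match the pattern, or that name an unknown employee, are passed through untouched.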

URL Rewriting with HTTP Modules
When performing URL rewriting at the ASP.NET level you can use either an HTTP module or an HTTP handler to perform the rewriting. When using an HTTP module, you must decide at what point during the request's lifecycle to check to see if the URL needs to be rewritten. At first glance, this may seem to be an arbitrary choice, but the decision can impact your application in both significant and subtle ways. The choice of where to perform the rewrite matters because the built-in ASP.NET HTTP modules use the Request object's properties to perform their duties. (Recall that rewriting the path alters the Request object's property values.) These germane built-in HTTP modules and the events they tie into are listed below:

HTTP Module | Event | Description
FormsAuthenticationModule | AuthenticateRequest | Determines if the user is authenticated using forms authentication. If not, the user is automatically redirected to the specified logon page.
FileAuthorizationModule | AuthorizeRequest | When using Windows authentication, this HTTP module checks to ensure that the Microsoft® Windows® account has adequate rights for the resource requested.
UrlAuthorizationModule | AuthorizeRequest | Checks to make sure the requestor can access the specified URL. URL authorization is specified through the <authorization> and <location> elements in the Web.config file.

Recall that the BeginRequest event fires before AuthenticateRequest, which fires before AuthorizeRequest.

One safe place that URL rewriting can be performed is in the BeginRequest event. That means that if the URL needs to be rewritten, the rewrite will already have happened by the time any of the built-in HTTP modules run. The downside to this approach arises when using forms authentication. If you've used forms authentication before, you know that when the user visits a restricted resource, they are automatically redirected to a specified login page. After successfully logging in, the user is sent back to the page they attempted to access in the first place.

If URL rewriting is performed in the BeginRequest or AuthenticateRequest events, the login page will, when submitted, redirect the user to the rewritten page. That is, imagine that a user types into their browser window, /people/ScottMitchell.aspx, which is rewritten to /info/employee.aspx?empID=1001. If the Web application is configured to use forms authentication, when the user first visits /people/ScottMitchell.aspx, first the URL will be rewritten to /info/employee.aspx?empID=1001; next, the FormsAuthenticationModule will run, redirecting the user to the login page, if needed. The URL the user will be sent to upon successfully logging in, however, will be /info/employee.aspx?empID=1001, since that was the URL of the request when the FormsAuthenticationModule ran.

Similarly, when performing rewriting in the BeginRequest or AuthenticateRequest events, the UrlAuthorizationModule sees the rewritten URL. That means that if you use <location> elements in your Web.config file to specify authorization for specific URLs, you will have to refer to the rewritten URL.

To fix these subtleties, you might decide to perform the URL rewriting in the AuthorizeRequest event. While this approach fixes the URL authorization and forms authentication anomalies, it introduces a new wrinkle: file authorization no longer works. When using Windows authentication, the FileAuthorizationModule checks to make sure that the authenticated user has the appropriate access rights to access the specific ASP.NET page.

Imagine that a set of users does not have Windows-level file access to C:\Inetpub\wwwroot\info\employee.aspx; if such users attempt to visit /info/employee.aspx?empID=1001 directly, they will get an authorization error. However, if we move the URL rewriting to the AuthorizeRequest event, then when the FileAuthorizationModule checks the security settings, it still thinks the file being requested is /people/ScottMitchell.aspx, since the URL has yet to be rewritten. Therefore, the file authorization check will pass, allowing these users to view the content of the rewritten URL, /info/employee.aspx?empID=1001.

So, when should URL rewriting be performed in an HTTP module? It depends on what type of authentication you're employing. If you're not using any authentication, then it doesn't matter if URL rewriting happens in BeginRequest, AuthenticateRequest, or AuthorizeRequest. If you are using forms authentication and are not using Windows authentication, place the URL rewriting in the AuthorizeRequest event handler. Finally, if you are using Windows authentication, schedule the URL rewriting during the BeginRequest or AuthenticateRequest events.

URL Rewriting in HTTP Handlers
URL rewriting can also be performed by an HTTP handler or HTTP handler factory. Recall that an HTTP handler is a class responsible for generating the content for a specific type of request; an HTTP handler factory is a class responsible for returning an instance of an HTTP handler that can generate the content for a specific type of request.

In this article we'll look at creating a URL rewriting HTTP handler factory for ASP.NET Web pages. HTTP handler factories must implement the IHttpHandlerFactory interface, which includes a GetHandler() method. After initializing the appropriate HTTP modules, the ASP.NET engine determines what HTTP handler or HTTP handler factory to invoke for the given request. If an HTTP handler factory is to be invoked, the ASP.NET engine calls that HTTP handler factory's GetHandler() method passing in the HttpContext for the Web request, along with some other information. The HTTP handler factory, then, must return an object that implements IHttpHandler that can handle the request.

To perform URL rewriting through an HTTP handler, we can create an HTTP handler factory whose GetHandler() method checks the requested path to determine if it needs to be rewritten. If it does, it can call the passed-in HttpContext object's RewritePath() method, as discussed earlier. Finally, the HTTP handler factory can return the HTTP handler returned by the System.Web.UI.PageParser class's GetCompiledPageInstance() method. (This is the same technique by which the built-in ASP.NET Web page HTTP handler factory, PageHandlerFactory, works.)
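
A minimal sketch of such a factory follows. The class name SketchRewriterFactory and the single hard-coded mapping are hypothetical, and the sketch assumes the application sits at the site root; the factory in this article's download reads its rules from Web.config instead.

```csharp
using System;
using System.Web;
using System.Web.UI;

// Hypothetical handler factory for illustration only.
public class SketchRewriterFactory : IHttpHandlerFactory
{
    public IHttpHandler GetHandler(HttpContext context, string requestType,
                                   string url, string pathTranslated)
    {
        // Default: serve the page that was actually requested.
        string targetUrl = url;

        // Check whether the requested path needs to be rewritten.
        if (String.Compare(url, "/people/ScottMitchell.aspx", true) == 0)
        {
            targetUrl = "/info/employee.aspx";
            // Rewrite the request, including the query string.
            context.RewritePath(targetUrl + "?empID=1001");
        }

        // Map the (query-string-less) virtual path to its physical file
        // and return the compiled page as the handler for this request.
        string physicalPath = context.Server.MapPath(targetUrl);
        return PageParser.GetCompiledPageInstance(targetUrl, physicalPath,
                                                  context);
    }

    public void ReleaseHandler(IHttpHandler handler) { }
}
```

Like any handler factory, this one only takes effect once it is mapped to a path such as *.aspx in the <httpHandlers> section of Web.config.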

Since all of the HTTP modules will have been initialized prior to the custom HTTP handler factory being instantiated, using an HTTP handler factory presents the same challenge as placing the URL rewriting in the later pipeline events: file authorization will not work. So, if you rely on Windows authentication and file authorization, you will want to use the HTTP module approach for URL rewriting.

In the next section we'll look at building a reusable URL rewriting engine. Following our examination of the URL rewriting engine (which is available in this article's code download), we'll spend the remaining sections examining real-world uses of URL rewriting. First we'll look at how to use the URL rewriting engine and look at a simple URL rewriting example. Following that, we'll utilize the power of the rewriting engine's regular expression capabilities to provide truly "hackable" URLs.

Building a URL Rewriting Engine
To help illustrate how to implement URL rewriting in an ASP.NET Web application, I created a URL rewriting engine. This rewriting engine provides the following functionality:

  • The ASP.NET page developer utilizing the URL rewriting engine can specify the rewriting rules in the Web.config file.
  • The rewriting rules can use regular expressions to allow for powerful rewriting rules.
  • URL rewriting can be easily configured to use an HTTP module or an HTTP handler.

In this article we will examine URL rewriting with just the HTTP module. To see how HTTP handlers can be used to perform URL rewriting, consult the code available for download with this article.

Specifying Configuration Information for the URL Rewriting Engine
Let's examine the structure of the rewrite rules in the Web.config file. First, you'll need to indicate in the Web.config file if you want to perform URL rewriting with the HTTP module or the HTTP handler. In the download, the Web.config file contains two entries that have been commented out:

<!--
<httpModules>
<add type="URLRewriter.ModuleRewriter, URLRewriter"
name="ModuleRewriter" />
</httpModules>
-->

<!--
<httpHandlers>
<add verb="*" path="*.aspx"
type="URLRewriter.RewriterFactoryHandler, URLRewriter" />
</httpHandlers>
-->

Uncomment the <httpModules> entry to use the HTTP module for rewriting; uncomment the <httpHandlers> entry instead to use the HTTP handler for rewriting.

In addition to specifying whether the HTTP module or HTTP handler is used for rewriting, the Web.config file contains the rewriting rules. A rewriting rule is composed of two strings: the pattern to look for in the requested URL, and the string to replace the pattern with, if found. This information is expressed in the Web.config file using the following syntax:

<RewriterConfig>
<Rules>
<RewriterRule>
<LookFor>pattern to look for</LookFor>
<SendTo>string to replace pattern with</SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>pattern to look for</LookFor>
<SendTo>string to replace pattern with</SendTo>
</RewriterRule>
...
</Rules>
</RewriterConfig>

Each rewrite rule is expressed by a <RewriterRule> element. The pattern to search for is specified by the <LookFor> element, while the string to replace the found pattern with is entered in the <SendTo> element. These rewrite rules are evaluated from top to bottom. If a match is found, the URL is rewritten and the search through the rewriting rules terminates.

When specifying patterns in the <LookFor> element, realize that regular expressions are used to perform the matching and string replacement. (In a bit we'll look at a real-world example that illustrates how to search for a pattern using regular expressions.) Since the pattern is a regular expression, be sure to escape any characters that are reserved characters in regular expressions. (Some of the regular expression reserved characters include: ., ?, ^, $, and others. These can be escaped by being preceded with a backslash, like \. to match a literal period.)
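
To make the escaping point concrete, here is a small console sketch. EscapeDemo and its Matches() helper are names invented for this illustration, and the helper anchors the pattern with ^ and $ so that the whole path has to match. Regex.Escape() is the .NET method that backslash-escapes reserved characters in a literal string; it is only appropriate for the literal parts of a pattern, since it would also escape metacharacters you intended to use.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: true if the whole path matches the pattern,
    // ignoring case.
    public static bool Matches(string pattern, string path)
    {
        return Regex.IsMatch(path, "^" + pattern + "$",
                             RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string unescaped = "/Products/Beverages.aspx";
        string escaped = Regex.Escape(unescaped);

        Console.WriteLine(escaped);  // /Products/Beverages\.aspx

        // An unescaped period matches any character, so this is True.
        Console.WriteLine(Matches(unescaped, "/Products/BeveragesQaspx"));

        // An escaped period matches only a literal period: False, then True.
        Console.WriteLine(Matches(escaped, "/Products/BeveragesQaspx"));
        Console.WriteLine(Matches(escaped, "/Products/Beverages.aspx"));
    }
}
```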

URL Rewriting with an HTTP Module
Creating an HTTP module is as simple as creating a class that implements the IHttpModule interface. The IHttpModule interface defines two methods:

  • Init(HttpApplication). This method fires when the HTTP module is initialized. In this method you'll wire up event handlers to the appropriate HttpApplication events.
  • Dispose(). This method is invoked when the module is being torn down along with its HttpApplication instance, not at the end of each request. Any final cleanup should be performed here.

To facilitate creating an HTTP module for URL rewriting, I started by creating an abstract base class, BaseModuleRewriter. This class implements IHttpModule. In the Init() method, it wires up the HttpApplication's AuthorizeRequest event to the BaseModuleRewriter_AuthorizeRequest method. The BaseModuleRewriter_AuthorizeRequest method calls the class's Rewrite() method passing in the requested Path along with the HttpApplication object that was passed into the Init() method. The Rewrite() method is abstract, meaning that in the BaseModuleRewriter class, the Rewrite() method has no method body; rather, the class being derived from BaseModuleRewriter must override this method and provide a method body.

With this base class in place, all we have to do now is to create a class derived from BaseModuleRewriter that overrides Rewrite() and performs the URL rewriting logic there. The code for BaseModuleRewriter is shown below.

using System;
using System.Web;

public abstract class BaseModuleRewriter : IHttpModule
{
    public virtual void Init(HttpApplication app)
    {
        // WARNING! This does not work with Windows authentication!
        // If you are using Windows authentication,
        // change to app.BeginRequest
        app.AuthorizeRequest += new
            EventHandler(this.BaseModuleRewriter_AuthorizeRequest);
    }

    public virtual void Dispose() {}

    // Hands the requested path to the derived class's Rewrite() method.
    protected virtual void BaseModuleRewriter_AuthorizeRequest(
        object sender, EventArgs e)
    {
        HttpApplication app = (HttpApplication) sender;
        Rewrite(app.Request.Path, app);
    }

    protected abstract void Rewrite(string requestedPath,
        HttpApplication app);
}

Notice that the BaseModuleRewriter class performs URL rewriting in the AuthorizeRequest event. Recall that if you use Windows authentication with file authorization, you will need to change this so that URL rewriting is performed in either the BeginRequest or AuthenticateRequest events.

The ModuleRewriter class extends the BaseModuleRewriter class and is responsible for performing the actual URL rewriting. ModuleRewriter contains a single overridden method—Rewrite()—which is shown below:

protected override void Rewrite(string requestedPath,
    System.Web.HttpApplication app)
{
    // get the configuration rules
    RewriterRuleCollection rules =
        RewriterConfiguration.GetConfig().Rules;

    // iterate through each rule...
    for (int i = 0; i < rules.Count; i++)
    {
        // get the pattern to look for, and
        // resolve the URL (convert ~ into the appropriate directory)
        string lookFor = "^" +
            RewriterUtils.ResolveUrl(app.Context.Request.ApplicationPath,
                rules[i].LookFor) + "$";

        // Create a regex (note that IgnoreCase is set...)
        Regex re = new Regex(lookFor, RegexOptions.IgnoreCase);

        // See if a match is found
        if (re.IsMatch(requestedPath))
        {
            // match found - do any replacement needed
            string sendToUrl =
                RewriterUtils.ResolveUrl(app.Context.Request.ApplicationPath,
                    re.Replace(requestedPath, rules[i].SendTo));

            // Rewrite the URL
            RewriterUtils.RewriteUrl(app.Context, sendToUrl);
            break; // exit the for loop
        }
    }
}

The Rewrite() method starts with getting the set of rewriting rules from the Web.config file. It then iterates through the rewrite rules one at a time, and for each rule, it grabs its LookFor property and uses a regular expression to determine if a match is found in the requested URL.

If a match is found, a regular expression replace is performed on the requested path with the value of the SendTo property. This replaced URL is then passed into the RewriterUtils.RewriteUrl() method. RewriterUtils is a helper class that provides a couple of static methods used by both the URL rewriting HTTP module and HTTP handler. The RewriteUrl() method simply calls the HttpContext object's RewritePath() method.
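
The matching logic can be tried outside of ASP.NET with a small console sketch. Everything here is a stand-in: RewriteSketch is a hypothetical class, its ResolveUrl() is a simplified guess that only handles a leading ~ followed by a slash, and the rules are passed as plain string pairs rather than read from Web.config. Instead of calling RewritePath(), the sketch simply returns the rewritten URL, or null when no rule matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RewriteSketch
{
    // Simplified stand-in for RewriterUtils.ResolveUrl(): replaces a
    // leading "~" with the application path (assumes "~/..." input).
    public static string ResolveUrl(string appPath, string url)
    {
        if (url.Length == 0 || url[0] != '~')
            return url;
        string rest = url.Substring(1);   // text after the "~"
        if (appPath == "/")
            return rest.Length == 0 ? "/" : rest;
        return appPath + rest;
    }

    // Each rule is a { LookFor, SendTo } pair; the first match wins.
    public static string Rewrite(string appPath, string requestedPath,
                                 string[][] rules)
    {
        foreach (string[] rule in rules)
        {
            string lookFor = "^" + ResolveUrl(appPath, rule[0]) + "$";
            Regex re = new Regex(lookFor, RegexOptions.IgnoreCase);
            if (re.IsMatch(requestedPath))
                return ResolveUrl(appPath,
                                  re.Replace(requestedPath, rule[1]));
        }
        return null;   // no rule matched; the URL is left alone
    }

    public static void Main()
    {
        string[][] rules = new string[][] {
            new string[] { @"~/Products/Beverages\.aspx",
                           "~/ListProductsByCategory.aspx?CategoryID=1" }
        };

        // Prints /ListProductsByCategory.aspx?CategoryID=1
        Console.WriteLine(Rewrite("/", "/Products/Beverages.aspx", rules));
    }
}
```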

Note: You may have noticed that when performing the regular expression match and replacement, a call to RewriterUtils.ResolveUrl() is made. This helper method simply replaces any instances of ~ in the string with the value of the application's path.

The entire code for the URL rewriting engine is available for download with this article. We've examined the most germane pieces, but there are other components as well, such as classes for deserializing the XML-formatted rewriting rules in the Web.config file into an object, as well as the HTTP handler factory for URL rewriting. The remaining sections of this article examine real-world uses of URL rewriting.

Performing Simple URL Rewriting with the URL Rewriting Engine
To demonstrate the URL rewriting engine in action, let's build an ASP.NET Web application that utilizes simple URL rewriting. Imagine that we work for a company that sells assorted products online. These products are broken down into the following categories:

Category ID Category Name
1 Beverages
2 Condiments
3 Confections
4 Dairy Products
... ...

Assume we already have created an ASP.NET Web page called ListProductsByCategory.aspx that accepts a Category ID value in the querystring and displays all of the products belonging to that category. So, users who wanted to view our Beverages for sale would visit ListProductsByCategory.aspx?CategoryID=1, while users who wanted to view our Dairy Products would visit ListProductsByCategory.aspx?CategoryID=4. Also assume we have a page called ListCategories.aspx, which lists the categories of products for sale.

Clearly this is a case for URL rewriting, as the URLs a user is presented with do not carry any significance for the user, nor do they provide any "hackability." Rather, let's employ URL rewriting so that when a user visits /Products/Beverages.aspx, their URL will be rewritten to ListProductsByCategory.aspx?CategoryID=1. We can accomplish this with the following URL rewriting rule in the Web.config file:

<RewriterConfig>
<Rules>
<!-- Rules for Product Lister -->
<RewriterRule>
<LookFor>~/Products/Beverages\.aspx</LookFor>
<SendTo>~/ListProductsByCategory.aspx?CategoryID=1</SendTo>
</RewriterRule>
</Rules>
</RewriterConfig>

As you can see, this rule searches to see if the path requested by the user was /Products/Beverages.aspx. If it was, it rewrites the URL as /ListProductsByCategory.aspx?CategoryID=1.

Note: Notice that the <LookFor> element escapes the period in Beverages.aspx. This is because the <LookFor> value is used in a regular expression pattern, and a period is a special character in regular expressions meaning "match any character," so an unescaped pattern would also match a URL such as /Products/BeveragesQaspx. By escaping the period (using \.) we are indicating that we want to match a literal period, and not any old character.

With this rule in place, when a user visits /Products/Beverages.aspx, they will be shown the beverages for sale. Figure 3 shows a screenshot of a browser visiting /Products/Beverages.aspx. Notice that in the browser's Address bar the URL reads /Products/Beverages.aspx, but the user is actually seeing the contents of ListProductsByCategory.aspx?CategoryID=1. (In fact, there doesn't even exist a /Products/Beverages.aspx file on the Web server at all!)

Similar to /Products/Beverages.aspx, we'd next add rewriting rules for the other product categories. This simply involves adding additional <RewriterRule> elements within the <Rules> element in the Web.config file. Consult the Web.config file in the download for the complete set of rewriting rules for the demo.

To make the URL more "hackable," it would be nice if a user could simply hack off the Beverages.aspx from /Products/Beverages.aspx and be shown a listing of the product categories. At first glance, this may appear a trivial task—just add a rewriting rule that maps /Products/ to /ListCategories.aspx. However, there is a fine subtlety—you must first create a /Products/ directory and add an empty Default.aspx file in the /Products/ directory.

To understand why these extra steps need to be performed, recall that the URL rewriting engine is at the ASP.NET level. That is, if the ASP.NET engine is never given the opportunity to process the request, there's no way the URL rewriting engine can inspect the incoming URL. Furthermore, remember that IIS hands off incoming requests to the ASP.NET engine only if the requested file has an appropriate extension. So if a user visits /Products/, IIS doesn't see any file extension, so it checks the directory to see if there exists a file with one of the default filenames. (Default.aspx, Default.htm, Default.asp, and so on. These default filenames are defined in the Documents tab of the Web Server Properties dialog box in the IIS Administration dialog box.) Of course, if the /Products/ directory doesn't exist, IIS will return an HTTP 404 error.

So, we need to create the /Products/ directory. Additionally, we need to create a single file in this directory, Default.aspx. This way, when a user visits /Products/, IIS will inspect the directory, see that there exists a file named Default.aspx, and then hand off processing to the ASP.NET engine. Our URL rewriter, then, will get a crack at rewriting the URL.

After creating the directory and Default.aspx file, go ahead and add the following rewriting rule to the <Rules> element:

<RewriterRule>
<LookFor>~/Products/Default\.aspx</LookFor>
<SendTo>~/ListCategories.aspx</SendTo>
</RewriterRule>

Handling Postbacks
If the URLs you are rewriting contain a server-side Web Form and perform postbacks, when the form posts back, the underlying URL will be used. That is, if our user enters into their browser, /Products/Beverages.aspx, they will still see in their browser's Address bar, /Products/Beverages.aspx, but they will be shown the content for ListProductsByCategory.aspx?CategoryID=1. If ListProductsByCategory.aspx performs a postback, the user will be posted back to ListProductsByCategory.aspx?CategoryID=1, not /Products/Beverages.aspx. This won't break anything, but it can be disconcerting from the user's perspective to see the URL change suddenly upon clicking a button.

The reason this behavior happens is that when the Web Form is rendered, it explicitly sets its action attribute to the value of the file path in the Request object. Of course, by the time the Web Form is rendered, the URL has been rewritten from /Products/Beverages.aspx to ListProductsByCategory.aspx?CategoryID=1, meaning the Request object is reporting that the user is visiting ListProductsByCategory.aspx?CategoryID=1. This problem can be fixed by having the server-side form simply not render an action attribute. (Browsers, by default, submit a form to the current URL if the form doesn't contain an action attribute.)

Unfortunately, the Web Form does not allow you to explicitly specify an action attribute, nor does it allow you to set some property to disable the rendering of the action attribute. Rather, we'll have to extend the System.Web.UI.HtmlControls.HtmlForm class ourselves, overriding the RenderAttributes() method and explicitly indicating that it not render the action attribute.

Thanks to the power of inheritance, we can gain all of the functionality of the HtmlForm class and only have to add a scant few lines of code to achieve the desired behavior. The complete code for the custom class is shown below:

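using System.Web.UI;

namespace ActionlessForm {
    public class Form : System.Web.UI.HtmlControls.HtmlForm
    {
        protected override void RenderAttributes(HtmlTextWriter writer)
        {
            // Write name and method ourselves, then drop them from the
            // collection so they are not rendered a second time below.
            writer.WriteAttribute("name", this.Name);
            base.Attributes.Remove("name");

            writer.WriteAttribute("method", this.Method);
            base.Attributes.Remove("method");

            // Render the remaining declared attributes. Unlike the base
            // class, no action attribute is written explicitly.
            this.Attributes.Render(writer);

            base.Attributes.Remove("action");

            if (base.ID != null)
                writer.WriteAttribute("id", base.ClientID);
        }
    }
}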

The code for the overridden RenderAttributes() method simply contains the exact code from the HtmlForm class's RenderAttributes() method, but without setting the action attribute. (I used Lutz Roeder's Reflector to view the source code of the HtmlForm class.)

Once you have created this class and compiled it, to use it in an ASP.NET Web application, start by adding it to the Web application's References folder. Then, to use it in place of the HtmlForm class, simply add the following to the top of your ASP.NET Web page:

<%@ Register TagPrefix="skm" Namespace="ActionlessForm"
Assembly="ActionlessForm" %>

Then, where you have <form runat="server">, replace that with:

<skm:Form id="Form1" method="post" runat="server">

and replace the closing </form> tag with:

</skm:Form>

You can see this custom Web Form class in action in ListProductsByCategory.aspx, which is included in this article's download. Also included in the download is a Visual Studio .NET project for the action-less Web Form.

Note: If the URL you are rewriting to does not perform a postback, there's no need to use this custom Web Form class.

Creating Truly "Hackable" URLs
The simple URL rewriting demonstrated in the previous section showed how easily the URL rewriting engine can be configured with new rewriting rules. The true power of the rewriting rules, though, shines when using regular expressions, as we'll see in this section.

Blogs are becoming more and more popular these days, and it seems everyone has their own blog. If you are not familiar with blogs, they are often-updated personal pages that typically serve as an online journal. Most bloggers simply write about their day-to-day happenings, while others focus on blogging about a specific theme, such as movie reviews, a sports team, or a computer technology.

Depending on the author, blogs are updated anywhere from several times a day to once every week or two. Typically the blog homepage shows the most recent 10 entries, but virtually all blogging software provides an archive through which visitors can read older postings. Blogs are a great application for "hackable" URLs. Imagine while searching through the archives of a blog you found yourself at the URL /2004/02/14.aspx. Would you be terribly surprised if you found yourself reading the posts made on February 14th, 2004? Furthermore, you might want to view all posts for February 2004, in which case you might try hacking the URL to /2004/02/. To view all 2004 posts, you might try visiting /2004/.

When maintaining a blog, it would be nice to provide this level of URL "hackability" to your visitors. While many blog engines provide this functionality, let's look at how it can be accomplished using URL rewriting.

First, we need a single ASP.NET Web page that will show blog entries by day, month, or year. Assume we have such a page, ShowBlogContent.aspx, that takes in querystring parameters year, month, and day. To view the posts for February 14th, 2004, we could visit ShowBlogContent.aspx?year=2004&month=2&day=14. To view all posts for February 2004, we'd visit ShowBlogContent.aspx?year=2004&month=2. Finally, to see all posts for the year 2004, we'd navigate to ShowBlogContent.aspx?year=2004. (The code for ShowBlogContent.aspx can be found in this article's download.)

So, if a user visits /2004/02/14.aspx, we need to rewrite the URL to ShowBlogContent.aspx?year=2004&month=02&day=14 (the rules below pass the captured digits through unchanged, so the month arrives as 02). All three cases (when the URL specifies a year, month, and day; when the URL specifies just the year and month; and when the URL specifies only the year) can be handled with three rewrite rules:

<RewriterConfig>
<Rules>
<!-- Rules for Blog Content Displayer -->
<RewriterRule>
<LookFor>~/(\d{4})/(\d{2})/(\d{2})\.aspx</LookFor>
<SendTo>~/ShowBlogContent.aspx?year=$1&amp;month=$2&amp;day=$3</SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>~/(\d{4})/(\d{2})/Default\.aspx</LookFor>
<SendTo><![CDATA[~/ShowBlogContent.aspx?year=$1&month=$2]]></SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>~/(\d{4})/Default\.aspx</LookFor>
<SendTo>~/ShowBlogContent.aspx?year=$1</SendTo>
</RewriterRule>
</Rules>
</RewriterConfig>

These rewriting rules demonstrate the power of regular expressions. In the first rule, we look for a URL with the pattern (\d{4})/(\d{2})/(\d{2})\.aspx. In plain English, this matches a string that has four digits followed by a forward slash followed by two digits followed by a forward slash, followed by two digits followed by .aspx. The parentheses around each digit grouping are vital: they allow us to refer to the matched characters inside those parentheses in the corresponding <SendTo> property. Specifically, we can refer back to the matched parenthetical groupings using $1, $2, and $3 for the first, second, and third parenthesis grouping, respectively.
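
The effect of the capture groups can be checked with a short console sketch. BlogRewriteDemo is a hypothetical class written for this illustration; it assumes the application lives at the site root, so the ~ in each rule has already been dropped, and it tries the three rules in order just as the rewriting engine does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlogRewriteDemo
{
    // { LookFor, SendTo } pairs mirroring the three rules above, with "~"
    // already resolved for a root application (an assumption).
    static readonly string[][] Rules = new string[][]
    {
        new string[] { @"/(\d{4})/(\d{2})/(\d{2})\.aspx",
                       "/ShowBlogContent.aspx?year=$1&month=$2&day=$3" },
        new string[] { @"/(\d{4})/(\d{2})/Default\.aspx",
                       "/ShowBlogContent.aspx?year=$1&month=$2" },
        new string[] { @"/(\d{4})/Default\.aspx",
                       "/ShowBlogContent.aspx?year=$1" }
    };

    // Returns the rewritten URL, or the original path if no rule matches.
    public static string Rewrite(string path)
    {
        foreach (string[] rule in Rules)
        {
            Regex re = new Regex("^" + rule[0] + "$",
                                 RegexOptions.IgnoreCase);
            if (re.IsMatch(path))
                return re.Replace(path, rule[1]);  // $1, $2, $3 filled in
        }
        return path;
    }

    public static void Main()
    {
        // /ShowBlogContent.aspx?year=2004&month=02&day=14
        Console.WriteLine(Rewrite("/2004/02/14.aspx"));
        // /ShowBlogContent.aspx?year=2004&month=02
        Console.WriteLine(Rewrite("/2004/02/Default.aspx"));
        // /ShowBlogContent.aspx?year=2004
        Console.WriteLine(Rewrite("/2004/Default.aspx"));
    }
}
```

Note that the captured text is substituted exactly as matched, so the month is passed along as 02 rather than 2.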

Note: Since the Web.config file is XML-formatted, characters like &, <, and > in the text portion of an element must be escaped. In the first rule's <SendTo> element, & is escaped to &amp;. In the second rule's <SendTo>, an alternative technique is used: by using a <![CDATA[...]]> element, the contents inside do not need to be escaped. Either approach is acceptable and accomplishes the same end.

Figures 5, 6, and 7 show the URL rewriting in action. The data is actually being pulled from my blog, http://ScottOnWriting.NET. In Figure 5, the posts for November 7, 2003 are shown; in Figure 6 all posts for November 2003 are shown; Figure 7 shows all posts for 2003.

Note: The URL rewriting engine expects a regular expression pattern in the <LookFor> elements. If you are unfamiliar with regular expressions, consider reading an earlier article of mine, An Introduction to Regular Expressions. Also, a great place to get your hands on commonly used regular expressions, as well as a repository for sharing your own crafted regular expressions, is RegExLib.com.

Building the Requisite Directory Structure
When a request comes in for /2004/03/19.aspx, IIS notes the .aspx extension and routes the request to the ASP.NET engine. As the request moves through the ASP.NET engine's pipeline, the URL will get rewritten to ShowBlogContent.aspx?year=2004&month=03&day=19 and the visitor will see those blog entries for March 19, 2004. But what happens when the user navigates to /2004/03/? Unless there is a directory /2004/03/, IIS will return a 404 error. Furthermore, there needs to be a Default.aspx page in this directory so that the request is handed off to the ASP.NET engine.

So with this approach, you have to manually create a directory for each year in which there are blog entries, with a Default.aspx page in the directory. Additionally, in each year directory you need to manually create twelve more directories—01, 02, …, 12—each with a Default.aspx file. (Recall that we had to do the same thing—add a /Products/ directory with a Default.aspx file—in the previous demo so that visiting /Products/ correctly displayed ListCategories.aspx.)
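
If creating these folders by hand is too tedious, the chore can be scripted. The following is only a sketch under assumptions: BlogDirectoryBuilder is a hypothetical helper that is not part of this article's download, and it assumes that an empty Default.aspx placeholder is all each directory needs.

```csharp
using System;
using System.IO;

public static class BlogDirectoryBuilder
{
    // Creates <siteRoot>\<year> and its twelve month subdirectories
    // (01 through 12), each holding an empty Default.aspx placeholder.
    public static void CreateYear(string siteRoot, int year)
    {
        string yearDir = Path.Combine(siteRoot, year.ToString("0000"));
        CreatePlaceholder(yearDir);

        for (int month = 1; month <= 12; month++)
            CreatePlaceholder(Path.Combine(yearDir, month.ToString("00")));
    }

    private static void CreatePlaceholder(string dir)
    {
        Directory.CreateDirectory(dir);   // no-op if it already exists

        string file = Path.Combine(dir, "Default.aspx");
        if (!File.Exists(file))
            File.Create(file).Close();    // empty file, handle released
    }

    public static void Main(string[] args)
    {
        // Usage: pass the site's physical root and a year.
        if (args.Length == 2)
            CreateYear(args[0], Int32.Parse(args[1]));
    }
}
```

Running it once per year in which there are blog entries produces the directory layout described above.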

Clearly, adding such a directory structure can be a pain. A workaround to this problem is to have all incoming IIS requests map to the ASP.NET engine. This way, even when visiting the URL /2004/03/, IIS will faithfully hand off the request to the ASP.NET engine, whether or not a /2004/03/ directory exists. Using this approach, however, makes the ASP.NET engine responsible for handling all types of incoming requests to the Web server, including images, CSS files, external JavaScript files, Macromedia Flash files, and so on.

A thorough discussion of handling all file types is far beyond the scope of this article. For an example of an ASP.NET Web application that uses this technique, though, look into .Text, an open-source blog engine. .Text can be configured to have all requests mapped to the ASP.NET engine. It can handle serving all file types by using a custom HTTP handler that knows how to serve up typical static file types (images, CSS files, and so on).

Conclusion
In this article we examined how to perform URL rewriting at the ASP.NET level through the HttpContext class's RewritePath() method. As we saw, RewritePath() updates the particular HttpContext's Request property, updating what file and path is being requested. The net effect is that, from the user's perspective, they are visiting a particular URL, but actually a different URL is being requested on the Web server side.

URLs can be rewritten either in an HTTP module or an HTTP handler. In this article we examined using an HTTP module to perform the rewriting, and looked at the consequences of performing the rewriting at different stages in the pipeline.

Of course, with ASP.NET-level rewriting, the URL rewriting can only happen if the request is successfully handed off from IIS to the ASP.NET engine. This naturally occurs when the user requests a page with a .aspx extension. However, if you want the person to be able to enter a URL that does not correspond to an actual file or directory, and have it rewritten to an existing ASP.NET page, you have to either create mock directories and Default.aspx pages, or configure IIS so that all incoming requests are blindly routed to the ASP.NET engine.

Related Books
ASP.NET: Tips, Tutorials, and Code

Microsoft ASP.NET Coding Strategies with the Microsoft ASP.NET Team

Essential ASP.NET with Examples in C#

Works consulted
URL rewriting is a topic that has received a lot of attention both for ASP.NET and competing server-side Web technologies. The Apache Web server, for instance, provides a module for URL rewriting called mod_rewrite. mod_rewrite is a robust rewriting engine, providing rewriting rules based on conditions such as HTTP headers and server variables, as well as rewriting rules that utilize regular expressions. For more information on mod_rewrite, check out A User's Guide to URL Rewriting with the Apache Web Server.

There are a number of articles on URL rewriting with ASP.NET. Rewrite.NET - A URL Rewriting Engine for .NET examines creating a URL rewriting engine that mimics mod_rewrite's regular expression rules. URL Rewriting With ASP.NET also gives a good overview of ASP.NET's URL rewriting capabilities. Ian Griffiths has a blog entry on some of the caveats associated with URL rewriting with ASP.NET, such as the postback issue discussed in this article. Both Fabrice Marguerie and Jason Salas have blog entries on using URL rewriting to boost search engine placement.