Found at: http://publish.ez.no/article/articleprint/31/

Missing Documents: How to make 404 messages friendlier with Apache



When someone lands on a 404 page, especially Apache's unfriendly default one, they may give up on your site altogether. It is therefore important to have some countermeasures in place.


[Figure: How not to do it.]
"In 1999, the incidence of dead link sites was 5.7%. In round numbers that means for every 19 links you click online, one won't work. The 1999 prevalence was 28.5% -- that is just under one in three web pages contained a dead link." All Things Web, Terry Sullivan

That is up 10 percentage points since 1998, or 5 points a year, which, if the trend still holds, means that 33% of all web pages today have one or more broken links.

The first thing you should do is configure Apache to serve a customized 404 document. You can do this in the main server configuration, in a virtual host, in a per-directory section, or in an .htaccess file. Add the following directive to the configuration you chose:

ErrorDocument 404 URL

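The URL can be a full URL such as "http://somehost.somewhere.com/", a local path such as "/path/to/document.html" or "/path/to/script.php", or a plain text message. You should use a local path, because Apache then serves the document with the original 404 status code, so the client, whoever or whatever it is, learns that the page is missing. If you use a full URL, Apache sends a redirect instead and the client never receives the 404, which is not the message you want to send.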

ErrorDocument works for other status codes as well, for example 401 "Authorization Required"; look up the HTTP status codes for the full list of client errors and server errors. A 401 should never be redirected, because the client would then not know that it should show the user an authorization dialog.
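To make the distinction concrete, here is a rough Python sketch of how the ErrorDocument argument is interpreted. The function name `classify_error_document` is hypothetical, and the rules are a simplification of Apache's behaviour: a target with an `http`/`https` scheme makes Apache send a redirect, a target starting with `/` is served locally with the original status code, and anything else (such as a quoted string) is shown as a text message.

```python
from urllib.parse import urlparse


def classify_error_document(status: int, target: str) -> str:
    """Classify an ErrorDocument target (simplified sketch, not Apache's code).

    Returns "redirect", "local" or "message".
    """
    if urlparse(target).scheme in ("http", "https"):
        # Full URL: Apache answers with a redirect, so the client
        # never sees the original error status.
        kind = "redirect"
    elif target.startswith("/"):
        # Local path: served by this server with the original status code.
        kind = "local"
    else:
        # Anything else (e.g. a quoted string) is treated as a text message.
        kind = "message"

    if status == 401 and kind == "redirect":
        # A redirected 401 means the browser never shows its login dialog.
        raise ValueError("ErrorDocument 401 must point to a local document")
    return kind


if __name__ == "__main__":
    for status, target in [(404, "/errors/404.php"),
                           (404, "http://somehost.somewhere.com/"),
                           (404, '"Sorry, that page is gone"')]:
        print(status, target, "->", classify_error_document(status, target))
```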

I suggest you write a PHP (or Perl) script to handle your 404 errors. The script should share the look and feel of your site, but it should also inform the user about the problem and present him with solutions, which is what I will concentrate on here.
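As a starting point, here is a minimal CGI handler sketched in Python (the article suggests PHP or Perl; Python is used here only for illustration). It assumes Apache runs the script as a local ErrorDocument, in which case Apache passes the originally requested URL in the `REDIRECT_URL` environment variable. The function name, site name and markup are placeholders.

```python
import html
import os
import sys
from typing import Mapping


def render_not_found(environ: Mapping[str, str], site_name: str = "Example Site") -> str:
    """Build a CGI response (headers + HTML) for a missing page."""
    # For a local ErrorDocument, Apache sets REDIRECT_URL to the URL that failed.
    requested = environ.get("REDIRECT_URL") or environ.get("REQUEST_URI", "/")
    path = html.escape(requested)  # never echo user input unescaped
    site = html.escape(site_name)
    body = (
        f"<html><head><title>Page not found - {site}</title></head><body>"
        f"<h1>Sorry, {path} could not be found on {site}</h1>"
        "<p>The page may have moved, or the address may be misspelled.</p>"
        '<p><a href="/">Go to the front page</a></p>'
        "</body></html>"
    )
    # Send the 404 status explicitly so browsers and link checkers see the error.
    headers = "Status: 404 Not Found\r\nContent-Type: text/html; charset=utf-8\r\n\r\n"
    return headers + body


if __name__ == "__main__":
    sys.stdout.write(render_not_found(os.environ))
```

The HTML is kept minimal; in practice you would wrap it in your site's normal template and add the suggestions described below.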


Known Problems

Other sites may link to a page on your site that doesn't exist. Perhaps they misspelled the address, or perhaps you've moved the page. Either way, if you can't get them to change the link, set up a redirect to the current or correct document and spare your readers the trouble.
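Inside the 404 script, such fixes can be a simple lookup table checked before anything else. The table entries, the `SITE` base URL and `redirect_for` are hypothetical. A permanent redirect (301) tells clients and search engines to use the new address from now on. If you can edit the Apache configuration, mod_alias's `Redirect` directive is an alternative.

```python
from typing import Optional

# Hypothetical base URL; RFC 2616 (HTTP/1.1) specified an absolute URI
# for the Location header, so the full URL is sent.
SITE = "https://www.example.com"

# Hypothetical table of known bad inbound links -> correct locations.
KNOWN_REDIRECTS = {
    "/artcles/linkrot.html": "/articles/linkrot.html",  # misspelled by another site
    "/old/news.html": "/news/",                          # page moved
}


def redirect_for(path: str) -> Optional[str]:
    """Return a CGI 301 response for a known bad link, or None if unknown."""
    target = KNOWN_REDIRECTS.get(path)
    if target is None:
        return None
    return f"Status: 301 Moved Permanently\r\nLocation: {SITE}{target}\r\n\r\n"
```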

Common Mistakes

Tell the user which mistakes are commonly made when typing URLs. For example, Apache is case-sensitive, and people used to Microsoft conventions might type .htm instead of .html.
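The script can also try those mistakes itself and list only the corrections that exist. This is a minimal sketch, assuming you have some set of existing paths to check against. The `existing` argument and both function names are hypothetical; the set could come from the file system or from HEAD requests. The sketch covers only the two mistakes named above: capitalisation, and `.htm` versus `.html`.

```python
from typing import Iterable, List


def common_mistake_candidates(path: str) -> List[str]:
    """Variants of `path` that fix typical typing mistakes (sketch)."""
    variants = []
    lower = path.lower()
    if lower != path:
        variants.append(lower)            # wrong capitalisation
    for p in (path, lower):
        if p.endswith(".htm"):
            variants.append(p + "l")      # .htm typed for .html
        elif p.endswith(".html"):
            variants.append(p[:-1])       # .html typed for .htm
    return list(dict.fromkeys(variants))  # drop duplicates, keep order


def suggest_fixes(path: str, existing: Iterable[str]) -> List[str]:
    """Keep only the variants that actually exist on the site."""
    known = set(existing)
    return [v for v in common_mistake_candidates(path) if v in known]
```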

Search

If the user comes to a page which doesn't exist, give him the opportunity to search for the info at your site.

Path Unwinding

Your script should first unwind the path to the requested document. Start at the end and remove one level of the path at a time. Check whether each shortened path exists (with an HTTP HEAD request, for example), and if it does, store it along with the document's title and other information such as keywords or description. Repeat this for each level up to the root.

Finally, present the results to the user and suggest that he try any of them. Also suggest a search on the different parts of the URL by providing a direct link to your search engine with the appropriate search terms already filled in.
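Building that search link can be as simple as splitting the requested path into words. This sketch assumes your search engine lives at `/search` and reads its query from a `q` parameter. Both are hypothetical, so use whatever your search script expects. The list of noise words is also an assumption.

```python
import re
from urllib.parse import urlencode

# Tokens that say nothing about the content; extend for your site.
NOISE = {"html", "htm", "php", "index", "www"}


def search_link(path: str, search_url: str = "/search") -> str:
    """Turn the words in a missing path into a link to the site search."""
    words = [w.lower() for w in re.split(r"[^A-Za-z0-9]+", path) if w]
    terms = list(dict.fromkeys(w for w in words if w not in NOISE))
    return f"{search_url}?{urlencode({'q': ' '.join(terms)})}"
```

For example, `search_link("/products/Blue-Widgets.html")` returns `/search?q=products+blue+widgets`.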

Spelling

While unwinding the path, you can submit each level to a spell checker, or rely on Apache's mod_speling module (speling_module). If a spelling correction is made anywhere in the path, test the whole corrected path. If it exists, redirect the user to the correct page. Otherwise, store the spelling fix along with your other information so you can give the user better help.
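mod_speling works inside Apache itself, but your own script can approximate the idea. The sketch below swaps in Python's `difflib.get_close_matches` as the spell checker. It corrects one path segment at a time against the names that exist at that level. `known_paths` (a set of existing file paths) and the function names are assumptions.

```python
import difflib
from typing import Dict, Iterable, Optional, Set


def children_index(known_paths: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each directory ('/', '/docs', ...) to the names it contains."""
    index: Dict[str, Set[str]] = {}
    for path in known_paths:
        parts = [p for p in path.split("/") if p]
        for i, name in enumerate(parts):
            parent = "/" + "/".join(parts[:i])
            index.setdefault(parent, set()).add(name)
    return index


def correct_path(path: str, known_paths: Set[str]) -> Optional[str]:
    """Spell-correct `path` segment by segment; return it only if it exists."""
    index = children_index(known_paths)
    fixed = []
    for segment in (p for p in path.split("/") if p):
        options = index.get("/" + "/".join(fixed), set())
        if segment in options:
            fixed.append(segment)
            continue
        # Closest existing name at this level (difflib's default cutoff is 0.6).
        match = difflib.get_close_matches(segment, sorted(options), n=1)
        if not match:
            return None
        fixed.append(match[0])
    candidate = "/" + "/".join(fixed)
    # Only a correction that leads to a real page is safe to redirect to.
    return candidate if candidate in known_paths else None
```

If `correct_path` returns a path, it is a candidate for a redirect. If it returns `None`, the request falls through to the other suggestions.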

Most Popular Sections

If you know that some parts of your site are very popular, present them to the user. They might be exactly what he was looking for.
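If you don't already know which sections are popular, the access log can tell you. This rough sketch counts successful requests per top-level directory in an Apache access log written in Common Log Format. Treating the first directory as a "section" is an assumption about how your site is organised, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches the request and status fields of a Common Log Format line, e.g.
# 1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /news/today.html HTTP/1.0" 200 2326
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+)[^"]*" (\d{3}) ')


def popular_sections(log_lines: Iterable[str], top: int = 5) -> List[Tuple[str, int]]:
    """Most requested top-level directories among successful (200) requests."""
    counts: Counter = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match or match.group(2) != "200":
            continue  # skip errors, redirects and other methods
        segments = match.group(1).split("?", 1)[0].split("/")
        if len(segments) < 3:
            continue  # '/' and top-level files are not sections
        counts["/" + segments[1] + "/"] += 1
    return counts.most_common(top)
```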

Standard Features

You should also tell the user where he is, i.e. what kind of site yours is. Then offer him options: try our discussion forum, read our news, visit our links, give us feedback about the problem, and so on. Try using the requested path as keywords for a search in your forum; perhaps you have a discussion about exactly what he was looking for.

Information Overload

You want the reader to find something on your site that will keep him there. Be careful not to overload him, though: if it turns out you have 50 possible solutions, show him only a few broad ones and offer him the option of reading all your suggestions.
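One way to keep the list short is to show the broadest suggestions first and hide the rest behind a "show all" link. Ranking by path depth (fewer levels means broader) and the default limit of five are assumptions, not rules from the article. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def pick_broad(suggestions: Iterable[str], limit: int = 5) -> Tuple[List[str], int]:
    """Return the `limit` broadest suggestions and how many were left out."""
    unique = list(dict.fromkeys(suggestions))
    # Depth = '/' separators inside the path, so '/docs/' ranks before '/docs/a/b.html'.
    ranked = sorted(unique, key=lambda p: (p.strip("/").count("/"), p))
    return ranked[:limit], max(0, len(ranked) - limit)
```

If the second value is non-zero, the page adds a link to the full list of suggestions.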

Logging

Finally, a scripted 404 page lets you log missing pages more accurately. This is especially useful if you don't have access to your site's error logs. Record where the user came from (to check whether someone is publishing incorrect links), what the user was trying to reach, and where the user went from the error page. You can capture the last item by sending the user through a special redirect link that logs the click; it tells you whether your error page actually provided a good service.
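Here is a sketch of both halves in Python. `log_miss` records the requested URL and the referrer when the error page is shown. `tracked_link` and `follow` implement the redirect link through a hypothetical `/go` script that logs which suggestion the user picked. The JSON-lines log format, the function names and the `to`/`from` parameter names are all assumptions.

```python
import json
import time
from typing import IO, Mapping
from urllib.parse import parse_qs, urlencode


def log_miss(environ: Mapping[str, str], out: IO[str]) -> None:
    """Record what was requested and which page linked to it."""
    record = {
        "time": int(time.time()),
        "requested": environ.get("REDIRECT_URL", ""),  # set by Apache for ErrorDocument scripts
        "referrer": environ.get("HTTP_REFERER", ""),   # may be empty or missing
    }
    out.write(json.dumps(record) + "\n")


def tracked_link(target: str, missing: str, go_script: str = "/go") -> str:
    """Link that passes through the logging redirect script before `target`."""
    return f"{go_script}?{urlencode({'to': target, 'from': missing})}"


def follow(query_string: str, out: IO[str]) -> str:
    """Log which suggestion was chosen and return the path to redirect to."""
    params = parse_qs(query_string)
    target = params.get("to", ["/"])[0]
    if not target.startswith("/") or target.startswith("//"):
        target = "/"  # only on-site paths, so the script can't be abused as an open redirect
    record = {
        "time": int(time.time()),
        "missing": params.get("from", [""])[0],
        "chose": target,
    }
    out.write(json.dumps(record) + "\n")
    return target
```

The `/go` script then sends the user on to the returned path with an ordinary redirect.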

All this information is invaluable for you when you later go through your logs to find out which problems your site has.


Conclusion


[Figure: How you can do it.]
A 404 doesn't need to be an eyesore for the user. By offering solutions to the problem, it can even be a way to win new users for your site.

Good Examples

useit.com has a good example of an error page. It lists common mistakes, the site's most popular pages and a search option, and leads the user to the main page.

Lycos's error page is another example. It refers the user to the Tech Glossary, which is where the linkrot entry can be found.

Bad Examples


[Figure: How not to do it.]
At cnn.com you can see how not to do things.

Google, which I use for my searches, does not do it well either.

Read More

Cool URIs don't change (w3.org, the World Wide Web Consortium)
Alertbox: Linkrot (useit.com, Jakob Nielsen's site)
Web Pages Must Live Forever (useit.com, Jakob Nielsen's site)
The Biggest Problem with the Web: The 404 Error (John Dvorak's column at ZDNet)
Does Your Site Suffer from Linkrot? (Enterprise Watch)

