
Lancelot - An HTML link-verifier written in Java



About Lancelot

The name 'Lancelot' comes from the 1970s TV show 'Lancelot Link', where a chimpanzee plays the part of a spy. Other Java classes I wrote, 'Mata' & 'Darwin', are named after other characters on the show. The similarities between this project & the show led me to the name:

OK, the last comparison is not fair; I did learn there are some tricky aspects to it. Lancelot is far from shrink-wrap-ready!

Lancelot makes the following simplifying assumptions:



Download Lancelot

Lancelot is freeware. It can be downloaded via a single zip file, or as 4 separate files.

Single zip file

Separate files

Note that you must already have a Java interpreter to run Lancelot! If you don't have one, get Sun's Java Runtime Environment.



Lancelot User Guide

Lancelot is simple to run, though I'll admit its output is a little cryptic. The command line is:

[JavaInterpreter] Lancelot [URL of page you want to verify]

For example, if your interpreter was 'jview', and you wanted to verify my home page, you'd enter:

jview Lancelot http://www.krahe.org/chris/

Here are a few lines of sample output from this example:

Connected.  Verifying links in http://www.krahe.org/chris/...

Line:44 Status:404 Not Found
shqlServer.htm

Line:69 Status:302 Found
http://arch-http.hq.eso.org/sybase/faq/
New Location: http://reality.sgi.com/pablo/Sybase_FAQ/

Line:86 Status:404 Not Found
http://ftp.ameritel.net/mirrors/tucows/tindex.html

IOException:java.net.SocketException: connect
Line:147 Status:0 null
http://www.motorcity.com/site/MC/PPHill/PPHillHome.html
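Lancelot's source isn't reproduced on this page, but the report above hints at its first step: it reads the page, remembers the line each link sits on, and checks relative links such as shqlServer.htm against the page's own URL. The following is a minimal Java sketch of that step, not Lancelot's actual code. The names LinkScanner, Link and scan are hypothetical, and the regular expression assumes each link is a quoted href attribute on a single line.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Lancelot's actual source: list each link with its line number.
final class LinkScanner {
    // Assumption: links look like href="..." or href='...' and sit on a single line.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')", Pattern.CASE_INSENSITIVE);

    static final class Link {
        final int line;     // 1-based line number in the page source
        final String href;  // attribute value as written in the page
        final URI target;   // href resolved against the page URL, or null if malformed

        Link(int line, String href, URI target) {
            this.line = line;
            this.href = href;
            this.target = target;
        }
    }

    static List<Link> scan(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        if ("".equals(base.getRawPath())) {
            base = base.resolve("/"); // "http://host" -> "http://host/" so relative links resolve correctly
        }
        List<Link> links = new ArrayList<>();
        String[] lines = html.split("\r?\n", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = HREF.matcher(lines[i]);
            while (m.find()) {
                // Group 1 holds a double-quoted value, group 2 a single-quoted one.
                String href = m.group(1) != null ? m.group(1) : m.group(2);
                URI target;
                try {
                    // Relative links such as "shqlServer.htm" resolve against the page URL.
                    target = base.resolve(href.trim());
                } catch (IllegalArgumentException e) {
                    target = null; // e.g. an href containing spaces
                }
                links.add(new Link(i + 1, href, target));
            }
        }
        return links;
    }
}
```

For instance, scanning the home page with the base http://www.krahe.org/chris/ turns shqlServer.htm into http://www.krahe.org/chris/shqlServer.htm, the address actually requested. A real HTML parser would be more robust than a regular expression, at the cost of more code.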

Notes from this example:



Frequently-Asked Questions

>How do you plan on handling intranet links, for example: http://w3.mbi.com/

LancelotCGI cannot verify these links, but Lancelot (command line) can when run on your machine. This is because LancelotCGI runs on apl.jhu.edu (i.e., the outside world) and is not permitted to connect to your private MBI web sites.

>Chris - got this response from your link verifier. What gives?
>Line:53 Status:302 Found
>http://eeunix.ee.usm.maine.edu/gorham
>New Location: http://eeunix.ee.usm.maine.edu/gorham/

Strictly speaking, the first URL isn't the one you want. "gorham" is a directory on that server, and a URL that refers to a directory rather than a specific file (e.g. "http://eeunix.ee.usm.maine.edu/gorham/whatever.html") should end in a slash "/". Therefore, the second URL is the correct one.

So, when you ask the web server at eeunix.ee.usm.maine.edu for "gorham", it replies with status 302 ("Found"), a redirection code, and returns the new location. You then make a second request, using this new URL, for the file you wanted in the first place.
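That exchange can be reproduced with the standard java.net.HttpURLConnection class. The sketch below is an illustration, not how Lancelot is known to work: the LinkChecker and Result names are hypothetical, it assumes a HEAD request is acceptable (some servers answer HEAD poorly, so a real checker may need to fall back to GET), and it turns off automatic redirect-following so the 302 and its Location header stay visible. When no HTTP response arrives at all, it reports status 0, as in the Line:147 entry of the sample output.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical sketch: make one request and report what the server said, without following redirects.
final class LinkChecker {
    static final class Result {
        final int status;       // HTTP status code, or 0 if no response was received
        final String message;   // reason phrase such as "Found", or the error text; may be null
        final String location;  // Location header for redirects, else null

        Result(int status, String message, String location) {
            this.status = status;
            this.message = message;
            this.location = location;
        }
    }

    static Result check(URI target) {
        try {
            URLConnection c = target.toURL().openConnection();
            if (!(c instanceof HttpURLConnection)) {
                return new Result(0, "not an HTTP link", null); // e.g. mailto: or ftp:
            }
            HttpURLConnection conn = (HttpURLConnection) c;
            conn.setRequestMethod("HEAD");           // assumption: headers are enough, no body needed
            conn.setInstanceFollowRedirects(false);  // expose the 302 instead of silently following it
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(10_000);
            try {
                return new Result(conn.getResponseCode(), conn.getResponseMessage(),
                        conn.getHeaderField("Location"));
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException e) {
            // Connection refused, unknown host, relative URI, etc.: there is no HTTP status.
            return new Result(0, e.toString(), null);
        }
    }

    // Formats a result in roughly the same shape as Lancelot's report.
    static String format(int line, String href, Result r) {
        StringBuilder sb = new StringBuilder();
        sb.append("Line:").append(line)
          .append(" Status:").append(r.status).append(' ').append(r.message) // null prints as "null"
          .append('\n').append(href);
        if (r.location != null) {
            sb.append("\nNew Location: ").append(r.location);
        }
        return sb.toString();
    }
}
```

Calling `LinkChecker.format(line, href, LinkChecker.check(target))` for each link on a page yields entries in the same shape as the sample report; the exact reason phrases depend on the server.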

So how come clicking on the non-slashed link still works in your browser? Well, a 302 status can also mean a file REALLY moved, possibly to a completely different server. So, to be nice, browsers detect the 302 status and make the second request for you behind the scenes.
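java.net.HttpURLConnection behaves like the browsers here: by default, it follows redirects on its own. The loop such a client runs can be sketched as follows. The Fetcher interface, Response class and follow method are hypothetical stand-ins for a real HTTP request, and the hop limit is an assumption added so that a redirect loop cannot run forever.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of what a browser does "behind the scenes" on a redirect.
final class RedirectFollower {
    /** One response: a status code plus the Location header (null if absent). */
    static final class Response {
        final int status;
        final String location;

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Assumed abstraction over a single HTTP request; a real one would use HttpURLConnection. */
    interface Fetcher {
        Response fetch(URI uri);
    }

    /** Follows redirects up to maxHops times, returning every URI requested, in order. */
    static List<URI> follow(Fetcher fetcher, URI start, int maxHops) {
        List<URI> visited = new ArrayList<>();
        URI current = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            visited.add(current);
            Response r = fetcher.fetch(current);
            boolean redirect = r.status == 301 || r.status == 302
                    || r.status == 303 || r.status == 307 || r.status == 308;
            if (!redirect || r.location == null) {
                return visited; // final answer reached
            }
            // Location may be relative, so resolve it against the URI just requested.
            current = current.resolve(r.location);
        }
        throw new IllegalStateException("too many redirects from " + start);
    }
}
```

A stub Fetcher that answers 302 with the slash-terminated address for .../gorham, and 200 for .../gorham/, makes follow return both URIs: the two round-trips a browser hides from you.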

So is it worth fixing these types of links? Sure! Your web page users will get to their destination quicker - no more double round-trips each time!
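Spotting these fixable links can be automated: if a redirect's new location is simply the requested URL with a slash appended, the link should gain the slash; anything else may be a genuine move. A minimal sketch of such a check follows, with a hypothetical SlashFix.suggest helper that assumes the Location value is an absolute URL, as it is in the examples on this page.

```java
// Hypothetical helper: recognise the "missing trailing slash" redirect.
final class SlashFix {
    /**
     * Returns the corrected link when a 301/302 redirect's Location is exactly the
     * requested URL plus "/", otherwise null (the page may really have moved).
     * Assumption: location is an absolute URL, not a relative one.
     */
    static String suggest(int status, String requested, String location) {
        if ((status != 301 && status != 302) || requested == null || location == null) {
            return null;
        }
        if (requested.endsWith("/")) {
            return null; // already slash-terminated, so this is a genuine move
        }
        return location.equals(requested + "/") ? location : null;
    }
}
```

With the example from the question, suggest(302, "http://eeunix.ee.usm.maine.edu/gorham", "http://eeunix.ee.usm.maine.edu/gorham/") returns the slashed URL, while the Line:69 redirect from the sample output, which points to a different server, returns null.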




1994-2003 krahe.org/chris chris@krahe.org