The name 'Lancelot' comes from the 1970s TV show 'Lancelot Link', where a chimpanzee plays the part of a spy. Other Java classes I wrote, 'Mata' & 'Darwin', are named after other characters on the show. The similarities between this project & the show led me to the name:
OK, the last comparison is not fair. I did learn there are some tricky aspects to it, and Lancelot is far from shrink-wrap-ready!
Lancelot makes the following simplifying assumptions:
Lancelot is freeware. It can be downloaded via a single zip file, or as 4 separate files.
Note that you must already have a Java interpreter to run Lancelot! If you don't have one, get Sun's Java Runtime Environment.
Lancelot is simple to run, though I'll admit its output is a little cryptic. The command line is:
[JavaInterpreter] Lancelot [URL of page you want to verify]
For example, if your interpreter was 'jview', and you wanted to verify my home page, you'd enter:
jview Lancelot http://www.krahe.org/chris/
Here are a few lines of sample output from this example:
Connected.
Verifying links in http://www.krahe.org/chris/...
Line:44 Status:404 Not Found
shqlServer.htm
Line:69 Status:302 Found
http://arch-http.hq.eso.org/sybase/faq/
New Location: http://reality.sgi.com/pablo/Sybase_FAQ/
Line:86 Status:404 Not Found
http://ftp.ameritel.net/mirrors/tucows/tindex.html
IOException:java.net.SocketException: connect
Line:147 Status:0 null
http://www.motorcity.com/site/MC/PPHill/PPHillHome.html
Notes from this example:
>How do you plan on handling intranet links, for example:
http://w3.mbi.com/
LancelotCGI cannot verify these links, but Lancelot (command line) can when run on your
machine. This is because LancelotCGI runs on apl.jhu.edu (i.e. the outside world) and is not
permitted to connect to your private MBI web sites.
>Chris - got this response from your link verifier. What gives?
>Line:53 Status:302 Found
>http://eeunix.ee.usm.maine.edu/gorham
>New Location: http://eeunix.ee.usm.maine.edu/gorham/
The first URL points at a directory but leaves off the trailing slash. Unless a URL names a
specific file, e.g. "http://eeunix.ee.usm.maine.edu/gorham/whatever.html", a directory URL
should end in a slash "/". Therefore, the second URL is the correct one.
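One practical consequence of the missing slash shows up when relative links are resolved against the page's URL. The sketch below is only an illustration using the standard java.net.URI class; it is not taken from Lancelot's source, and the class and method names (RelativeLinks, resolve) are made up for this example. "whatever.html" is the same placeholder file name used above.

```java
import java.net.URI;

class RelativeLinks {
    /** Resolves a link found on a page against that page's own URL. */
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        // With the trailing slash, "gorham" is treated as a directory.
        System.out.println(resolve("http://eeunix.ee.usm.maine.edu/gorham/", "whatever.html"));
        // Without it, "gorham" is treated as a file and is replaced by the link.
        System.out.println(resolve("http://eeunix.ee.usm.maine.edu/gorham", "whatever.html"));
    }
}
```

The first call yields http://eeunix.ee.usm.maine.edu/gorham/whatever.html, while the second yields http://eeunix.ee.usm.maine.edu/whatever.html, which is a different location.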
So, when you ask the web server at eeunix.ee.usm.maine.edu for "gorham", it
replies with a status value of 302, meaning "redirection", and returns the new
location. You then make a 2nd request using this new URL to request the file you wanted in
the first place.
So how come clicking on the non-slashed link still works in your browser? Well, a 302 status
can also mean a file REALLY moved, possibly to a completely different server. So to be
nice, the browser developers detect this 302 status and make the 2nd request for you
behind-the-scenes.
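Java's HttpURLConnection behaves like the browsers here: it follows redirects by default, so a checker has to switch that off to see the 302 at all. The following is a rough sketch of that idea, not Lancelot's actual code. The class and method names (RedirectCheck, check, describe) are hypothetical, the use of a HEAD request is my own assumption, and the output only loosely imitates the report format shown earlier (there is no line number, since a single URL is checked).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectCheck {
    /** Formats one result, loosely imitating the report lines shown earlier. */
    static String describe(int status, String message, String url, String newLocation) {
        StringBuilder sb = new StringBuilder();
        sb.append("Status:").append(status).append(' ').append(message).append('\n');
        sb.append(url);
        if (newLocation != null) {
            sb.append("\nNew Location: ").append(newLocation);
        }
        return sb.toString();
    }

    /** Requests the URL once, without following any redirect. */
    static String check(String link) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(link).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // keep the 302 visible
        conn.setRequestMethod("HEAD");          // assumption: headers are enough
        try {
            int status = conn.getResponseCode();
            // Only 3xx responses carry a meaningful Location header.
            String location = (status >= 300 && status < 400)
                    ? conn.getHeaderField("Location")
                    : null;
            return describe(status, conn.getResponseMessage(), link, location);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(check(args[0]));
    }
}
```

Run it with the URL to check as the only argument, in the same way as Lancelot itself. A link that needs the extra round-trip is reported with its status and the new location the server sent back.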
So is it worth fixing these types of links? Sure! Your web page users will get to their
destination quicker - no more double round-trips each time!
1994-2003 krahe.org/chris | chris@krahe.org |