This is a C#/ASP.NET article. However, the technique described is very simple and can be used in any programming language.
To see an illustration of what the code below is trying to do, click this link http://validator.w3.org/check/referer. If your browser is sending the referrer correctly, you will see a validation summary of this very page - about 214 validation errors, I know :).
The rest of this article describes how to make this request programmatically.
So if you need a quick and dirty way of validating your website for XHTML or HTML compliance, you can use this simple function. It sends a request to the validator at W3.org, decides whether validation succeeded (it returns true or false), and also hands back the entire detailed validation output returned by the validator. That output is HTML, because the validator is an online tool designed to be used manually from the browser.
The good: no need to install any libraries, just a few lines of code.
The bad: the URL you are trying to validate must be online (i.e. the validator must be able to reach it, so http://www.yourwebsite.org works but http://localhost/ won't).
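// Requires: using System; using System.IO; using System.Net;
static bool ValidateXhtml(string uri, out string html)
{
    // The validator checks whatever page is named in the Referer header.
    string validatorUri = "http://validator.w3.org/check/referer";
    using (WebClient wc = new WebClient())
    {
        wc.Headers[HttpRequestHeader.Referer] = uri;
        using (Stream response = wc.OpenRead(validatorUri))
        using (StreamReader sr = new StreamReader(response))
        {
            html = sr.ReadToEnd();
        }
    }

    // The result page marks its outcome with a CSS class.
    if (html.IndexOf("class=\"valid\"") >= 0)
    {
        return true;
    }
    else if (html.IndexOf("class=\"invalid\"") >= 0)
    {
        return false;
    }
    else
    {
        throw new ApplicationException("Validator has returned invalid response");
    }
}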
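The localhost limitation mentioned above can be caught before any request is sent. The following is only a minimal sketch and not part of the original function: the names ValidatorGuard and IsReachableByValidator are made up for illustration, and the check only recognises loopback addresses (localhost, 127.0.0.1, ::1). Intranet hosts or firewalled servers would still slip through.

```csharp
using System;

public static class ValidatorGuard
{
    // Hypothetical helper (an assumption, not from the original article):
    // returns false for addresses the remote validator clearly cannot reach.
    public static bool IsReachableByValidator(string uri)
    {
        Uri parsed;
        if (!Uri.TryCreate(uri, UriKind.Absolute, out parsed))
        {
            return false; // not an absolute URL at all
        }
        if (parsed.Scheme != Uri.UriSchemeHttp && parsed.Scheme != Uri.UriSchemeHttps)
        {
            return false; // only web addresses make sense here
        }
        // Uri.IsLoopback is true for localhost, 127.0.0.1 and ::1.
        return !parsed.IsLoopback;
    }
}
```

A caller could test ValidatorGuard.IsReachableByValidator(uri) first and only then call ValidateXhtml(uri, out html), which avoids a pointless round trip to the validator while developing locally.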
You can also create a website that validates itself. Create the class AutoValidatingPage shown below, and make sure all your website pages extend it rather than the System.Web.UI.Page type.
It works like this. Notice the variable validateProbability. It means that on each request, with a probability of 1 in 1,000, the page will validate itself (the validation function is identical to the one listed above). So let's say each of your webpages is visited 1,000 times a month. On average, each page will then validate itself about once a month. You can now add a new page to the website, or make some changes to an existing page, without worry. If you add anything that's invalid, you will find out pretty soon.
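As a side note before the class itself, the arithmetic behind the 1-in-1,000 figure can be checked with a few lines. This is just an illustrative sketch: the class ValidationOdds and its method names are assumptions, not part of the page class. It uses the same probability-to-limit conversion as the class and adds the chance that at least one validation happens within a given number of visits.

```csharp
using System;

public static class ValidationOdds
{
    // Same conversion the page class uses: probability 0.001 becomes limit 1000,
    // so random.Next(limit) == 0 is true for about one request in a thousand.
    public static int RandLimit(double validateProbability)
    {
        return (int)Math.Round(1 / validateProbability);
    }

    // Average number of self-validations over the given number of visits.
    public static double ExpectedValidations(double p, int visits)
    {
        return p * visits;
    }

    // Chance that at least one validation happens: 1 - (1 - p)^visits.
    public static double ChanceOfAtLeastOne(double p, int visits)
    {
        return 1.0 - Math.Pow(1.0 - p, visits);
    }
}
```

With p = 0.001 and 1,000 visits the expected count is one, but the chance of at least one validation in that month is only about 63%; over 3,000 visits it rises to roughly 95%. So "pretty soon" realistically means within a few months for a page with that much traffic.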
using System;
using System.Collections.Generic;
using System.Text;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Net;
using System.IO;
using System.Net.Mail;
namespace MyWeb
{
    public class AutoValidatingPage : Page
    {
        static Random random = new Random();

        // Note: the validator makes a new request to the current URL in order
        // to validate it. So if you set this to 1 (i.e. validate on every
        // request), it will cause an infinite loop of validations.
        // Therefore, this should be a very small value.
        private double validateProbability = 0.001;
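        protected override void Render(HtmlTextWriter writer)
        {
            base.Render(writer);
            // A value of 0 (or less) switches self-validation off.
            if (validateProbability > 0)
            {
                int randLimit = (int)Math.Round(1 / validateProbability);
                bool selected;
                // Random is not thread-safe and requests run concurrently.
                lock (random)
                {
                    selected = random.Next(randLimit) == 0;
                }
                if (selected)
                {
                    ValidateXhtml();
                }
            }
        }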
        protected void ValidateXhtml()
        {
            string html;
            string uri = "http://" + Request.Url.Host + Request.RawUrl;
            // Calls the static overload defined further down in this class.
            if (!ValidateXhtml(uri, out html))
            {
                MailMessage mail = new MailMessage();
                mail.Subject = "500 Invalid Xhtml";
                mail.IsBodyHtml = true;
                mail.Body = html;
                //todo: send the email here, or log to a log file instead
            }
        }
        static bool ValidateXhtml(string uri, out string html)
        {
            // The validator checks whatever page is named in the Referer header.
            string validatorUri = "http://validator.w3.org/check/referer";
            using (WebClient wc = new WebClient())
            {
                wc.Headers[HttpRequestHeader.Referer] = uri;
                using (Stream response = wc.OpenRead(validatorUri))
                using (StreamReader sr = new StreamReader(response))
                {
                    html = sr.ReadToEnd();
                }
            }

            // The result page marks its outcome with a CSS class.
            if (html.IndexOf("class=\"valid\"") >= 0)
            {
                return true;
            }
            else if (html.IndexOf("class=\"invalid\"") >= 0)
            {
                return false;
            }
            else
            {
                throw new ApplicationException("Validator has returned invalid response");
            }
        }
    }
}
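The todo in the class above leaves open how a failed validation gets reported. One option, sketched here under the assumption that the site has a writable log file, is to append one line per failure. The names ValidationLog and Append, the logPath parameter and the record layout are all made up for illustration.

```csharp
using System;
using System.IO;

public static class ValidationLog
{
    // Hypothetical helper (an assumption, not from the original article):
    // appends one line per failed validation to the given log file.
    public static void Append(string logPath, string uri, string validatorHtml)
    {
        string entry = string.Format(
            "{0:u} INVALID {1} ({2} chars of validator output){3}",
            DateTime.UtcNow, uri, validatorHtml.Length, Environment.NewLine);
        // AppendAllText creates the file if it does not exist yet.
        File.AppendAllText(logPath, entry);
    }
}
```

Inside the if block of the instance ValidateXhtml method, a call such as ValidationLog.Append(path, uri, html) could stand in for the todo. The full validator output is deliberately not written to the log, since it is a complete HTML page; saving it to a separate file would be another option.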
Note: I first wrote this in 2007. The last time I checked was June 2010, and the above function still worked.
Disclaimer: I haven't checked the Terms of Use for the w3.org validator to see whether it allows automated requests or not. So if you plan to use this in any large-scale or commercial application, please make sure you comply with whatever their terms and conditions say.