The Register: Demon ends porn-less Internet Archive block

Be, Virgin – collateral damage

By Cade Metz in San Francisco
Posted in Telecoms, 16th January 2009 19:34 GMT

British ISP Demon Internet is no longer blocking access to the Internet Archive’s Wayback Machine, after working in tandem with the IA to correct a ‘technical issue’ with its child-pornography filter.

Earlier this week, multiple Demon customers complained they were unable to access the Wayback Machine, an 85-billion-page web history dating back to 1996. Attempts to retrieve archived webpages were met with error pages whose urls pointed to Demon’s child porn filter, based on a blacklist compiled by the not-for-profit Internet Watch Foundation.

The IWF soon confirmed that its blacklist contains at least one image hosted by the Wayback Machine. But although IWF filters are typically designed to block individual pages, Demon’s filter seemed to be blocking the entire archive.

In a statement tossed our way, Thus Cable & Wireless – the owners of Demon Internet – now say they have resolved the problem. ‘We will continue to work closely with the IWF and others to ensure the safety and security of all web users and address any technical issues, should they arise, in order to deliver the best service to our customers,’ the statement reads. ‘In this instance, the technical issue, an obscure software bug brought to light by the interaction of our filtering technology and the Internet Archive’s servers, has been identified and resolved.’

The company did not elaborate, but a senior engineer with the company has provided an explanation on a newsgroup where users have discussed the blocking. According to this post, Demon customers were unable to access large parts of the Wayback Machine because of the way Demon’s IWF filter interacted with the web cache used by the IA to speed access.

Because at least one Internet Archive page is blacklisted by the IWF, Demon uses a proxy server each time a user requests info from the IA’s servers. If a user requested a page that had not been cached by the IA, Demon’s proxy had a way of mucking with the caching process. When creating a url for the cached page, IA servers were inserting the proxy’s name: iwfwebfilter.thus.net.

This created cache urls that did not point to webpages. And so, more often than not, Demon customers received error pages when attempting to access the Wayback Machine. And because the bogus urls remained in the IA cache, error pages also appeared when surfers on other ISPs attempted to access the same content.

Which explains why some Be Unlimited and Virgin Media customers were having problems with the Wayback Machine.

‘A page with the iwfwebfilter.thus.net URLs could be cached and then served up to non-Demon customers, which explains…other reports of people who’d not been anywhere near the Demon caches seeing “iwfwebfilter.thus.net” where they’d been expecting “web.archive.org,”’ reads the post from that Demon engineer.
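For the curious, the failure the engineer describes can be sketched in a few lines of Python. This is a toy model, not the Internet Archive's or Demon's actual code: it assumes the server builds its links from whatever hostname it sees on the incoming request, which neither company has confirmed, and every function name and the example path are made up for illustration. Only the two hostnames come from the engineer's post.

```python
from urllib.parse import urlsplit

ARCHIVE_HOST = "web.archive.org"      # hostname surfers expected to see
PROXY_HOST = "iwfwebfilter.thus.net"  # proxy name quoted in the engineer's post

cache = {}  # hypothetical shared page cache, keyed by request path only


def render_page(path, requesting_host):
    """Build a page whose link uses the hostname seen on the request (assumed behaviour)."""
    return f'<a href="http://{requesting_host}{path}/next">next capture</a>'


def fetch(path, requesting_host):
    """Serve from the cache if present; otherwise render the page and cache it."""
    if path not in cache:
        cache[path] = render_page(path, requesting_host)
    return cache[path]


def link_host(html):
    """Pull the hostname out of the single link in a rendered page."""
    url = html.split('"')[1]
    return urlsplit(url).hostname


# A filtered customer asks for an uncached page by way of the proxy.
via_proxy = fetch("/web/2009/example", PROXY_HOST)

# A surfer on another ISP then asks for the same page directly.
direct = fetch("/web/2009/example", ARCHIVE_HOST)

print(link_host(via_proxy))
print(link_host(direct))
```

Because the toy cache is keyed on the path alone, the second surfer gets the copy rendered for the first, proxy hostname and all. A page that was first requested directly would carry the archive's own hostname, which fits the reports that only parts of the archive were affected.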

Be and Virgin have both told The Reg that their IWF filters have not been causing problems with Wayback access. The Internet Archive has not responded to requests for comment.

Last month, IWF-based filters created a similar problem with Wikipedia, a free encyclopedia/online cult. But this issue was resolved when the IWF decided to remove a controversial Wikipedia url from its blacklist. ®