Of course, by the end of the day there were a dozen ways around it: Google Translate, proxies, etc.
So I was thinking about how to write a script that stops people going on Facebook and that can't be got around. While that is probably impossible, I think I may have a way to stop 99% or more of the routes there:
Basically, don't look at the URL or the IP, but look at the HTML itself.
I was poking around, looking at the source of a Facebook page to see if there was anything distinctive there, and was pleasantly surprised to find the following:
<meta property="og:site_name" content="Facebook" />
Looking at another dozen or so pages all turned up the same sort of thing.
Right, so every Facebook page has that little string on line #3 (according to the data I have gathered).
So to detect if a page is Facebook, you could write a script that scans the HTML for [content="facebook"] or something similar.
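Here is a rough sketch of what that scan could look like in Python, using only the standard library. It is an illustration of the idea, not a finished filter: the names SiteNameFinder and looks_like_facebook are made up for this example, and it assumes the page source has already been fetched into a string.

```python
from html.parser import HTMLParser


class SiteNameFinder(HTMLParser):
    """Collects the content of every <meta property="og:site_name"> tag."""

    def __init__(self):
        super().__init__()
        self.site_names = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("property", "").lower() == "og:site_name":
            self.site_names.append(attr_map.get("content", ""))


def looks_like_facebook(html_text):
    """Hypothetical helper: True if the page declares its site name as Facebook."""
    finder = SiteNameFinder()
    finder.feed(html_text)
    return any(name.strip().lower() == "facebook" for name in finder.site_names)
```

You would call looks_like_facebook(page_source) on whatever HTML comes through the filter. It only catches pages that still carry the og:site_name tag unchanged.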
So time to test my theory. Go onto a proxy website and see if it is still there.
Test 1:
http://www.anonymouse.org
Yup, success:
content="Facebook helps you connect and share
Test 2:
Google Translate
Yup. Or, maybe:
was the closest I could find. Still not likely to turn up in a normal website though. (links to facebook maybe though)
Test 3:
unblockthat.us
Yup:
id="facebook"
Looking through this page and the ones tested previously, this snippet occurred in all of them, with the variations being [id=facebook] or [id=facebook_logo].
So I kept testing for a while and came to the conclusion that most (if not all) proxy websites still leave [id=facebook] or [content=facebook] in the page in some shape or form.
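Since proxies may rewrite or mangle the markup, a looser check on the raw text might hold up better than parsing tags. The sketch below is only a guess at how that could work: the pattern, the function names and the threshold parameter are all my own assumptions, and it simply counts attributes that look like id=facebook... or content=facebook..., ignoring case and quoting.

```python
import re

# Matches id= or content= followed by a value starting with "facebook",
# with or without quotes, in any letter case (e.g. id=facebook_logo).
MARKER = re.compile(r"""\b(id|content)\s*=\s*["']?\s*facebook""", re.IGNORECASE)


def count_markers(html_text):
    """Number of Facebook-looking id/content attributes in the raw HTML."""
    return len(MARKER.findall(html_text))


def probably_facebook(html_text, threshold=1):
    """Hypothetical rule: flag the page once enough markers are seen."""
    return count_markers(html_text) >= threshold
```

A plain link such as href="http://www.facebook.com/..." does not match, because only id and content attributes are checked. Raising the threshold would be one way to cut down on false alarms from ordinary pages that merely mention Facebook.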
If you are interested, the proxy services I used for testing were from this page.
So if it is this simple to identify Facebook, why hasn't anyone written a blocking program that searches the HTML (specifically the header) of the page for a particular chunk of text? (I imagine YouTube and such would be similar, with a distinctive mark.)
Is there something I've missed that would make it impractical?
(Oh, and this page has probably now been blocked by the school firewall for exceeding the "weighted phrase limit" to try and stop people googling ways around it)