Website Blocker idea. Needing reality check.....

sdfgeoff
DBB Ace
Posts: 498
Joined: Wed Jan 12, 2011 1:07 am
Location: Low Earth Orbit

Website Blocker idea. Needing reality check.....

Post by sdfgeoff »

Our school recently introduced an IP blocker to stop people going on YouTube, Facebook and so on.
Of course, by the end of the day there were a dozen ways around it: Google Translate, proxies, etc.

So I was thinking about how to write a script that stops people going on Facebook and that has no way around it. While that is probably impossible, I think I may have a way to stop 99% or more of the ways to get there:

Pretty much: don't look at the URL or the IP, but at the HTML itself.

I was poking around, looking at the source of a Facebook page to see if there was anything distinctive there, and was pleasantly surprised to find the following:
<meta property="og:site_name" content="Facebook" />
Looking at another dozen or so pages turned up the same sort of thing.

Right, so every Facebook page has that little string on line #3 (according to the data I have gathered).
So to detect whether a page is Facebook, you could write a script that scans the HTML for [content="facebook"] or something similar.
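Something like this rough Python sketch is what I have in mind (the marker string and function name are just made-up illustrations based on the pages I looked at, not anything a real blocker ships):

# Rough sketch: decide whether a chunk of HTML "looks like" Facebook
# by searching for the og:site_name meta tag shown above.
# The marker string is an assumption based only on the pages I checked.
FACEBOOK_MARKER = 'content="facebook"'

def looks_like_facebook(html: str) -> bool:
    # True if the page source contains the tell-tale meta tag.
    return FACEBOOK_MARKER in html.lower()

# Example usage with a saved page source:
# with open("page.html", encoding="utf-8", errors="ignore") as f:
#     print(looks_like_facebook(f.read()))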

So, time to test my theory: go to a proxy website and see if it is still there.

Test 1:
http://www.anonymouse.org

Yup, success:
content="Facebook helps you connect and share

Test 2:
Google Translate
Yup. Or maybe:
=facebook
was the closest I could find. Still not likely to turn up in a normal website, though (except maybe in links to Facebook).

Test 3:
unblockthat.us
Yup:
id="facebook"
Looking through this one, and the ones tested previously, this snippet occurred in all of them, with the deviations being [id=facebook] or [id=facebook_logo].

So I kept testing for a while and came to the conclusion that most (if not all) proxy websites still have [id=facebook] or [content=facebook] in some shape or form.
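A slightly more general check could use a regular expression to catch the variations seen so far. Again, this is only a sketch based on the handful of pages I tested; the pattern is my own guess and would need tuning against real traffic:

import re

# Matches the variations observed above: content="Facebook...",
# id="facebook", id="facebook_logo" (case-insensitive, quotes optional).
# Purely illustrative; not a tested filter.
FACEBOOK_PATTERN = re.compile(
    r'(?:content|id)\s*=\s*["\']?facebook[\w-]*',
    re.IGNORECASE,
)

def looks_like_facebook_page(html: str) -> bool:
    return FACEBOOK_PATTERN.search(html) is not None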

If you are interested, the proxy services I used for testing were from this page.


So if it is this simple to identify Facebook, why hasn't anyone written a blocking program that searches the HTML (specifically the header) of the page for a particular chunk of text? (I imagine YouTube and the like would be similar, with a distinctive marker.)
Is there something that I've missed that would make it impracticable?
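To make the idea concrete, here is a hypothetical sketch of what I mean: fetch just the beginning of the requested page (where the meta tags live) and decide whether to block it. The names are made up, and a real filter would sit inside the proxy/firewall rather than re-fetching the page like this:

import re
import urllib.request

# Same illustrative pattern as above; an assumption, not a tested rule.
FACEBOOK_PATTERN = re.compile(r'(?:content|id)\s*=\s*["\']?facebook[\w-]*', re.IGNORECASE)

def should_block(url: str, head_bytes: int = 8192) -> bool:
    # Fetch only the start of the page (where <head> and its meta tags
    # usually sit) and report whether it matches the Facebook marker.
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            head = response.read(head_bytes).decode("utf-8", errors="ignore")
    except OSError:
        return False  # could not fetch; let other rules decide
    return FACEBOOK_PATTERN.search(head) is not None

# Example:
# print(should_block("http://www.example.com/"))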

(Oh, and this page has probably now been blocked by the school firewall for exceeding the "weighted phrase limit" to try and stop people googling ways around it)
Eh?
fliptw
DBB DemiGod
Posts: 6459
Joined: Sat Oct 24, 1998 2:01 am
Location: Calgary Alberta Canada

Re: Website Blocker idea. Needing reality check.....

Post by fliptw »

It's called deep packet inspection.

With a properly configured and maintained network, it's pretty easy to block sites without resorting to DPI.
Jeff250
DBB Master
Posts: 6539
Joined: Sun Sep 05, 1999 2:01 am
Location: ❄️❄️❄️

Re: Website Blocker idea. Needing reality check.....

Post by Jeff250 »

https (in general, encryption) defeats this.
snoopy
DBB Benefactor
Posts: 4435
Joined: Thu Sep 02, 1999 2:01 am

Re: Website Blocker idea. Needing reality check.....

Post by snoopy »

Jeff250 wrote:https (in general, encryption) defeats this.
x2

If I proxy through my RSA2 SSH connection, you have literally no way of knowing what I'm doing. I'm not that familiar with VPN, but you can probably tunnel over full encryption there, too. The only way that you can defeat that is to outright block the address. Port blocking can be defeated (I believe) if the VPN/SSH server is set to listen to a port that isn't blocked on your side. (Is that right, Jeff?)

I think the only really foolproof way to block traffic is to deny all but the members of a tightly controlled whitelist, but that's a pain for both the maintainer and the users.

At the same time, you are dealing with a school, and your typical students aren't going to have the knowledge or resources to go to those lengths. The typical approach is to blacklist the desired sites along with known proxy servers/ways around them. Your approach would be good for Google Translate and other sites that you don't want to block outright but also don't want used as a workaround. Then monitor traffic, and when an unknown site starts getting a lot of hits, investigate and eliminate the workaround.

Summary: whitelists mean no hacks around the system, but also mean that legitimate browsing is blocked. Blacklists (and your approach) mean people will get around the system, but with maintenance only the best and smartest will pull it off; any workaround that becomes popular will attract your attention and subsequently get blocked.
Arch Linux x86-64, Openbox
"We'll just set a new course for that empty region over there, near that blackish, holeish thing. " Zapp Brannigan
Jeff250
DBB Master
Posts: 6539
Joined: Sun Sep 05, 1999 2:01 am
Location: ❄️❄️❄️

Re: Website Blocker idea. Needing reality check.....

Post by Jeff250 »

Switching ports on your ssh server will defeat ssh port blocking, but someone could still block ssh on layer 7, although this is uncommon.
sdfgeoff
DBB Ace
Posts: 498
Joined: Wed Jan 12, 2011 1:07 am
Location: Low Earth Orbit

Re: Website Blocker idea. Needing reality check.....

Post by sdfgeoff »

Gotcha, right. Thanks for the reality check.
Eh?