Website Blocker idea. Needing reality check.....

sdfgeoff
DBB Ace
Posts: 498
Joined: Wed Jan 12, 2011 1:07 am
Location: Low Earth Orbit

Website Blocker idea. Needing reality check.....

Post by sdfgeoff »

Our school recently introduced an IP blocker to stop people going on YouTube, Facebook and so on.
Of course, by the end of the day there were a dozen ways around it: Google Translate, proxies, etc.

So I was thinking about how to write a script that stops people going on Facebook and that has no way around it. While that is probably impossible, I think I may have a way to stop 99% or more of the ways to get there:

Pretty much: don't look at the URL or the IP, but at the HTML itself.

I was poking around, looking at the source of a Facebook page to see if there was anything distinctive there, and was pleasantly surprised to find the following:
<meta property="og:site_name" content="Facebook" />
Looking at another dozen or so pages turned up the same sort of thing.

Right, so every Facebook page has that little string on line #3 (according to the data I have gathered).
So to detect whether a page is Facebook, you could write a script that scans the HTML for [content="facebook"] or something similar.
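Something like this rough Python sketch is what I have in mind (the marker string and function name are just made-up illustrations based on the pages I looked at, not anything a real blocker ships):

# Rough sketch: decide whether a chunk of HTML "looks like" Facebook
# by searching for the og:site_name meta tag shown above.
# The marker string is an assumption based only on the pages I checked.
FACEBOOK_MARKER = 'content="facebook"'

def looks_like_facebook(html: str) -> bool:
    # True if the page source contains the tell-tale meta tag.
    return FACEBOOK_MARKER in html.lower()

# Example usage with a saved page source:
# with open("page.html", encoding="utf-8", errors="ignore") as f:
#     print(looks_like_facebook(f.read()))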

So, time to test my theory: go to a proxy website and see if it is still there.

Test 1:
http://www.anonymouse.org

Yup, success:
content="Facebook helps you connect and share

Test 2:
Google Translate
Yup. Or maybe:
=facebook
was the closest I could find. Still not likely to turn up in a normal website, though (except maybe in links to Facebook).

Test 3:
unblockthat.us
Yup:
id="facebook"
Looking through this one, and the ones tested previously, this snippet occurred in all of them, with the deviations being [id=facebook] or [id=facebook_logo].

So I kept testing for a while and came to the conclusion that most (if not all) proxy websites still have [id=facebook] or [content=facebook] in some shape or form.
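A slightly more general check could use a regular expression to catch the variations seen so far. Again, this is only a sketch based on the handful of pages I tested; the pattern is my own guess and would need tuning against real traffic:

import re

# Matches the variations observed above: content="Facebook...",
# id="facebook", id="facebook_logo" (case-insensitive, quotes optional).
# Purely illustrative; not a tested filter.
FACEBOOK_PATTERN = re.compile(
    r'(?:content|id)\s*=\s*["\']?facebook[\w-]*',
    re.IGNORECASE,
)

def looks_like_facebook_page(html: str) -> bool:
    return FACEBOOK_PATTERN.search(html) is not None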

If you are interested, the proxy services I used for testing were from this page.


So if it is this simple to identify Facebook, why hasn't anyone written a blocking program that searches the HTML (specifically the header) of the page for a particular chunk of text? (I imagine YouTube and the like would be similar, with a distinctive marker.)
Is there something that I've missed that would make it impracticable?
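To make the idea concrete, here is a hypothetical sketch of what I mean: fetch just the beginning of the requested page (where the meta tags live) and decide whether to block it. The names are made up, and a real filter would sit inside the proxy/firewall rather than re-fetching the page like this:

import re
import urllib.request

# Same illustrative pattern as above; an assumption, not a tested rule.
FACEBOOK_PATTERN = re.compile(r'(?:content|id)\s*=\s*["\']?facebook[\w-]*', re.IGNORECASE)

def should_block(url: str, head_bytes: int = 8192) -> bool:
    # Fetch only the start of the page (where <head> and its meta tags
    # usually sit) and report whether it matches the Facebook marker.
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            head = response.read(head_bytes).decode("utf-8", errors="ignore")
    except OSError:
        return False  # could not fetch; let other rules decide
    return FACEBOOK_PATTERN.search(head) is not None

# Example:
# print(should_block("http://www.example.com/"))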

(Oh, and this page has probably now been blocked by the school firewall for exceeding the "weighted phrase limit" to try and stop people googling ways around it)
Eh?
fliptw
DBB DemiGod
Posts: 6459
Joined: Sat Oct 24, 1998 2:01 am
Location: Calgary Alberta Canada

Re: Website Blocker idea. Needing reality check.....

Post by fliptw »

It's called deep packet inspection.

With a properly configured and maintained network, it's pretty easy to block sites without resorting to DPI.
Jeff250
DBB Master
Posts: 6539
Joined: Sun Sep 05, 1999 2:01 am
Location: ❄️❄️❄️

Re: Website Blocker idea. Needing reality check.....

Post by Jeff250 »

https (in general, encryption) defeats this.
snoopy
DBB Benefactor
Posts: 4435
Joined: Thu Sep 02, 1999 2:01 am

Re: Website Blocker idea. Needing reality check.....

Post by snoopy »

Jeff250 wrote:https (in general, encryption) defeats this.
x2

If I proxy through my RSA2 SSH connection, you have literally no way of knowing what I'm doing. I'm not that familiar with VPN, but you can probably tunnel over full encryption there, too. The only way that you can defeat that is to outright block the address. Port blocking can be defeated (I believe) if the VPN/SSH server is set to listen to a port that isn't blocked on your side. (Is that right, Jeff?)

I think the only really foolproof way to block traffic is to deny all but the members of a tightly controlled whitelist, but that's a pain for both the maintainer and the users.

At the same time, you are dealing with a school, and your typical students aren't going to have the knowledge or resources to go to those lengths. The typical approach is to blacklist the desired sites along with known proxy servers/ways around them. Your approach would be good for Google Translate and other sites that you don't want to block outright but also don't want used as a workaround. Then monitor traffic, and when an unknown site starts getting a lot of hits, investigate and eliminate the workaround.

Summary: whitelists mean no hacks around the system, but also mean that legitimate browsing is blocked. Blacklists (and your approach) mean people will get around the system, but with maintenance only the best and smartest will pull it off; any workaround that becomes popular will attract your attention and subsequently get blocked.
Arch Linux x86-64, Openbox
"We'll just set a new course for that empty region over there, near that blackish, holeish thing. " Zapp Brannigan
Jeff250
DBB Master
Posts: 6539
Joined: Sun Sep 05, 1999 2:01 am
Location: ❄️❄️❄️

Re: Website Blocker idea. Needing reality check.....

Post by Jeff250 »

Switching ports on your ssh server will defeat ssh port blocking, but someone could still block ssh on layer 7, although this is uncommon.
sdfgeoff
DBB Ace
Posts: 498
Joined: Wed Jan 12, 2011 1:07 am
Location: Low Earth Orbit

Re: Website Blocker idea. Needing reality check.....

Post by sdfgeoff »

Gotcha, right. Thanks for the reality check.
Eh?