#1 (permalink) |
|
Freelance sys***** and programming guru
|
Better ways to block by user agent, better ways to stop ripping
Regarding the long lists of rewrite conditions that some people use, listing hundreds of different rippers: one simple change in that approach will make it work a lot better and be MUCH more efficient.

Keep in mind that if, for example, you have a page with 30 thumbnails, that requires 30 requests to the server, and the server has to process the .htaccess 30 times. If you have 200 rippers listed, then loading those 30 thumbs means the server has to compare the user_agent 30 x 200 times. That's 6,000 comparisons to load just one page. This is the kind of thing that explains why we used to do just fine with 200 MHz Pentium processors in our servers and now 2600 MHz machines aren't fast enough at times. It's an enormous waste of resources.

It also doesn't work very well, since it only blocks the rippers that a) you know about and b) are too stupid to reset their user-agent. It also breaks security rule #1: disallow everything, then allow only that which is needed. Take a look at your typical .htaccess for a paysite and you'll see that rule in action:

deny from all
require valid-user

There's no need to try to list every possible user_agent that shouldn't be allowed to access your members area. That list could never be kept current anyway. Instead, just list the four or five browsers that SHOULD be allowed access: MSIE, Mozilla (including Firefox and Netscape versions), Opera and Safari. If you have videos you'll also allow WMP, RealPlayer, Quicktime and maybe Xine and XMMS for your Linux customers. Without videos, that means that instead of 200 conditions you only have 5-7, and instead of doing 6,000 comparisons you're only doing 150 to 210.

"Well, Ray, what if some day a new browser comes along that a lot of people start using, and I don't want to have to go back and update all of my .htaccess files?" OK, fine. Still, if the user_agent is IE, we don't have to check 200 times to see if it's also a ripper. Once we know it's IE or Mozilla we can stop checking:

RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9].*Gecko [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]\ \(compatible;\ MSIE
RewriteRule .* - [L]

Now put your 200 rules here, to be checked only if it's not IE or Mozilla. BTW, the Mozilla Gecko condition picks up Mozilla, Firefox, Netscape, and Safari because they all use the Gecko rendering engine.

Better yet, instead of blocking based on User-Agent, which is only going to catch a few of the people, just block people who actually ARE ripping by using Throttlebox.
__________________
Ray Morris support AT bettercgi.com Strongbox- The next generation in website security Throttlebox-The next generation in bandwidth control |
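For anyone who wants to try the allow-list idea from the post above, here is a minimal sketch of what it might look like in a members-area .htaccess. It assumes mod_rewrite is enabled and that every browser you care about matches one of the three patterns below. The Opera pattern and the decision to leave out media players are assumptions made for illustration, not part of the original post, so check the patterns against the user-agent strings in your own access logs before relying on them.

```apache
RewriteEngine On

# Forbid the request only if the user agent matches NONE of the allowed
# browsers. Consecutive RewriteCond lines are ANDed by default, and a
# leading "!" negates the pattern.
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/[0-9]\.[0-9].*Gecko
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/[0-9]\.[0-9]\ \(compatible;\ MSIE
# Assumed pattern for Opera identifying as itself.
RewriteCond %{HTTP_USER_AGENT} !^Opera/
RewriteRule .* - [F]
```

With this layout each request costs at most three comparisons, however many rippers exist. The trade-off is that any legitimate client you forgot to list gets a 403.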
#2 (permalink) |
|
Im addicted to Netpond
|
raymor,
Excellent point. Always work smarter, not harder. One more point to add when working with rewrite rules in your .htaccess files: always triple-check that the page you are sending them to (if it is on your server) is there and accessible. If it isn't, you run the risk of overloading the server by sending your .htaccess rules into a never-ending loop.
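One way to guard against the loop described above is to exclude the target page itself from the rule. This is only a sketch: /blocked.html is a made-up page name, and WebZIP stands in for whatever agents you are redirecting.

```apache
RewriteEngine On

# Never redirect a request for the landing page itself; otherwise a
# redirected client would be redirected again on its next request.
RewriteCond %{REQUEST_URI} !^/blocked\.html$
RewriteCond %{HTTP_USER_AGENT} WebZIP [NC]
# [R] sends an external redirect to the hypothetical landing page.
RewriteRule .* /blocked.html [R,L]
```

If you only want to turn the client away, a rule ending in "- [F]" returns a 403 directly and needs no target page at all.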
#4 (permalink) |
|
VP of blather and bullshit
Join Date: Sep 2006
Location: Montreal
Posts: 3,447
Points: 380
|
thanks for that useful tip raymor!!
__________________
Make $$$ with Method Cash - Hosted blogs, 12 niche specific sites, Tons of FHG's and much more!! |
#5 (permalink) |
|
Smile MotherFucker!
Join Date: Aug 2003
Location: Pit 26B, Hell
Posts: 2,361
Points: 6,735
|
I've rewritten my rules to include Ray's suggestion. I'm just testing and watching now to make sure I didn't make any stupid mistake that I haven't noticed yet. Once I know it's good I'll post the revised set of rules in the sticky thread.
#6 (permalink) |
|
Freelance sys***** and programming guru
|
Please do either post the rule set or email it, because I've misplaced my rules.
#7 (permalink) | ||
|
Got Feet?
|
Quote:
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9].*Gecko [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]\ \(compatible;\ MSIE
RewriteRule .* - [S=1]
__________________
NEW! Shameless PimpsMy best earners: SapphicCash - Sabrina 'Rebill' Deep - NSCash - Adult Art Network My Thunder-Ball Profile - Learn about tribbing |
#9 (permalink) |
|
Blog Automation Software, Check My Sig!
|
Well, yes, that's a good idea, but keep in mind this might have a high impact on heavily trafficked sites, since now most of the time it will do a rewrite, which slows the server down, whereas the other way it was rewriting only when a fucker was coming to your site.
__________________
#1 Mass Blogging Script: Blogs Organizer | #1 Mass RSS Feeder Script Blogs Automater #1 Multidomain Hardlink Trade Script : Links Organizer | #1 Blog Posts Builder Script: Gallery Scraper Complete List of Affiliate RSS Feeds! | A-B-C Blog Linktrades
#10 (permalink) |
|
Smile MotherFucker!
Join Date: Aug 2003
Location: Pit 26B, Hell
Posts: 2,361
Points: 6,735
|
It will match on an extra rewrite rule, yes. But what it matches does nothing except trigger either a stop or a skip in the rewrite engine. It doesn't actually replace any URLs or redirect or anything.

I think in the end that little bit of processing will be less than the thousands of comparisons made on every page load that Ray was talking about. I've been testing it for a few days on a server running 2 sites with about 90k a day in total traffic, each site with about 150 images on it. So that's a lot of requests hitting that .htaccess file. (I'll eventually move it all to the Apache config to improve it more.)

Just for fun, I worked that out: 90k times 151 requests (150 images plus the HTML) means the server processes the .htaccess file 13,590,000 times per day. But so far, using Ray's suggestion, the load on the server is slightly lower. It didn't reduce the load as much as I was thinking or hoping it might, but it did seem to reduce it some. I just have to go through my stats to check that it didn't let any extra bot traffic through and that it's actually working the way it's supposed to.
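On the plan to move the rules out of .htaccess: a rough sketch of what that could look like in the main Apache configuration follows. The directory path is a placeholder and the block list is cut down to two entries for space, so treat it as an outline rather than a drop-in config. With AllowOverride None, Apache does not look for .htaccess files in that tree, and the directives are parsed once at startup instead of on every request.

```apache
# Hypothetical path; substitute the real document root of the site.
<Directory "/var/www/example/htdocs">
    # Stop Apache from searching this tree for .htaccess files.
    AllowOverride None

    # Per-directory rewriting needs FollowSymLinks (or SymLinksIfOwnerMatch).
    Options +FollowSymLinks
    RewriteEngine On

    # Known-good browsers: stop rewriting here.
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9].*Gecko [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]\ \(compatible;\ MSIE
    RewriteRule .* - [L]

    # Shortened block list for illustration.
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} WebZIP [NC]
    RewriteRule .* - [F]
</Directory>
```

Note that AllowOverride None also disables anything else the .htaccess files were doing, such as password protection, so those directives would have to move into the server config as well.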
#11 (permalink) |
|
Blog Automation Software, Check My Sig!
|
okay, that's cool
just make sure you compile a good list of browsers to allow, including the mobile stuff, which is getting used more and more these days.
#12 (permalink) |
|
Smile MotherFucker!
Join Date: Aug 2003
Location: Pit 26B, Hell
Posts: 2,361
Points: 6,735
|
Here's what I've got so far. I didn't update the "S" option yet. As it is now, it should probably be the last section of any .htaccess file, since it has [L] directives. Putting it before other commands will break them.

What it does is first check whether the browser is Explorer or a Mozilla/Gecko derivative. If it is, it's allowed in and the rewrite process stops. If it is not, it is checked against the known bots and bad browsers. If it's a bad user agent, it gets a forbidden error and rewriting stops. If it's an unknown browser (like cell phones and PDAs), it is allowed in.
Code:
## Check For Mozilla/MSIE And Allow It Before Checking All
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9].*Gecko [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]\ \(compatible\;\ MSIE
RewriteRule ^(.*) - [L]
## If Not Mozilla/MSIE check for bad bot
RewriteCond %{HTTP_USER_AGENT} Ants [NC,OR]
RewriteCond %{HTTP_USER_AGENT} attach [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Backweb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bot.mailto.craftbot.yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Buddy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CherryPicker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Copier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Crawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Crescent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Custo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DA.4.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DA.5.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DA.5.3 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DIIbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Download.Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Download.Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Drip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Express.WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FileHound [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetLeft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetWeb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Gets [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Go.Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Go-Ahead-Got-It [NC,OR]
RewriteCond %{HTTP_USER_AGENT} gotit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Grabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IBrowse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image.Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image.Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Internet.Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JOC [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JustView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} leech [NC,OR]
RewriteCond %{HTTP_USER_AGENT} LeechFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lftp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} likse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Lickity [NC,OR]
RewriteCond %{HTTP_USER_AGENT} LinkWalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Link.Sleuth [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Magnet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mag-Net [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mass.Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Memo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Microsoft.URL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MIDown.tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mirror [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mister.Pix [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Net.Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetCaptor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NICErsPRO [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Offline [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Papa.Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PicaLoader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PictureRipper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PicHunter [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Pockey [NC,OR]
RewriteCond %{HTTP_USER_AGENT} psbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Pump [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Reaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Recorder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sitecheck.internetseer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Snagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Snake [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SpaceBison [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Teleport [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IUPUI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Vacuum [NC,OR]
RewriteCond %{HTTP_USER_AGENT} VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web.Bandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web.Image.Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web.Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web.Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebEMailExtrac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebGo.IS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebLeacher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebRipper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Website.eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Website.Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Webster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Whacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WWWOFFLE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Xaldon.WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Zeus [NC]
RewriteRule .*$ - [F,L]
Edit: I made a typo in the original post and left an [OR] flag after the check for MSIE, which caused things to not work right. The above code is now correct and seems to work in testing with wannabrowser.
Last edited by Fuckin Bill : 04-01-2007 at 04:09 PM.
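For reference, here is how the first rule might look with the skip flag instead of [L], so that the section no longer has to be last in the file. This is an untested sketch with the block list shortened to three entries. [S=1] tells mod_rewrite to skip the next one RewriteRule (together with its conditions) and carry on with whatever follows.

```apache
# Known-good browsers: skip the single block rule that follows.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9].*Gecko [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]\ \(compatible;\ MSIE
RewriteRule .* - [S=1]

# Block list (shortened). Alternation folds several names into one condition.
RewriteCond %{HTTP_USER_AGENT} (HTTrack|WebZIP|Zeus) [NC]
RewriteRule .* - [F]

# Any rules placed below this point still run for the allowed browsers.
```

Folding the names into one alternation is optional; it keeps the file short, but one name per line is easier to maintain when the list is long.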
#13 (permalink) |
|
Freelance sys***** and programming guru
|
Quote:
Actually, Safari (and Konqueror in Linux) use the KHTML engine, but they put the word 'Gecko' in the user-agent string just to remain compatible in cases like this.

That's true. WebCore, the Safari rendering engine, WAS originally forked from KHTML, not Gecko. Thanks for the reminder. Hopefully, through Unity, the two engines, WebCore and KHTML, can be brought back together so that everyone benefits from new features and bug fixes. More details for the curious: http://en.wikipedia.org/wiki/KHTML