Learning .htaccess
Everything about .htaccess is here.
As I’ve said, I’m not a programmer or server-side specialist, so I may need a little help remembering the ins and outs of .htaccess.
First, A Word of Warning
Keep in mind that one little typo or incorrect rule within an .htaccess
file can cause an internal server error and take your entire website offline.
Especially if you’re new to using an .htaccess file, I highly recommend setting
up a test directory to work on your .htaccess file. In addition, always make a
backup of your .htaccess file before making any changes. That way, if you do
happen to make a typo or other error, you can load your backup file again to
keep your website up and running while you look for the source of the problem(s).
In addition, many caution those new to .htaccess about not getting too
carried away and ending up creating excessively big .htaccess files. Keep in
mind that the server will process this file for each request at your website, so
you don’t want to negatively impact your server’s performance. For those with
access to the
httpd.conf file on your Apache server, many recommend using that instead of
.htaccess, especially for better server performance. Many of us on shared
servers, though, don’t have access to it, including myself.
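To make the httpd.conf suggestion concrete, here is a minimal sketch, assuming you control the main server configuration. The directory path, the domain, and the redirect are made-up placeholders, not settings from any real site. With AllowOverride None, Apache stops looking for .htaccess files in that directory tree, so the rules are read once at startup instead of on every request.

```apache
# httpd.conf (main server configuration), not .htaccess
# "/var/www/example" is a hypothetical document root
<Directory "/var/www/example">
    # Tell Apache not to look for .htaccess files under this directory
    AllowOverride None

    # Rules that would otherwise live in .htaccess go here instead
    # (example.com and both paths are placeholders)
    Redirect permanent /old-page.html http://www.example.com/new-page.html
</Directory>
```

Keep in mind that changes to httpd.conf only take effect after the server is restarted or reloaded, while changes to an .htaccess file apply on the next request.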
I prefer to think of .htaccess as just one of a variety of approaches and
tools for managing URLs (especially URL redirecting), managing custom error
pages, and combating bad bots and spammers. It’s a fantastic tool that I’m
thrilled to be able to use for my own websites finally, including this one.
Regarding combating bad bots and spammers, .htaccess is one of several
tools and approaches that I use. My goal is to keep things simple and block the
bad guys without blocking everyone else. No one single approach can do it all,
though, and bad bots and spammers continually work on ways to get past all the
blocking approaches discussed online. So far I’m able to block nearly all
of the bad bots and spammers, but new ones always come along, so I watch my logs
closely, too.
On to some website links that I’ve found especially helpful.
Apache Documentation
First, here are several links to the definitive source for
Apache 1.3 and Apache 2.0 specifically related to using .htaccess, especially
for redirecting URLs and blocking bad bots and spammers.
Apache 1.3
- Apache 1.3: Authentication, Authorization, and Access Control
- Apache 1.3: Module mod_access
- Apache 1.3: Module mod_rewrite URL Rewriting Engine
- Apache 1.3: Module mod_setenvif
- Apache 1.3: Modules
- Apache 1.3 Tutorial: .htaccess files
- Apache 1.3 Tutorial: Apache URL Rewriting Guide
By Ralf S. Engelschall
Apache 2.0
- Apache 2.0: Authentication, Authorization, and Access Control
- Apache 2.0: Apache Module mod_access
- Apache 2.0: Apache Module mod_rewrite
- Apache 2.0: Apache Module mod_setenvif
- Apache 2.0: Module Index
- Apache 2.0 Tutorial: .htaccess files
- Apache 2.0 Tutorial: Apache URL Rewriting Guide
By Ralf S. Engelschall
How to Use .htaccess, mod_rewrite, and Related (for Apache)
- .htaccess tips and tricks and more .htaccess tips and tricks: redirecting and rewriting
Via corz.org.
- .htaccess Tutorial
Introduction to .htaccess, including what you can do with .htaccess, creating
custom error pages, deny/allow access to specific pages or directories,
password protection, redirecting URLs, and more. By David Gowans, via
freewebmasterhelp.com.
- Abbreviate URLs with mod_rewrite
By Andy King, via websiteoptimization.com.
- An Introduction to Redirecting URLs on an Apache Server
For mod_rewrite beginners, by DaveAtIFG via Webmasterworld, Dec 16, 2002.
- mod_rewrite: A Beginner’s Guide to URL Rewriting
By Tamas Turcsanyi, via SitePoint, October 22, 2002.
- Rewriting URLs with mod_rewrite
By Daniel via 4webhelp.net, updated February 09, 2004.
- URLS! URLS! URLS!
By Bill Humphries via A List Apart, June 30, 2000.
- Using the .htaccess File
Helpful, easy-to-understand introduction to .htaccess and what you can do with
it. Collated by Miraz Jordan via wise-women.org.
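For readers who want to see what the tutorials above build toward, here is a minimal sketch of two common mod_rewrite rules, assuming the .htaccess file sits in the site’s document root and mod_rewrite is enabled. The file names, the products.php script, and the id parameter are hypothetical placeholders.

```apache
# .htaccess in the site's document root
# (needs mod_rewrite and permission to override FileInfo)
RewriteEngine On

# Permanently redirect one moved page; both file names are placeholders
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]

# Map a short URL such as /products/42 to a script
# (products.php and the "id" parameter are hypothetical)
RewriteRule ^products/([0-9]+)$ /products.php?id=$1 [L]
```

In an .htaccess file the pattern is matched against the path relative to that directory, which is why neither pattern starts with a slash.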
.htaccess Tools
I’ve been scouring the Internet looking for tools that will
check .htaccess files for typos or other potential problems. So far I haven’t
found anything, although I did find some tools that will help you create .htaccess
rules and test user agent strings. They’re listed below.
Tools to Generate .htaccess Rules
Try one of these tools to generate redirects, hotlink
protection, password protection, or blocking bad bots. At the minimum, you can
try them out as learning tools to see how something might be handled. Note that
they might not do very complex rules.
- .htaccess Tools
This website has tools to generate hotlink protection, password protection, or
blocking of hitbots for your .htaccess files.
- mod_rewrite RewriteRule Generator
A mod_rewrite rule generator tool via Webmaster Toolkit.
Tools to Test .htaccess Rules
- WannaBrowser
Wanna Browser is a helpful test to see if your rewrite rules are working as
you wish for user agents. If you’ve blocked a certain user agent string for a
bad bot, for example, you can see if your rule is working properly with their
online tool.
Forums Devoted to Apache .htaccess, mod_rewrite, mod_setenvif, and Related
You’ll find enormously helpful tips and troubleshooting help
for using .htaccess, mod_rewrite, mod_setenvif, and related Apache features via
these forums. You don’t need to subscribe to read most discussions, although
you’ll need to sign up to post your questions or comments, and Webmasterworld
Forums has subscriber-only areas in addition to their freely available areas.
- .htaccess Tools Forum
Helpful forum devoted to all things .htaccess via the .htaccess Tools website.
- mod-rewrite.com
Website has a forum all about using mod_rewrite, such as URL handling, access
restriction, regular expressions, and more.
- SitePoint Forums: Apache
SitePoint’s Apache forum is a busy one, filled with lots of tips, examples,
resources, and more.
- Webmasterworld Forums: Apache Web Server
This top-notch forum covers .htaccess, mod_rewrite, and other Apache topics.
I’ve found countless tips and insight here. Be sure to check out their
Charter – Apache Web Server, too, as you’ll find helpful resources there
in addition to reviewing their rules for posting.
Using .htaccess to Block Hotlinking, Stop Bandwidth Theft
I absolutely love being able to prevent other websites
from directly linking to my server’s images, CSS, JavaScript, etc. using .htaccess.
Here are a few tutorials on how to do it, plus a tool for checking the result.
- Preventing image hotlinking: An improved tutorial
By Tom Sherman, via Underscorebleach.net, November 21, 2004 (updated September
14, 2005).
- Smarter Image Hotlinking Prevention
Prevent others from directly linking to your server’s images, CSS, JavaScript,
and other files. By Thomas Scott, via A List Apart, July 13, 2004.
- Stop Hotlinking and Bandwidth Theft with .htaccess
Helpful, easy-to-understand tutorial at altlab.com. The approach used in this
tutorial is basically what I do for my websites, as I’ve also chosen to send a
Forbidden (403) error message.
- URL Hotlink Checker
You can test the effectiveness of your website’s hotlink protection with this
online tool by entering a complete URL from your website to see if your image
can be loaded and hotlinked by a remote server. Via altlab.com.
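As a rough illustration of the Forbidden (403) approach these tutorials describe, here is a minimal sketch using mod_rewrite. It is not copied from any of the tutorials above, and example.com stands in for your own domain.

```apache
RewriteEngine On

# Let through requests that send no Referer at all
# (direct visits, and visitors whose software strips the header)
RewriteCond %{HTTP_REFERER} !^$
# Let through requests referred by our own site (example.com is a placeholder)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com(/|$) [NC]
# Everything else asking for these file types gets 403 Forbidden
RewriteRule \.(gif|jpe?g|png|css|js)$ - [F,NC]
```

Allowing an empty Referer is a design choice: it lets a determined hotlinker through, but refusing it would also block legitimate visitors whose browsers or proxies do not send the header.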
Note that you might wish to allow certain sites to directly link to a specific
image, such as an icon image for your newsfeeds, while still not allowing hotlinking
to all your other images. I recently added my newsfeeds-related icon image to
a separate directory, and in that directory’s .htaccess file I’ve specified
a rule using Apache’s
directive to allow hotlinking to that specific image only. I’m currently
testing that to see how it goes for the next few weeks. I prefer that people
download the icon to use from their own servers, so if I find other websites
abusing the hotlinking for that image, it’s easy enough to individually prevent
them from hotlinking to it and make more restrictive rules within that separate
directory’s .htaccess file.
Using .htaccess to Ban Bad Bots and Spammers
Note that some of the Webmasterworld forum links might require
a subscription.
- A close to perfect .htaccess ban list – part 1, A close to perfect .htaccess ban list – part 2, and A close to perfect .htaccess ban list – part 3
Fabulous thread at Webmasterworld about using .htaccess to help block all the
bad bots, spammers, and other bad guys.
- How to block spambots, ban spybots, and tell unwanted robots to go to hell
By Mark Pilgrim, via Dive into Mark, February 26, 2003.
- I Love Jack Daniels (weblog): Apache category
So far the section includes tutorials to block referrer spam, ignore
directories in mod_rewrite, mod_rewrite cheat sheet, password protect a
directory with .htaccess, HTTP status codes explained, .htaccess error
documents. By Dave Child, ilovejackdaniels.com.
- Killing referrer spam
By Dorothea, via Caveat Lector, January 11, 2005.
- Tips and Examples for how to use your .htaccess file
.htaccess file explained with examples for eliminating referrer spam and deep
linking, by Mr. Steve, shooter.net.
Some helpful forum threads:
- Apache Web Server: A Close to perfect .htaccess ban list – Part 3: More tips and tricks for banning those pesky “problem bots!”
Via Webmasterworld, April, 2004.
- List of bad bot/spiders: Requesting a list of known bad bots
Via Webmasterworld, September 7, 2005.
- Perl Server Side CGI Scripting: A Close to perfect .htaccess ban list (further discussion)
Via Webmasterworld, October 23, 2001.
- Search Engine Spider Identification: Webmasterworld forum 11: Updated and Collated Bot List
Updated UA strings with links to discussions about them. Look for the
newest/latest (currently a 3-page thread).
- Tracking and Logging: New referrer spammer
Via Webmasterworld, May 17, 2005.
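The ban lists discussed in these threads generally take the shape of a chain of mod_rewrite conditions followed by one rule that refuses the request. Here is a minimal sketch of that shape. The bot names and the spam keyword are invented placeholders, not entries from any real ban list.

```apache
RewriteEngine On

# Each condition tests a request header; [OR] chains the conditions together.
# The bot names below are invented placeholders.
RewriteCond %{HTTP_USER_AGENT} ^ExampleHarvester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SampleSiteRipper [NC,OR]
# A condition on the Referer header catches referrer spam by keyword
RewriteCond %{HTTP_REFERER} placeholder-spam-keyword [NC]
# The last condition must not carry [OR]; matching requests get 403 Forbidden
RewriteRule .* - [F]
```

A stray [OR] on the final condition is an easy mistake to make when adding new entries to the end of a list like this, so it is worth checking after every edit.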
Weblogs, Wikis, Sites, Sections Devoted to Combating Bad Bots, Spammers
- Chongqed
Manni’s weblog (Manfred Heumann) devoted to hunting down and sharing spammer
information, wiki spam, email spam, and life in general.
- chongqed.org
Another invaluable weblog and wiki devoted to hunting down spammers and
sharing info with everyone to fight wiki spam, blog spam, and guestbook spam.
Run by Joe (from Texas) and Manni (Manfred Heumann).
- Spam Chongqing
Joe’s (from Texas) weblog devoted to hunting down and sharing spammer
information.
- Spam Huntress
An invaluable weblog and wiki devoted to hunting down spammers, sharing info
with everyone to help combat spam and block spammers from your websites.
- Spam Kings Blog
News and information about catching, prosecuting spammers covering topics from
the book, by Brian McWilliams.
- Tom Raftery’s I.T. Views: .htaccess Category
Tom’s site is also quite helpful with strategies, tips, and links to combat
spammers and bad bots.
Thoughts on Dealing with Comment, Referral, Trackback Spam
As I mentioned above, no one approach will be totally effective
or even practical in blocking comment spam, referral spam, or trackback spam.
Blocking by IP address or host can quickly become impractical, as anyone knows
who’s tried to block solely by IP address. Your ban list will grow rapidly, IPs
get outdated just as fast, and IPs often
come from zombie machines. Blocking by user agent can help, but spammers
spoof user agents and you don’t want to block legitimate users. There are known
spoofed user agent strings that you can add to your ban list, though, which can
help quite a bit. Blocking by referrer can be helpful, but once again your ban
list will grow quickly, too, similar to IP lists. Blocking by keywords for
referrers and hosts can help cover most spam referrals and hosts, but I’ve also
recently found spammers trying more legitimate-looking domain names. Keep in
mind that spammers are always coming up with new ways to get around blocking
approaches, too.
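The blocking approaches described above (IP address, user agent, referrer keyword) can be combined in one place with mod_setenvif and the access-control directives from mod_access. This is a minimal sketch under stated assumptions: the bot names and keyword are invented, and the IP address comes from a range reserved for documentation. It uses the Apache 1.3/2.0 syntax covered by the documentation linked earlier. Apache 2.4 replaced Order, Allow, and Deny with the Require directive, although the older directives remain available through mod_access_compat.

```apache
# Flag requests by user agent (both names are made-up placeholders)
SetEnvIfNoCase User-Agent "ExampleBadBot" bad_visitor
SetEnvIfNoCase User-Agent "^EmailHarvester" bad_visitor
# Flag requests whose referrer contains a spam keyword (placeholder)
SetEnvIfNoCase Referer "placeholder-spam-keyword" bad_visitor

# Allow is evaluated first, then Deny; a matching Deny wins
Order Allow,Deny
Allow from all
# Block a single IP address (192.0.2.15 is a documentation-only address)
Deny from 192.0.2.15
# Block anything flagged above
Deny from env=bad_visitor
```

Keeping every test behind one environment variable means a new user agent or keyword is a one-line addition, which helps a little with the ever-growing ban list problem.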
Largely for these reasons I’ve found it most effective for my own websites to
use a combination of several approaches and tools. Each of my websites is
different, though, so I don’t do the same things at each site, although there is
certainly some overlap.
Here are some helpful articles on ideas and ways of helping to combat the
spammers.
- Concerning Spam
Great post about a variety of ways to prevent spammers. By Elise Bauer via
elise.com. Updated August 29, 2005. Originally posted in 2004.
- Proposal on referrer spam: Background and blacklists
By Tom Sherman, January 15, 2005.
- Save Your Site from Spambots: Techniques to Prevent Address Scraping
Excellent article by Steven Champeon via hesketh.com (reprinted from
WebTechniques).
- Solving comment spam
A must-read article by Simon Willison via Simon Willison’s Weblog, January 28,
2004.
Regular Expressions
Learning even just a little about regular expressions can be
valuable. Learning more about them can go a long way toward
writing leaner mod_rewrite rules and other rules for your .htaccess files.
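As a small illustration of what “leaner” can mean, here is a sketch in which three conditions collapse into one through regular-expression alternation. The user agent names are made up for the example.

```apache
RewriteEngine On

# Before: three conditions for three (made-up) user agents
# RewriteCond %{HTTP_USER_AGENT} ^BotOne   [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ^BotTwo   [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ^BotThree [NC]

# After: one condition using alternation
#   ^      anchors the match at the start of the string
#   (a|b)  matches either alternative
RewriteCond %{HTTP_USER_AGENT} ^Bot(One|Two|Three) [NC]
RewriteRule .* - [F]
```

The combined form does the same job with fewer lines for the server to process, at the cost of being slightly harder to read at a glance.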
- A Tao of Regular Expressions
A good overview of some basics of regular expressions. By Steve Mansour via
sitescooper.org.
- Basics of Regular Expressions
Part 1 of a 2-part tutorial on mod_rewrite and regular expressions. By
Justin, “jd01” via Webmasterworld, May 3, 2005.
- Mod_Rewrite and Regular Expressions
Part 2 of a 2-part tutorial on mod_rewrite and regular expressions. By Justin,
“jd01” via Webmasterworld, Aug 11, 2005.
- mod_rewrite Cheat Sheet
By Dave Child, ilovejackdaniels.com.
- Perl 5 Regular Expressions
Part of “Rex Swain’s HTMLified Perl 5 Reference Guide.” Helpful
basics here.
- Perl Regular Expressions
Official documentation via the perl.com website.
- Regular Expressions
By Chris Karakas, Claudio Erba via karakas-online.de.
- Regular Expressions Testing Tool
Via RegExLib.com.
- Using Regular Expressions
A helpful introductory tutorial by Stephen Ramsay, via the Electronic Text
Center, University of Virginia.
- Do you think this RewriteCond would be too rude?
A helpful thread on regular expressions within an .htaccess file.
Robots.txt
Unfortunately, many bots disregard or don’t even look at your
robots.txt file. Good ones will honor it, though, so it’s still worth
creating one.
For my own websites, as long as the bot or spider behaves itself properly, I
typically allow it, but I do have exclusions in my robots.txt file. Known bad
bots or spiders and bots or spiders that disregard the rules or behave badly are
banned from my website via my .htaccess file.
Here’s some information on how to create and check a robots.txt file for your
website.
- The Web Robots Pages
Martijn Koster’s website all about robots.txt and the Robots Exclusion
standard.
- Put your robots.txt on a diet
How to reduce the file size of your robots.txt file by removing duplications,
compressing multiple records, and more. Via Webmasterworld.
- The Robots.txt Our Big Crawl
Common problems and errors found after researching 2.4 million URLs and 75,000
Robots.txt files. Great insight so you avoid these problems! Via
Webmasterworld.
- Robots.txt Validator
Check your robots.txt here with this helpful online tool. Via
SearchEngineWorld.
Which Bots or User Agents are Good or Bad?
- Bots, Blogs and News Aggregators Presentation Sources and White Paper
By Marcus P. Zillman, M.S., A.M.H.A.
- Information Retrieval Software
A website devoted to providing information about information retrieval
software (including email scrapers, spambots, etc.), search engine robots, and
more.
- List of Bad Bots
Helpful information here on quite a few user agents, including what type of
bot, user agent strings, IP addresses, links to more details, and more. Well
done. By Ralf D. Kloth, via kloth.net.
- List of User-Agents (Spiders, Robots, Crawler, Browser)
Hundreds listed in these helpful charts that include type of user agent,
descriptions and links to information about hundreds of spiders, robots,
crawlers, and browsers. Types include: (Client) browser, Link-, bookmark-,
server- checking; Downloading tool; Proxy server, web filtering; Robot,
crawler, spider; Spam or bad bot. By Andreas Staeding, via psychedlix.com.
- Project Honey Pot Statistics: Top Spam Harvester User Agents
Listings by type of user agent, including the page linked here. You’ll also
find Robot User Agents, currently active Top 25 Global Spam Harvester List,
and more. Via projecthoneypot.org.
- RSS user agent identifiers
A helpful list of RSS user agents categorized by Web aggregators and search
engines, Desktop readers and aggregators, RSS tools and services. By Philip
Shaw, via Code Style.
- Search Engine Spider Identification: Ultimate short list of banned bots
Includes helpful links for ways to fend off bad bots and spammers. Via
Webmasterworld.
- Search Engine Robots
Fabulous listings here with descriptions and links. The categories include:
Search engine robots and others, Browsers, Link Checkers, Link monitors and
bookmark managers, Validators, FTP clients and download managers, Research
projects, Software packages, Offline browsers and other agents, Other
miscellaneous agents, Sites that regularly visit, Other useful sites, some
fakers. By John A. Fotheringham, via jafsoft.com.
- Statistics
An informative and helpful post about stats logs (primarily AWStats) handling
xml feeds and how to sort them out to get a better view of good guys and bad
guys. By Tomas Jogin, via Jogin.com, June 15, 2004.
- System: User Agents
A searchable directory of user agent strings that includes their source,
purpose, and links to more information for most of them. You can search their
database or paste a user agent string into a form there, too. Fantastic and
helpful features. Provided by The Art of Web.
- User Agent Strings
Helpful table of user agents with descriptions, and includes opinion of
whether they’re legitimate or not, good or bad, etc., and has links to more
info. Via 50by50.com.
HTTP Error Codes
Most of us probably know what a 404 error is (page not found),
but there are lots more server-side error codes. You can create custom error
pages with more helpful error messages, adding rules for them within your .htaccess
files if you wish, such as a custom 404 message. You can view
this website’s custom 404
error message to see what I mean. Here are some helpful sources for more
information about error codes.
- W3C: Hypertext Transfer Protocol -- HTTP/1.1
The definitive source of HTTP error codes via W3C.
- HTTP Error Codes and what they mean
Well done, thorough, and helpful list and easy-to-understand descriptions of
HTTP error codes. Via wats.ca.
- Apache HTTP Status Codes and the Apache Redirect Directive
Article by Gez Lemon via JuicyStudio, May 25, 2005.
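For the custom error pages mentioned at the start of this section, the relevant Apache directive is ErrorDocument. Here is a minimal sketch, assuming the error pages live in a hypothetical /errors/ directory on your own site.

```apache
# Paths are relative to the document root; the file names are placeholders
ErrorDocument 404 /errors/not-found.html
ErrorDocument 403 /errors/forbidden.html
```

It is generally better to point ErrorDocument at a local path rather than a full URL. With a full URL, Apache sends the visitor a redirect, so the browser or bot never receives the original error status code.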
Server Vulnerabilities
- Chapter 2. Security For Administrators
Interesting article about how to track down who’s attacking your server and
what to do about it.












I Made Yanuarta DPY
My Place to start the World..!!!
Posted 1/16/2006 4:55 pm in Another Stuff, nGeWeb by Me My Self
Segala tentang .htaccess ada disini.
spt yang dibilang bahwa I’m not a programmer or server-side specialist. jd mungkin perlu sedikit mengingat-ingat mengenai seluk beluk .htaccess ini.
First, A Word of Warning
Keep in mind that one little typo or incorrect rule within an .htaccess file can cause an internal server error and take your entire website offline. Especially if you’re new to using an .htaccess file, I highly recommend setting up a test directory to work on your .htaccess file. In addition, always make a backup of your .htaccess file before making any changes. That way, if you do happen to make a typo or other error, you can load your backup file again to keep your website up and running while you look for the source of the problem(s).
In addition, many caution those new to .htaccess about not getting too carried away and ending up creating excessively big .htaccess files. Keep in mind that the server will process this file for each request at your website, so you don’t want to negatively impact your server’s performance. For those with access to the httpd.conf file on your Apache server, many recommend using that instead of .htaccess, especially for better server performance. Many of us on shared servers, though, don’t have access to it, including myself.
I prefer to think of .htaccess as just one of a variety of approaches and tools for managing URLs (especially URL redirecting), managing custom error pages, and combating bad bots and spammers. It’s a fantastic tool that I’m thrilled to be able to use for my own websites finally, including this one.
Regarding combating bad bots and spammers, .htaccess is one of several tools and approaches that I use. My goal is to keep things simple and block the bad guys without blocking everyone else. No one single approach can do it all, though, and bad bots and spammers continually work on ways to get past all the blocking approaches discussed online. So far I’m able to block nearly all of the bad bots and spammers, but new ones always come along, so I watch my logs closely, too.
On to some website links that I’ve found especially helpful.
Apache Documentation
First, here are several links to the definitive source for Apache 1.3 and Apache 2.0 specifically related to using .htaccess, especially for redirecting URLs and blocking bad bots and spammers.
Apache 1.3
Apache 1.3: Authentication, Authorization, and Access Control
Apache 1.3: Module mod_access
Apache 1.3: Module mod_rewrite URL Rewriting Engine
Apache 1.3: Module mod_setenvif
Apache 1.3: Modules
Apache 1.3 Tutorial: .htaccess files
Apache 1.3 Tutorial: Apache URL Rewriting Guide
By Ralf S. Engelschall
Apache 2.0
Apache 2.0: Authentication, Authorization, and Access Control
Apache 2.0: Apache Module mod_access
Apache 2.0: Apache Module mod_rewrite
Apache 2.0: Apache Module mod_setenvif
Apache 2.0: Module Index
Apache 2.0 Tutorial: .htaccess files
Apache 2.0 Tutorial: Apache URL Rewriting Guide
By Ralf S. Engelschall
How to Use .htaccess, mod_rewrite, and Related (for Apache)
.htaccess tips and tricks and more .htaccess tips and tricks: redirecting and rewriting
Via corz.org.
.htaccess Tutorial
Introduction to .htaccess, including what you can do with .htaccess, creating custom error pages, deny/allow access to specific pages or directories, password protection, redirecting URLs, and more. By David Gowans, via freewebmasterhelp.com.
Abbreviate URLs with mod_rewrite
By Andy King, via websiteoptimization.com.
An Introduction to Redirecting URLs on an Apache Server
For mod_rewrite beginners, by DaveAtIFG via Webmasterworld, Dec 16, 2002.
mod_rewrite: A Beginner’s Guide to URL Rewriting
By Tamas Turcsanyi, via SitePoint, October 22, 2002.
Rewriting URLs with mod_rewrite
By Daniel via 4webhelp.net, updated February 09, 2004.
URLS! URLS! URLS!
by Bill Humphries via A List Apart, June 30, 2000.
Using the .htaccess File
Helpful, easy-to-understand introduction to .htaccess and what you can do with it. Collated by Miraz Jordan via wise-women.org.
.htaccess Tools
I’ve been scouring the Internet looking for tools that will check .htaccess files for typos or other potential problems. So far I haven’t found anything, although I did find some tools that will help you create .htacess rules and test user agent strings. They’re listed below.
Tools to Generate .htaccess Rules
Try one of these tools to generate redirects, hotlink protection, password protection, or blocking bad bots. At the minimum, you can try them out as learning tools to see how something might be handled. Note that they might not do very complex rules.
.htaccess Tools
This website has tools to generate hotlink protection, password protection, or blocking of hitbots for your .htaccess files.
mod_rewrite RewriteRule Generator
A mod_rewrite rule generator tool via Webmaster Toolkit.
Tools to Test .htaccess Rules
WannaBrowser
Wanna Browser is a helpful test to see if your rewrite rules are working as you wish for user agents. If you’ve blocked a certain user agent string for a bad bot, for example, you can see if your rule is working properly with their online tool.
Forums Devoted to Apache .htaccess, mod_rewrite, mod_setenvif, and Related
You’ll find enormously helpful tips and troubleshooting help for using .htaccess, mod_rewrite, mod_setenvif, and related Apache features via these forums. You don’t need to subscribe to read most discussions, although you’ll need to sign up to post your questions or comments, and Webmasterworld Forums has subscriber-only areas in addition to their freely available areas.
.htaccess Tools Forum
Helpful forum devoted to all things .htaccess via the .htaccess Tools website.
mod-rewrite.com
Website has a forum all about using mod_rewrite, such as URL handling, access restriction, regular expressions, and more.
SitePoint Forums: Apache
SitePoint’s Apache forum is a busy one, filled with lots of tips, examples, resources, and more.
Webmasterworld Forums: Apache Web Server
This top-notch forum covers .htaccess, mod_rewrite, and other Apache topics. I’ve found countless tips and insight here. Be sure to check out their Charter – Apache Web Server, too, as you’ll find helpful resources there in addition to reviewing their rules for posting.
Using .htaccess to Block Hotlinking, Stop Bandwidth Theft
I absolutely love the availability of preventing other websites from directly linking to my server’s images, CSS, JavaScript, etc. using .htaccess. Here are a couple of tutorials on how to do it.
Preventing image hotlinking: An improved tutorial
By Tom Sherman, via Underscorebleach.net, November 21, 2004 (updated September 14, 2005.)
Smarter Image Hotlinking Prevention
Prevent others from directly linking to your server’s images, CSS, JavaScript, and other files. By Thomas Scott, via A List Apart, July 13, 2004.
Stop Hotlinking and Bandwidth Theft with .htaccess
Helpful, easy-to-understand tutorial at altlab.com. The approach used in this tutorial is basically what I do for my websites, as I’ve also chosen to send a Forbidden (403) error message.
URL Hotlink Checker
You can test the effectiveness of your website’s hotlink protection with this online tool by entering a complete URL from your website to see if your image can be loaded and hotlinked by a remote server. Via altlab.com.
Note that you might wish to allow certain sites to directly link to a specific image, such as an icon image for your newsfeeds, while still not allowing hotlinking to all your other images. I recently added my newsfeeds-related icon image to a separate directory, and in that directory’s .htaccess file I’ve specified a rule using Apache’s directive to allow hotlinking to that specific image only. I’m currently testing that to see how it goes for the next few weeks. I prefer that people download the icon to use from their own servers, so if I find other websites abusing the hotlinking for that image, it’s easy enough to individually prevent them from hotlinking to it and make more restrictive rules within that separate directory’s .htaccess file.
Using .htaccess to Ban Bad Bots and Spammers
Note that some of the Webmasterworld forum links might require a subscription.
A close to perfect .htaccess ban list – part 1, A close to perfect .htaccess ban list – part 2, and A close to perfect .htaccess ban list – part 3
Fabulous thread at Webmasterworld about using .htaccess to help block all the bad bats, spammers, and other bad guys.
How to block spambots, ban spybots, and tell unwanted robots to go to hell
By Mark Pilgrim, via Dive into Mark, February 26, 2003.
I Love Jack Daniels (weblog): Apache category
So far the section includes tutorials to block referrer spam, ignore directories in mod_rewrite, mod_rewrite cheat sheet, password protect a directory with .htaccess, HTTP status codes explained, .htaccess error documents. By Dave Child, ilovejackdaniels.com.
Killing referrer spam
By Dorothea, via Caveat Lector, January 11, 2005
Tips and Examples for how to use your .htaccess file
.htaccess file explained with examples for eliminating referrer spam and deep linking, by Mr. Steve, shooter.net.
Some helpful forum threads:
Apache Web Server: A Close to perfect .htaccess ban list – Part 3: More tips and tricks for banning those pesky “problem bots!â€
Via Webmasterworld, April, 2004.
List of bad bot/spiders: Requesting a list of known bad bots
Via Webmasterworld, September 7, 2005.
Perl Server Side CGI Scripting: A Close to perfect .htaccess ban list (further discussion)
Via Webmasterworld, October 23, 2001.
Search Engine Spider Identification: Webmasterworld forum 11: Updated and Collated Bot List
Updated UA strings with links to discussions about them. Look for the newest/latest (currently a 3-page thread).
Tracking and Logging: New referrer spammer
Via Webmasterworld, May 17, 2005.
Weblogs, Wikis, Sites, Sections Devoted to Combating Bad Bots, Spammers
Chongqed
Manni’s weblog (Manfred Heumann) devoted to hunting down and sharing spammer information, wiki spam, email spam, and life in general.
chongqed.org
Another invaluable weblog and wiki devoted to hunting down spammers and sharing info with everyone to fight wiki spam, blog spam, and guestbook spam. Run by Joe (from Texas) and Manni (Manfred Heumann).
Spam Chongqing
Joe’s (from Texas) weblog devoted to hunting down and sharing spammer information.
Spam Huntress
An invaluable weblog and wiki devoted to hunting down spammers, sharing info with everyone to help combat spam and block spammers from your websites.
Spam Kings Blog
News and information about catching, prosecuting spammers covering topics from the book, by Brian McWilliams.
Tom Raftery’s I.T. Views: .htaccess Category
Tom’s site is also quite helpful with strategies, tips, and links to combat spammers and bat bots.
Thoughts on Dealing with Comment, Referral, Trackback Spam
As I mentioned above, no one approach will be totally effective or even practical in blocking comment spam, referral spam, or trackback spam. Blocking by IP address or host can quickly become impractical, as anyone knows who’s tried to block solely by IP address. Your ban list will grow rapidly, IPs get outdated just as fast, and IPs often come from zombie machines. Blocking by user agent can help, but spammers spoof user agents and you don’t want to block legitimate users. There are known spoofed user agent strings that you can add to your ban list, though, which can help quite a bit. Blocking by referrer can be helpful, but once again your ban list will grow quickly, too, similar to IP lists. Blocking by keywords for referrers and hosts can help cover most spam referrals and hosts, but I’ve also recently found spammers trying more legitimate-looking domain names. Keep in mind that spammers are always coming up with new ways to get around blocking approaches, too.
Largely for these reasons I’ve found it most effective for my own websites to use a combination of several approaches and tools. Each of my websites is different, though, so I don’t do the same things at each site, although there is certainly some overlap.
Here are some helpful articles on ideas and ways of helping to combat the spammers.
Concerning Spam
Great post about a variety of ways to prevent spammers. By Elise Bauer via elise.com. Updated August 29, 2005. Originally posted in 2004.
Proposal on referrer spam: Background and blacklists
By Tom Sherman, January 15, 2005.
Save Your Site from Spambots: Techniques to Prevent Address Scraping
Excellent article by Steven Champeon via hesketh.com (reprinted from WebTechniques).
Solving comment spam
A must-read article by Simon Willison via Simon Willison’s Weblog, January 28, 2004.
Regular Expressions
Learning even just a little about regular expressions can be valuably helpful. Learning more about regular expressions can go a long way with writing leaner mod_rewrite rules and other rules for your .htaccess files.
A Tao of Regular Expressions
A good overview of some basics of regular expressions. By Steve Mansour via sitescooper.org.
Basics of Regular Expressions
Part 1 of a 2-part tutorial on mod_rewrite and regular expressions. By Justin, “jd01” via Webmasterworld, May 3, 2005.
Mod_Rewrite and Regular Expressions
Part 2 of a 2-part tutorial on mod_rewrite and regular expressions. By Justin, “jd01” via Webmasterworld, Aug 11, 2005.
mod_rewrite Cheat Sheet
By Dave Child, ilovejackdaniels.com.
Perl 5 Regular Expressions
Part of “Rex Swain’s HTMLified Perl 5 Reference Guide.” Helpful basics here.
Perl Regular Expressions
Official documentation via the perl.com website.
Regular Expressions
By Chris Karakas, Claudio Erba via karakas-online.de.
Regular Expressions Testing Tool
Via RegExLib.com.
Using Regular Expressions
A helpful introductory tutorial by Stephen Ramsay, via the Electronic Text Center, University of Virginia.
Do you think this RewriteCond would be too rude?
A helpful thread on regular expressions within an .htaccess file.
Robots.txt
Unfortunately, many bots disregard your robots.txt file or never even request it. Good ones do honor it, though, so it’s worth creating one even if the bad bots ignore it.
For my own websites, as long as the bot or spider behaves itself properly, I typically allow it, but I do have exclusions in my robots.txt file. Known bad bots or spiders and bots or spiders that disregard the rules or behave badly are banned from my website via my .htaccess file.
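As a hedged sketch of that kind of ban, the rules below refuse a misbehaving spider everywhere except robots.txt itself, so that it can at least still read the rules. The spider name is hypothetical, and exempting robots.txt is a choice made for this example rather than a requirement. It assumes mod_rewrite is enabled.

```apache
RewriteEngine On

# Always allow robots.txt itself (an assumption of this sketch).
RewriteCond %{REQUEST_URI} !^/robots\.txt$
# Hypothetical misbehaving spider, matched anywhere in the user agent.
RewriteCond %{HTTP_USER_AGENT} ExampleRudeSpider [NC]
# Both conditions must hold; the request then gets 403 Forbidden.
RewriteRule ^ - [F]
```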
Here’s some information on how to create and check a robots.txt file for your website.
The Web Robots Pages
Martijn Koster’s website all about robots.txt and the Robots Exclusion standard.
Put your robots.txt on a diet
How to reduce the file size of your robots.txt file by removing duplications, compressing multiple records, and more. Via Webmasterworld.
The Robots.txt Our Big Crawl
Common problems and errors found after researching 2.4 million URLs and 75,000 Robots.txt files. Great insight so you avoid these problems! Via Webmasterworld.
Robots.txt Validator
Check your robots.txt here with this helpful online tool. Via SearchEngineWorld.
Which Bots or User Agents are Good or Bad?
Bots, Blogs and News Aggregators Presentation Sources and White Paper
By Marcus P. Zillman, M.S., A.M.H.A.
Information Retrieval Software
A website devoted to providing information about information retrieval software (including email scrapers, spambots, etc.), search engine robots, and more.
List of Bad Bots
Helpful information here on quite a few user agents, including what type of bot, user agent strings, IP addresses, links to more details, and more. Well done. By Ralf D. Kloth, via kloth.net.
List of User-Agents (Spiders, Robots, Crawler, Browser)
Hundreds listed in these helpful charts that include type of user agent, descriptions and links to information about hundreds of spiders, robots, crawlers, and browsers. Types include: (Client) browser, Link-, bookmark-, server- checking; Downloading tool; Proxy server, web filtering; Robot, crawler, spider; Spam or bad bot. By Andreas Staeding, via psychedlix.com.
Project Honey Pot Statistics: Top Spam Harvester User Agents
Listings by type of user agent, including the page linked here. You’ll also find Robot User Agents, currently active Top 25 Global Spam Harvester List, and more. Via projecthoneypot.org.
RSS user agent identifiers
A helpful list of RSS user agents categorized by Web aggregators and search engines, Desktop readers and aggregators, RSS tools and services. By Philip Shaw, via Code Style.
Search Engine Spider Identification: Ultimate short list of banned bots
Includes helpful links for ways to fend off bad bots and spammers. Via Webmasterworld.
Search Engine Robots
Fabulous listings here with descriptions and links. The categories include: Search engine robots and others, Browsers, Link Checkers, Link monitors and bookmark managers, Validators, FTP clients and download managers, Research projects, Software packages, Offline browsers and other agents, Other miscellaneous agents, Sites that regularly visit, Other useful sites, some fakers. By John A. Fotheringham, via jafsoft.com.
Statistics
An informative and helpful post about stats logs (primarily AWStats) handling xml feeds and how to sort them out to get a better view of good guys and bad guys. By Tomas Jogin, via Jogin.com, June 15, 2004.
System: User Agents
A searchable directory of user agent strings that lists the source and purpose of each one, with links to more information for most of them. You can search the database or paste a user agent string into a form there, too. Fantastic and helpful features. Provided by The Art of Web.
User Agent Strings
Helpful table of user agents with descriptions, opinions on whether they’re legitimate or not, good or bad, etc., and links to more info. Via 50by50.com.
HTTP Error Codes
Most of us probably know what a 404 error is (page not found), but there are many more HTTP status codes that a server can return. You can create custom error pages with more helpful error messages, such as a custom 404 message, by adding rules for them to your .htaccess files. You can view this website’s custom 404 error message to see what I mean. Here are some helpful sources for more information about error codes.
W3C: Hypertext Transfer Protocol—HTTP/1.1
The definitive source of HTTP error codes via W3C.
HTTP Error Codes and what they mean
A well-done, thorough, and helpful list with easy-to-understand descriptions of HTTP error codes. Via wats.ca.
Apache HTTP Status Codes and the Apache Redirect Directive
Article by Gez Lemon via JuicyStudio, May 25, 2005.
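The custom error pages mentioned at the start of this section can be wired up with Apache’s ErrorDocument directive. Here is a minimal sketch; the /errors/ paths and file names are hypothetical, and it assumes your host allows this directive in .htaccess files.

```apache
# Paths are hypothetical and are resolved from the site's document root.
ErrorDocument 404 /errors/404.html
ErrorDocument 403 /errors/403.html
ErrorDocument 500 /errors/500.html
```

Each line maps one status code to the page the server sends in place of its default message.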
Server Vulnerabilities
Chapter 2. Security For Administrators
Interesting article about how to track down who’s attacking your server and what to do about it.
2 Responses to “Belajar .htaccess”
Joseph Spammolino
2006/05/16 – 6:09 am
What is spam?
Joseph Spammolino
2006/05/16 – 6:10 am
KARTHAGO DELENDA EST SEMPER VBI SVB VBI SIC TRANSIT GLORIA MVNDI QVID ME VEXARI
Reply
May 16th, 2006
wHAT IS SPAM? Wherefore?
Aha!
Reply
May 16th, 2006
Reply
February 6th, 2007
By the by your blog is nice with confident things…..with lot of good things.Good luck!
Don Lapre Albert
webmaster@donlaprewilliams.com
http://www.donlaprewilliams.com
Reply
May 13th, 2007
Hi My Name Is ivawfb.
Reply
January 10th, 2009
Can you provide more information on this please.
Reply
February 28th, 2009
Why is everything in English? Isn’t there anything in Indonesian?
from
DokterMatrix.com
Reply
February 28th, 2009
Why is it all in English?
Reply
May 29th, 2009
Oh dear.. it’s in English…
Come have a look at my blog sometime…
StenlyTW
Reply
July 2nd, 2009
Thank you for information
Reply
July 3rd, 2009
Thank you for information
Reply
July 21st, 2009
very great article and alot of useful information
plus you have done a good job in your blog its neat.
any tips/tricks via htaccess to remove (.html) from urls?
thanks in advance
Reply
October 7th, 2009
Thanks this is so helpful for newbie like me.
This is really very helpful, mas. Thanks.
Reply
July 28th, 2010
Great share dude Thank you
Reply
July 28th, 2010
Thank you, interesting find
Reply
August 2nd, 2010
Great article on this subject. Bookedmarked your site to come back for your next post.
Reply