Learning .htaccess

Everything about .htaccess is collected here.
As I’ve mentioned, I’m not a programmer or server-side specialist, so I may need to refresh my memory a bit about the ins and outs of .htaccess.

First, A Word of Warning

Keep in mind that one little typo or incorrect rule within an .htaccess
file can cause an internal server error and take your entire website offline.
Especially if you’re new to using an .htaccess file, I highly recommend setting
up a test directory to work on your .htaccess file. In addition, always make a
backup of your .htaccess file before making any changes. That way, if you do
happen to make a typo or other error, you can load your backup file again to
keep your website up and running while you look for the source of the problem(s).

In addition, many caution those new to .htaccess about not getting too
carried away and ending up creating excessively big .htaccess files. Keep in
mind that the server will process this file for each request at your website, so
you don’t want to negatively impact your server’s performance. For those with
access to the httpd.conf file on your Apache server, many recommend using that
instead of .htaccess, especially for better server performance. Many of us on
shared servers, though, don’t have access to it, myself included.
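
For what it’s worth, here’s a minimal sketch of what that looks like (the
directory path is just an example): rules move into httpd.conf, and AllowOverride
None tells Apache not to look for .htaccess files in that directory at all.

    # Inside httpd.conf (requires access to the server configuration).
    # "/var/www/example" is a placeholder path.
    <Directory "/var/www/example">
        # Rules that would otherwise live in .htaccess go here...
        Options -Indexes
        # ...and AllowOverride None stops the per-request .htaccess lookups,
        # which is where the performance benefit comes from.
        AllowOverride None
    </Directory>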

I prefer to think of .htaccess as just one of a variety of approaches and
tools for managing URLs (especially URL redirecting), managing custom error
pages, and combating bad bots and spammers. It’s a fantastic tool that I’m
thrilled to be able to use for my own websites finally, including this one.

Regarding combating bad bots and spammers, .htaccess is one of several
tools and approaches that I use. My goal is to keep things simple and block the
bad guys without blocking everyone else. No one single approach can do it all,
though, and bad bots and spammers continually work on ways to get past all the
blocking approaches discussed online. So far I’m able to block nearly all
of the bad bots and spammers, but new ones always come along, so I watch my logs
closely, too.

On to some website links that I’ve found especially helpful.

Apache Documentation

First, here are several links to the definitive sources for
Apache 1.3 and Apache 2.0 specifically related to using .htaccess, especially
for redirecting URLs and blocking bad bots and spammers.

  • Apache 1.3
  • Apache 2.0

How to Use .htaccess, mod_rewrite, and Related (for Apache)

.htaccess Tools

I’ve been scouring the Internet looking for tools that will
check .htaccess files for typos or other potential problems. So far I haven’t
found anything, although I did find some tools that will help you create .htaccess
rules and test user agent strings. They’re listed below.

Tools to Generate .htaccess Rules

Try one of these tools to generate redirects, hotlink
protection, password protection, or rules for blocking bad bots. At the minimum, you can
try them out as learning tools to see how something might be handled. Note that
they might not handle very complex rules.
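
As a small example of the kind of rule these generators produce, here’s a minimal
sketch of a permanent redirect (the paths and domain are placeholders, not real ones):

    # mod_alias: send visitors (and search engines) from an old URL to a new one
    # with a 301 (permanent) redirect. Paths and domain are examples only.
    Redirect 301 /old-page.html http://www.example.com/new-page.html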

Tools to Test .htaccess Rules
  • WannaBrowser
    WannaBrowser is a helpful online tool for checking whether your rewrite rules
    are working as you intend for particular user agents. If you’ve blocked a
    certain user agent string for a bad bot, for example (like the rule sketched
    just below), you can see whether it’s working properly.
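
Here’s a minimal sketch of the sort of rule you might test this way (EmailSiphon
is one commonly listed bad bot; substitute whatever user agent string you’re
actually blocking):

    # mod_rewrite: return 403 Forbidden to a user agent you have chosen to block.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC]
    RewriteRule .* - [F]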

Forums Devoted to Apache .htaccess, mod_rewrite, mod_setenvif, and Related

You’ll find enormously helpful tips and troubleshooting help
for using .htaccess, mod_rewrite, mod_setenvif, and related Apache features via
these forums. You don’t need to subscribe to read most discussions, although
you’ll need to sign up to post your questions or comments, and Webmasterworld
Forums has subscriber-only areas in addition to their freely available areas.

  • .htaccess Tools Forum
    Helpful forum devoted to all things .htaccess via the .htaccess Tools website.
  • mod-rewrite.com
    Website has a forum all about using mod_rewrite, such as URL handling, access
    restriction, regular expressions, and more.
  • SitePoint Forums: Apache
    SitePoint’s Apache forum is a busy one, filled with lots of tips, examples,
    resources, and more.
  • Webmasterworld Forums: Apache Web Server
    This top-notch forum covers .htaccess, mod_rewrite, and other Apache topics.
    I’ve found countless tips and insight here. Be sure to check out their
    Charter – Apache Web Server, too, as you’ll find helpful resources there
    in addition to reviewing their rules for posting.

Using .htaccess to Block Hotlinking, Stop Bandwidth Theft

I absolutely love being able to prevent other websites
from directly linking to my server’s images, CSS, JavaScript, etc. using .htaccess.
Here are a couple of tutorials on how to do it, along with a small sketch of the
general approach after the list.


  • Preventing image hotlinking: An improved tutorial
    By Tom Sherman, via Underscorebleach.net, November 21, 2004 (updated September
    14, 2005).
  • Smarter Image Hotlinking Prevention
    Prevent others from directly linking to your server’s images, CSS, JavaScript,
    and other files. By Thomas Scott, via A List Apart, July 13, 2004.
  • Stop Hotlinking and Bandwidth Theft with .htaccess
    Helpful, easy-to-understand tutorial at altlab.com. The approach used in this
    tutorial is basically what I do for my websites, as I’ve also chosen to send a
    Forbidden (403) error message.
  • URL Hotlink Checker
    You can test the effectiveness of your website’s hotlink protection with this
    online tool by entering a complete URL from your website to see if your image
    can be loaded and hotlinked by a remote server. Via altlab.com.
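
For reference, here’s a minimal sketch of the general approach those tutorials
describe (example.com is a placeholder for your own domain). It allows empty
referrers, since some browsers, proxies, and firewalls don’t send one, and returns
a 403 Forbidden to everyone else:

    RewriteEngine On
    # Allow requests that send no referrer at all.
    RewriteCond %{HTTP_REFERER} !^$
    # Allow requests referred from your own pages (example.com is a placeholder).
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
    # Anything else asking for these file types gets a 403 Forbidden.
    RewriteRule \.(gif|jpe?g|png|css|js)$ - [F]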

Note that you might wish to allow certain sites to directly link to a specific
image, such as an icon image for your newsfeeds, while still not allowing hotlinking
to all your other images. I recently added my newsfeeds-related icon image to
a separate directory, and in that directory’s .htaccess file I’ve specified
a rule, using an Apache directive, to allow hotlinking to that specific image only.
I’m currently testing that to see how it goes over the next few weeks. I prefer that
people download the icon to use from their own servers, so if I find other websites
abusing the hotlinking for that image, it’s easy enough to individually prevent
them from hotlinking to it and make more restrictive rules within that separate
directory’s .htaccess file.
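
As a rough sketch of the idea (this is not my exact file), if the main hotlink
protection uses mod_rewrite rules like the ones shown earlier, a tiny .htaccess
in the icon’s own directory can be enough, since it keeps the parent directory’s
rewrite-based blocking from applying there:

    # .htaccess in the icon's own directory (a sketch, assuming the parent
    # directory blocks hotlinking with mod_rewrite rules like those above).
    # Per-directory rewrite rules aren't inherited here unless RewriteOptions
    # Inherit is used, and with the engine off no rewriting happens in this
    # directory at all, so the one icon it contains can be hotlinked freely.
    RewriteEngine Off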

Using .htaccess to Ban Bad Bots and Spammers

Note that some of the Webmasterworld forum links might require
a subscription.
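
Here’s a minimal sketch of the general idea using mod_setenvif (the user agent
strings are just examples of commonly listed bad bots; you’d build and maintain
your own list):

    # Flag requests from known bad user agents, then deny them.
    SetEnvIfNoCase User-Agent "EmailCollector" bad_bot
    SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
    SetEnvIfNoCase User-Agent "WebStripper" bad_bot
    <Limit GET POST HEAD>
        Order Allow,Deny
        Allow from all
        Deny from env=bad_bot
    </Limit>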

Some helpful forum threads:

Weblogs, Wikis, Sites, Sections Devoted to Combating Bad Bots, Spammers

  • Chongqed
    Manni’s weblog (Manfred Heumann) devoted to hunting down and sharing spammer
    information, wiki spam, email spam, and life in general.
  • chongqed.org
    Another invaluable weblog and wiki devoted to hunting down spammers and
    sharing info with everyone to fight wiki spam, blog spam, and guestbook spam.
    Run by Joe (from Texas) and Manni (Manfred Heumann).
  • Spam Chongqing
    Joe’s (from Texas) weblog devoted to hunting down and sharing spammer
    information.
  • Spam Huntress
    An invaluable weblog and wiki devoted to hunting down spammers, sharing info
    with everyone to help combat spam and block spammers from your websites.
  • Spam Kings Blog
    News and information about catching and prosecuting spammers, covering topics
    from the book of the same name, by Brian McWilliams.
  • Tom Raftery’s I.T. Views: .htaccess Category
    Tom’s site is also quite helpful with strategies, tips, and links for combating
    spammers and bad bots.

Thoughts on Dealing with Comment, Referral, Trackback Spam

As I mentioned above, no one approach will be totally effective
or even practical in blocking comment spam, referral spam, or trackback spam.
Blocking by IP address or host can quickly become impractical, as anyone who’s
tried to block solely by IP address knows. Your ban list will grow rapidly, IPs
get outdated just as fast, and the IPs often come from zombie machines. Blocking
by user agent can help, but spammers spoof user agents, and you don’t want to
block legitimate users. There are known spoofed user agent strings that you can
add to your ban list, though, which can help quite a bit. Blocking by referrer
can be helpful, but once again your ban list will grow quickly, similar to IP
lists. Blocking by keywords for referrers and hosts can cover most spam referrals
and hosts, but I’ve also recently found spammers trying more legitimate-looking
domain names. Keep in mind that spammers are always coming up with new ways to
get around blocking approaches, too.
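
As one small illustration of the keyword approach, here’s a minimal sketch (the
keywords are only examples; you’d tune the list to what actually shows up in your
own logs, and keep it short enough not to block legitimate referrers):

    # mod_rewrite: send a 403 Forbidden to requests whose referrer contains
    # typical spam keywords. The keyword list here is an example only.
    RewriteEngine On
    RewriteCond %{HTTP_REFERER} (casino|poker|texas-holdem) [NC]
    RewriteRule .* - [F]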

Largely for these reasons I’ve found it most effective for my own websites to
use a combination of several approaches and tools. Each of my websites is
different, though, so I don’t do the same things at each site, although there is
certainly some overlap.

Here are some helpful articles with ideas and approaches for combating
spammers.

Regular Expressions

Learning even just a little about regular expressions can be
extremely helpful, and learning more can go a long way toward
writing leaner mod_rewrite rules and other rules for your .htaccess files.
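
For example, regex alternation and anchors let one condition do the work of
several. A minimal sketch (the user agents here are just examples):

    # Without alternation you would need one condition per bad bot:
    #   RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
    #   RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC]
    # Grouping with alternation keeps the rule set leaner:
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^(EmailSiphon|EmailWolf) [NC]
    RewriteRule .* - [F]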

Robots.txt

Unfortunately, many bots disregard or don’t even look at your
robots.txt file. Good ones will honor it, though, so it’s worth creating even if
the bad bots ignore it.

For my own websites, as long as the bot or spider behaves itself properly, I
typically allow it, but I do have exclusions in my robots.txt file. Known bad
bots or spiders and bots or spiders that disregard the rules or behave badly are
banned from my website via my .htaccess file.
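
To give a tiny, simplified sketch of what the robots.txt side looks like (the
paths below are placeholders, not my real exclusions):

    # robots.txt: a simplified example. Well-behaved bots may crawl everything
    # except a couple of excluded directories.
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /test/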

Here’s some information on how to create and check a robots.txt file for your
website.

  • The Web Robots Pages

    Martijn Koster’s website all about robots.txt and the Robots Exclusion
    standard.
  • Put your robots.txt on a diet
    How to reduce the file size of your robots.txt file by removing duplications,
    compressing multiple records, and more. Via Webmasterworld.
  • The Robots.txt Our Big Crawl
    Common problems and errors found after researching 2.4 million URLs and 75,000
    robots.txt files. Great insight to help you avoid these problems! Via
    Webmasterworld.

  • Robots.txt Validator
    Check your robots.txt here with this helpful online tool. Via
    SearchEngineWorld.

Which Bots or User Agents are Good or Bad?

  • Bots, Blogs and News Aggregators Presentation Sources and White Paper
    By Marcus P. Zillman, M.S., A.M.H.A.
  • Information Retrieval Software
    A website devoted to providing information about information retrieval
    software (including email scrapers, spambots, etc.), search engine robots, and
    more.
  • List of Bad Bots

    Helpful information here on quite a few user agents, including what type of
    bot, user agent strings, IP addresses, links to more details, and more. Well
    done. By Ralf D. Kloth, via kloth.net.
  • List of User-Agents (Spiders, Robots, Crawler, Browser)
    Hundreds of user agents listed in helpful charts that include the type of user
    agent, descriptions, and links to more information about each spider, robot,
    crawler, or browser. Types include: (Client) browser; Link-, bookmark-, or
    server-checking; Downloading tool; Proxy server, web filtering; Robot, crawler,
    spider; Spam or bad bot. By Andreas Staeding, via psychedlix.com.
  • Project Honey Pot Statistics: Top Spam Harvester User Agents
    Listings by type of user agent, including the page linked here. You’ll also
    find Robot User Agents, the currently active Top 25 Global Spam Harvester List,
    and more. Via projecthoneypot.org.
  • RSS user agent identifiers
    A helpful list of RSS user agents categorized by web aggregators and search
    engines, desktop readers and aggregators, and RSS tools and services. By Philip
    Shaw, via Code Style.
  • Search Engine Spider Identification: Ultimate short list of banned bots
    Includes helpful links for ways to fend off bad bots and spammers. Via
    Webmasterworld.
  • Search Engine Robots
    Fabulous listings here with descriptions and links. The categories include:
    Search engine robots and others, Browsers, Link Checkers, Link monitors and
    bookmark managers, Validators, FTP clients and download managers, Research
    projects, Software packages, Offline browsers and other agents, Other
    miscellaneous agents, Sites that regularly visit, Other useful sites, and some
    fakers. By John A. Fotheringham, via jafsoft.com.

  • Statistics
    An informative and helpful post about stats logs (primarily AWStats) handling
    XML feeds and how to sort them out to get a better view of the good guys and
    bad guys. By Tomas Jogin, via Jogin.com, June 15, 2004.
  • System: User Agents
    A searchable directory of user agent strings that includes their source,
    purpose, and links to more information for most of them. You can search the
    database or paste a user agent string into a form there, too. Fantastic and
    helpful features. Provided by The Art of Web.
  • User Agent Strings
    Helpful table of user agents with descriptions, an opinion on whether each is
    legitimate or not (good or bad, etc.), and links to more info. Via 50by50.com.

HTTP Error Codes

Most of us probably know what a 404 error is (page not found),
but there are many more HTTP error codes. You can create custom error
pages with more helpful error messages, adding rules for them within your .htaccess
files if you wish, such as a custom 404 message. You can view this website’s
custom 404 error message to see what I mean. Here are some helpful sources for
more information about error codes.
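
The .htaccess side of this is a single ErrorDocument line per status code. A
minimal sketch (the file paths are placeholders for your own error pages):

    # Map HTTP error codes to custom error pages. Paths are examples only.
    ErrorDocument 403 /errors/403.html
    ErrorDocument 404 /errors/404.html
    ErrorDocument 500 /errors/500.html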

Server Vulnerabilities

About Me My Self

Just someone who keeps learning and looking for things that might one day turn
into something useful.

16. January 2006 by Me My Self
Categories: Another Stuff, nGeWeb | 28 comments
