
Apache

The Apache HTTP Server, commonly referred to as Apache, is a web server application notable for playing a key role in the initial growth of the World Wide Web. Apache supports a variety of features, many implemented as compiled modules that extend the core functionality. These range from server-side programming language support to authentication schemes. Common language interfaces support Perl, Python, Tcl, and PHP. Popular authentication modules include mod_access, mod_auth, mod_digest, and mod_auth_digest, the successor to mod_digest. Other features include Secure Sockets Layer and Transport Layer Security support (mod_ssl), a proxy module (mod_proxy), a URL rewriter (mod_rewrite), custom log files (mod_log_config), and filtering support (mod_include and mod_ext_filter).

Apache is controlled by three configuration files: httpd.conf, access.conf, and srm.conf. (There is also a mime.types file, but you have to deal with it only when you add or remove MIME types on your server, which shouldn't be too often.) These files contain instructions, called directives, that tell Apache how to run.

If you have a simple enough configuration, or you just want the convenience of editing a single file, you can place all the configuration directives in one file. That file should be httpd.conf, since it is the first configuration file that Apache interprets. You'll have to include the following directives in httpd.conf so that Apache skips the other two files:

    AccessConfig /dev/null
    ResourceConfig /dev/null

Restrict access:

Say you have document directories or files on your Web server that should be visible only to a select group of computers. One way to protect those pages is by using host-based authentication. In your access.conf file, you would add something like this:

    <Directory /usr/local/apache/share/htdocs/protected>
        order deny,allow
        deny from all
        allow from 10.10.64
    </Directory>

The <Directory> directive is what's called a sectional directive. It encloses a group of directives that apply to the specified directory. The Apache Quick Reference Card includes a listing of sectional directives. The above case allows only computers with an IP address starting with 10.10.64 to access the pages in the given directory. You can use a complete IP address, a partial IP address as shown here, or even a DNS name. For example, to allow only CNET computers access to a specific page, you might do this in your access.conf file (note that <Location> matches the URL path rather than the filesystem path; the path below assumes /usr/local/apache/share/htdocs is the document root):

    <Location /company/employees.html>
        order deny,allow
        deny from all
        allow from .cnet.com
    </Location>

An interesting side effect of host-based authentication is that if you're using a browser on the Web server machine itself and attempt to access the page through localhost, you'll be denied permission. That's because the localhost IP, 127.0.0.1, does not belong to the .cnet.com domain. You can easily add localhost to the permission list by putting the appropriate IP on the allow directive:

    allow from .cnet.com 127.0.0.1

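As a sketch of the three host forms mentioned above, the block below lists one allow line of each kind. The address 10.10.64.15 is a hypothetical single host chosen for illustration; the directory, partial address, and domain are the ones used in the examples above.

```apache
<Directory /usr/local/apache/share/htdocs/protected>
    order deny,allow
    deny from all
    # Complete IP address: one specific machine (hypothetical address)
    allow from 10.10.64.15
    # Partial IP address: every host whose address starts with 10.10.64
    allow from 10.10.64
    # DNS name: every host in the cnet.com domain (costs DNS lookups)
    allow from .cnet.com
</Directory>
```

Because the order is deny,allow, a request is admitted if it matches any one of the allow lines and refused otherwise.
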
Customize error messages:

If a user requests a page that doesn't exist or is in a protected directory, Apache returns one of its built-in error messages that say things like Forbidden or Not Found. That's accurate, but not very informative. You may want to give your users more guidance as to what they did wrong, provide an alternative URL to get them back into your site, or at least offer an error page that fits in with your overall site design. With a bit of editing, you can make Apache return a custom error page or run a script to handle the error.

Open the srm.conf file and insert the following:

    ErrorDocument 404 /error.html

Your server will now return the error.html page whenever a user requests a page that doesn't exist.

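The text above also mentions running a script or pointing users at another URL. A minimal sketch of those variants follows; /cgi-bin/notfound.cgi, /forbidden.html, and the example.com URL are hypothetical names used for illustration, not files from the original setup.

```apache
# Local script: Apache runs it and returns its output as the error page
ErrorDocument 404 /cgi-bin/notfound.cgi
# Local static page for requests to protected directories
ErrorDocument 403 /forbidden.html
# Full URL: Apache redirects the browser to this address
ErrorDocument 500 http://www.example.com/server-error.html
```

A path that begins with a slash is resolved on the same server. A full URL causes a redirect, so the browser no longer sees the original error status.
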
Support multiple languages:

HTTP 1.1 formally specified a feature called content negotiation, which had actually been around for a while in experimental servers, including early versions of Apache. It's a way to present documents in different languages and formats based on a user's browser configuration. For example, suppose you're a Canadian company that needs to serve both French and English versions of your Web site.

First, you must enable the feature by adding the appropriate directive to your access.conf file. Open the access.conf file and find or create the <Directory> entry for the directory where you plan to store the multilanguage pages. Then add the Options MultiViews directive to that section. Remember that Options All does not actually mean all: it doesn't turn on MultiViews support, so you must explicitly declare your intention to use MultiViews. For example:

    <Directory /usr/local/apache/share/htdocs/multi>
        Options MultiViews
    </Directory>

Next, you need to edit your srm.conf file to include the languages you want to support and the file extensions associated with each language. The Canadian example calls for English and French, which have the standard identifiers en and fr, respectively. Your srm.conf file should already have these, but if not, add the appropriate lines:

    AddLanguage en .en
    AddLanguage fr .fr
    LanguagePriority en fr

The extensions mark each variant: for example, index.html.en for English and index.html.fr for French. The LanguagePriority directive is used when there's a tie during content negotiation. For example, if Apache can't tell whether the browser prefers English or French, the LanguagePriority directive tells Apache to serve the English version of the page.

Configure for server-side includes:

If you want to take a small step beyond static HTML pages, but you aren't quite ready to dive into writing your own Perl scripts, then you should try server-side includes (SSI). With SSI turned on, Apache will preparse certain HTML files before sending them out, looking for special embedded commands. These commands allow you to do basic things like include the contents of another file or print out an environment variable.

To enable SSI, you first need to make sure mod_include has been compiled into your version of Apache. Go to the directory where your httpd executable resides, typically /usr/local/apache/sbin, and type ./httpd -l. That should return a list of all the modules included in your build of Apache. Hopefully mod_include.c is in that list. If not, you'll have to rebuild Apache after removing the comment marker from the mod_include line in the Configuration.tmpl file.

Once you've determined that mod_include is available, you have to allow the execution of includes and map an appropriate file type. As with all things Apache, there are about a gazillion ways to do this. Probably the easiest is to enable all the options in one place in your access.conf file:

    <Directory /usr/local/apache/share/htdocs/include>
        Options +Includes
        AddType text/html .shtml
        AddHandler server-parsed .shtml
    </Directory>

Configuring Apache for CGI:

If you've pushed server-side includes about as far as they can go, you might want to try common gateway interface (CGI) scripts. CGI is a standard way for Web servers to interact with other programs running on your computer. CGI scripts are usually written as Unix shell scripts or in a scripting language such as Perl.

Configuring Apache to run CGI programs isn't that hard. First, you need to assign an alias for your script directory. You never want the directory containing CGI scripts to actually reside within the normal document root of the server, because an intruder could get access and run their own scripts. So you create a special location, called an alias, that points to the actual CGI directory. Edit your httpd.conf file and add the line below:

    ScriptAlias /cgi-bin/ /usr/local/apache/share/cgi-bin/

To actually execute programs, you need to edit the access.conf file by adding a section like this:

    <Directory /usr/local/apache/share/cgi-bin>
        Options ExecCGI
        AddHandler cgi-script .cgi .pl
    </Directory>

Limit DNS overhead:

To improve Apache's performance when restricting access with allow from or deny from, use IP addresses where possible to limit the number of DNS lookups. Apache has to run a double lookup when an allow from or deny from directive names a domain: a reverse lookup to resolve the browser's IP address into a domain name, followed by a forward lookup to make sure that the reverse result is not being spoofed. You can limit your DNS lookup overhead even further by restricting hostname lookups to only the files that need them, such as HTML or CGI. To do that, add something like this in your configuration files:

    HostnameLookups off
    <Files ~ "\.(html|cgi)$">
        HostnameLookups on
    </Files>

Check the timeout:

Even relatively simple Web pages can have a number of pieces. Previously, a browser had to set up a new connection to the Web server to retrieve each piece: one connection to retrieve the HTML and separate connections for each GIF. One page with three images would require four connections. That's expensive in network traffic and can really slow things down. HTTP 1.1 added a feature called keep-alive. This lets a Web server keep a connection open so the browser can send multiple requests without having to set up a new connection for each one.

In Apache, keep-alives are controlled by three directives in httpd.conf: KeepAlive, MaxKeepAliveRequests, and KeepAliveTimeout. The KeepAlive directive determines whether to activate the keep-alive feature, while MaxKeepAliveRequests determines how many requests the server will allow from a browser during a single connection. KeepAliveTimeout determines how long, in seconds, the server will keep the connection open waiting for additional requests. So to turn on keep-alives and allow 100 requests with a 15-second timeout, add the following lines to httpd.conf:

    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 15

Block bad robots:

Robots are programs that automatically download pages from your Web site. A well-behaved robot is supposed to read your robots.txt file to determine how to crawl your site. But ill-behaved robots may ignore the file, potentially distorting your Web site traffic and ad reports as well as stealing your network bandwidth and slowing down your Web server. If you know the robot's IP address, you can use the Apache deny directive to restrict Web access from that IP address.

For something more powerful, use the Apache mod_rewrite module. It may not be part of your default Apache configuration; you can check using the ./httpd -l command. If it's not there, you'll have to edit the Configuration.tmpl file and recompile Apache. Once you've installed mod_rewrite, you can use it to restrict access to your server based on any server or environment variable, including IP address, robot agent name, and time of day. For example, adding the following directives to one of your configuration files blocks all access from any robot whose HTTP user agent begins with "NameOfBadRobot" (RewriteEngine On switches on the rewriting engine, which is off by default):

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot.*
    RewriteRule ^/.* - [F]
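
The deny-by-address approach mentioned above is not shown in the text, so here is a minimal sketch. The address 192.0.2.10 is a placeholder for the robot's real IP address, and the directory is assumed to be the document root (the earlier examples use subdirectories of it).

```apache
<Directory /usr/local/apache/share/htdocs>
    # Allow everyone first, then let the deny line override for one address
    order allow,deny
    allow from all
    # Placeholder address standing in for the misbehaving robot
    deny from 192.0.2.10
</Directory>
```

Note the order: with allow,deny the deny lines are evaluated last, so the robot is refused even though allow from all also matches it.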

Diagnose your server:

The mod_status and mod_info modules let you analyze and debug your Web server from a browser. First, make sure the modules are compiled into your version of Apache. Then activate the modules and control access to this information. The mod_status module gives you comprehensive Web server diagnostics such as uptime and downtime, requests, CPU usage, and so forth. (Note: using this module requires that Apache be running in standalone mode, not as an inetd server.) Add the following to your access.conf file:

    <Location /status>
        SetHandler server-status
        <Limit GET>
            order deny,allow
            deny from all
            allow from .cnet.com
        </Limit>
    </Location>
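
The text above shows a block for mod_status only. For mod_info, a comparable block might look like the sketch below; the /server-info path is an assumed choice of URL, and the access rules simply mirror the status example.

```apache
<Location /server-info>
    # Hand requests for this URL to mod_info
    SetHandler server-info
    order deny,allow
    deny from all
    allow from .cnet.com
</Location>
```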
