Apache Course Notes
An Introductory course
Option | What it does | Example
ServerType | Sets up how the server will be started. Always use… | standalone
ServerRoot | Where the configuration files will be found | e:/Program Files/Apache Group/Apache
Listen | The TCP port number the server will listen on for requests. Useful if you need to run multiple servers on a single machine | 80
ServerAdmin | The email address of the person that gets all the flak when things blow up | santa@x-mas.org
ServerName | The name by which the server is known | blitzen.x-mas.org
DocumentRoot | Really important. The start of the directory tree where the HTML pages will reside | /home/httpd/htdocs
<Directory "/home/httpd/htdocs"> | The start of a section that defines how security and other options apply to the Document Root directory of the site |
AccessFileName | The name of the file that defines how security is implemented in each folder. If a file of this name is found in a directory then its parameters are used instead of any others | .htaccess
HostnameLookups | Controls whether the server tries to resolve the DNS names for the IP addresses that are making requests. Leave it off if you value time (your clients' time, that is). | off
LogLevel | If you want to fill the logs with messages this is your way to do it. Changing this value will increase or decrease the verbosity of the server's log output. | warn
IndexOptions | If the server is so set up, a directory listing is created when a directory has no HTML file with the correct default name. I prefer to switch this feature off, not wanting folks to amble round my system freely, though visitors then get a 403 (Forbidden) error instead. | FancyIndexing
<Location … > | This is a block of configuration, commented out by default, that establishes a mechanism for checking on the status of the server. Works through mod_status. | /server-status
<VirtualHost … > | This is another block command that allows the setting up of additional web sites on the same server. This is called virtual hosting. | www.theelves.com
Having messed with the httpd.conf file, or any of the other files, you have two options: start the server and hope, or check the config files to see if they are OK. The latter is done by using
/usr/sbin/httpd -T (Linux)
or
apache -T (Windows)
This will either show where the errors are, or display the message "Syntax OK".
If you have errors, fix them; normally the server will not start if there are configuration issues.
Virtual hosting is a method of allowing the web server to play host to multiple web sites. These sites may have different names, or different IP addresses or both. The sites may also be on different ports. Most web hosting companies use this method to rapidly and efficiently host many web sites on a single server.
The central part of setting up a virtual host is understanding how the virtual host blocks function in the configuration files. Any parameter already declared in the configuration file that is not overridden by a similar option in a virtual host block carries through to that virtual host. In more recent versions of Apache, you will notice that the configuration file (httpd.conf) states at some point that any parameter from there on can be used both in and out of the virtual host blocks. This is important: any setting you make after that point, outside a virtual host block, will be in effect for the virtual hosts unless a virtual host block alters it.
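The NameVirtualHost parameter gives the IP address on which the server receives requests for the name-based virtual hosts. This also lets you catch browsers that don't support HTTP/1.1, or that don't specify a domain name (just an IP address): such requests go to the first virtual host block listed for that address.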
The name of each virtual host can be an IP address or a domain name. The cases are:
A) The browser asks for a domain name and the virtual hosts are DNS names (www.name.com): the system jumps to the virtual host block and goes from there.
B) The browser asks for an IP address and the virtual hosts are IP addresses (a.b.c.d): the system jumps to the virtual host block and goes from there.
If the browser asks for a domain name and the virtual hosts are IP addresses, the system does a look up on a DNS, then jumps to the virtual host entry.
If the browser asks for an IP address and the virtual hosts are domain names, the system does an IP look up on a DNS, then jumps to the virtual host entry.
By IP address request, consider a browser user doing: http://123.123.123.010/santasdirtysecrets.html
By domain request, consider a browser user doing: http://www.x-mas.org/santadirtysecrets.htm
# Virtual Hosts blocks
NameVirtualHost 123.123.123.010

<VirtualHost www.x-mas.org>
    ServerName www.x-mas.org
    ServerAdmin santa@x-mas.org
    DocumentRoot /opt/www/x-mas/htdocs
    ErrorLog /var/log/httpd/x-mas/error_log
    TransferLog /var/log/httpd/x-mas/transfer_log
</VirtualHost>

<VirtualHost wishes.x-mas.org>
    ServerName wishes.x-mas.org
    ServerAdmin youhope@x-mas.org
    DocumentRoot /opt/www/wish/htdocs
    ErrorLog /var/log/httpd/x-mas/wishes.x-mas.org-error_log
    TransferLog /var/log/httpd/x-mas/wishes.x-mas.org-transfer_log
</VirtualHost>

<VirtualHost www.theelves.com>
    ServerName www.theelves.com
    ServerAdmin lordhighelf@theelves.com
    DocumentRoot /opt/www/theelves/htdocs
    ErrorLog /var/log/httpd/theelves.com-error_log
    TransferLog /var/log/httpd/x-mas/theelves.com-transfer_log
</VirtualHost>
This would set up three virtual hosts, all named and all operating on their own DNS registration names. It is also possible to change all the <VirtualHost …> entries to be IP addresses. This may have its advantages, as some older browsers cannot handle the HTTP/1.1 feature that enables virtual hosting. If a browser cannot handle the virtual naming it will fall through to the DocumentRoot that has been specified for the whole server (should such an option exist), or to the Virtual host block specified by NameVirtualHost.
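The matching described above can be pictured with a small Python sketch. This is only a toy model under stated assumptions, not Apache's real selection code: the function name pick_document_root and the fallback path are made up for illustration, and the table simply mirrors the three example blocks.

```python
# Toy model of name-based virtual host selection; not Apache's actual code.
# The mapping mirrors the three example <VirtualHost> blocks.
VHOSTS = {
    "www.x-mas.org": "/opt/www/x-mas/htdocs",
    "wishes.x-mas.org": "/opt/www/wish/htdocs",
    "www.theelves.com": "/opt/www/theelves/htdocs",
}


def pick_document_root(host_header, vhosts=VHOSTS, fallback="/home/httpd/htdocs"):
    """Return the DocumentRoot that would serve a request's Host header.

    fallback is a hypothetical stand-in for whatever the server uses
    when no virtual host name matches.
    """
    if not host_header:  # older browsers may send no Host header at all
        return fallback
    name = host_header.split(":")[0].strip().lower()  # drop any :port suffix
    return vhosts.get(name, fallback)


print(pick_document_root("www.theelves.com:80"))
print(pick_document_root(None))
```

The point of the sketch is the fall-through: a request that carries no usable name ends up at whatever default the server has been given.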
Lastly, depending on the setup of your server, you will likely need to
check that your /etc/hosts (d:\winnt\system32\drivers\etc\hosts) file has
entries for each of your virtual domains. If the DNS you are talking to
can quickly resolve the names, all well and good, but in case things are
slow, add entries like:
123.123.123.010 x-mas.org theelves.com
to your hosts file.
Server side includes (SSI) are a feature of the server that allows pages containing detail needed by other pages to be integrated into the output by the server. SSI is useful to page designers as it cuts down on the material they have to create: all pages could, for instance, have the same heading and ending without the need to create every page with this information.
The mod_include module is generally responsible for server side includes, and in more recent distributions is enabled by default.
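To make the idea concrete, here is a rough Python sketch of what the server does with an include directive. It is an illustration only, not how mod_include is written: the helper name expand_includes and the file name header.html are assumptions, and only the simple file="..." form of the include directive is handled.

```python
import re

# Matches the simple form of the SSI include directive, e.g.
# <!--#include file="header.html" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(page, read_file):
    """Replace each include directive with the text of the named file.

    read_file is any callable that maps a file name to its contents.
    """
    return INCLUDE.sub(lambda match: read_file(match.group(1)), page)


# Hypothetical shared heading, kept in a dict instead of on disk.
parts = {"header.html": "<h1>North Pole</h1>"}
page = '<!--#include file="header.html" -->\n<p>Welcome</p>'
print(expand_includes(page, parts.__getitem__))
```

Every page that carries the same directive picks up the same heading, which is the saving the page designer is after.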
general introduction
./configure
needs
options
If you need to hide the server (make it not reply with its version), you need to alter that in the compile process.
Apache is covered by an Open Source License, one aspect of which is that the source code is freely available. This means you can alter it, recompile it with new options or just poke about in it. In general, it would be unwise to alter it, as you would lose any semblance of support offered by the community, unless you manage to do something pretty great and get it accepted as a new feature in the code base. But compiling it is a very real option, one that comes up far more frequently than you might think.
Versions of Apache are released in most Linux distributions, several versions of Unix and on other platforms, too. Some of these installations are OEM defaults, as in RedHat and even Solaris. The version in any particular distribution is likely to be out-of-date by the time you load the CD. New versions are released to fix security holes and bugs, so there is a real need to install new versions once in a while.
New versions come in two flavors: source and binary.
If you have a supported OS version and there is an available binary, you might think that would be good enough. Not necessarily so. The binary will have been created with various options that may not match what came with your original setup, or a version of the server you compiled more recently. Install the binary on a spare machine to make sure that there are no issues before proceeding.
For RedHat RPM distributions:
rpm -e apache.yourlast-version Uninstall old
rpm -i apache.new-version.rpm Install new
For the tar ball installs
tar -xzvf apache.new-version.tar.gz
(x = expand, z = use gzip, v = be verbose, f = use the following file)
ZIP, EXE or MSI installs in Windows
For these installs, execute an EXE install; expand a ZIP install and execute the setup if it's not done automatically; or right click on the MSI file and select Install from the menu.
Once you have dug around to find your new Apache, check it to see if it has what you need. The version check is useful to determine what options were used in its creation. If you can get it running, the /server-info function will also identify which modules have been built in to the binary. Alter the configuration files to add any of the features you had installed in your old system, record everything you did, and redo it on the main server.
For most operating systems, releases only come in source form, which, though intimidating, is not as bad as it first appears. So many before us have had to compile this and other Open Source projects on so many occasions that the process for doing the compile is now fairly straightforward.
You will get a tar file or a ZIP file of the source, which needs to be extracted and placed somewhere. There are many places this can be done: your home directory is one, a common location is another. Remember that you are going to create a replacement web server, so you probably don't want to do this on the working production server box.
For Linux you will need to have installed the development packages, giving you the gcc compiler and its various libraries. On Windows, the readme suggests that you can use Microsoft C++ and some Borland compilers (not yet the free command line BC 5), but I think the system is set up to make use of the free GNU C compiler.
When you install the source it will create a number of directories that contain the source and various Make files.
The compile is a three step process:
Configure the options you want
Make the system with these options
Create the install
There is a way of generating the install in such a manner that it is easier to build your own install ZIP or tar (see the Make Install section below).
To configure the install you need to specify the parameter you want to change and the value you want to change it to. The documentation lists a significant number of the settings.
Use ./configure --help to list all the options the interface can help you with; it's a long list but worth chugging through. Additionally, a help file called "INSTALL" has more information on compile configuration, though not all of it seems up to date (I have in mind here the references to PHP).
Alternatively, you can edit one of the header files (include/httpd.h) and make gross changes to the configuration from there. I suspect that this is frowned upon in high Apache circles, but it is one of the few places that allows you all options permanently.
There really isn't a great deal to do here, except sit back and bite your nails. If there are errors you will find out rapidly. If you do get errors, there is normally a good reason for it. Remember that the chances that you are the first user ever to use a certain command parameter on the operating system are small to non-existent, so assume you are wrong before sending flame mail to an Apache newsgroup.
One issue I have here is that on pressing enter after this innocuous command anything you already have installed is blown away by the new install - Blam! just like that. You should always have backups, as they say.
If you want to offset the whole install so that it can be tarred or zipped, to be placed on another system, you can override the install processing using something like the following:
make install-quiet root=/tmp/apache_root/
This creates the system but with all the files now relative to the directory you specified.
You may notice that several extra executables are created. In my instance these included apachectl, a program that allows you to test the initialization of Apache in a number of ways, as well as "suexec," log handlers and the programs that create the security password files for use with .htaccess (or equivalents).
installing
configuring
status
Apache is designed to allow add-on modules to be included in two distinct ways:
Built-in (static, compiled in)
Dynamic (loaded at run time as a shared object -- SO)
Some add-ons allow either of these two methods to be implemented, while other add-ons recommend one manner over the other. The only way to find out the preferred method is to read the documentation for the particular add-on. If you intend to add many static modules over a period of time you will become very familiar with rebuilding Apache with an ever growing list of parameters. Most, though not all, modules are written in C. Apache has a large API that developers can use to build modules that serve to fix various problems (like mod_speling [sic], which allows the server to make guesses at misspelt URLs or page names a user is asking for).
In general, adding a new module will have the following stages:
Download the code (few are binaries)
Determine the best mode of operation (compiled in or dynamic)
Move code to a suitable location (often dictated by documentation)
Either,
Set Apache parameters and recompile, including the new module
Re-install Apache with the new module
Or
Compile the module
Install new dynamic module to appropriate location
Restart Apache
Test Apache to make sure it still works
Add some test pages to exercise module
Test module
Notify whomever that the module is now in place and ready for use
running
error_log
access_log
referer_log
agent_log
transfer_log
Common logs
Analysis and statistical tools
favicon.ico The bane of log readers
tail -f command
log rotation
How you run your Apache server varies by operating system. In Windows, the latest version of the install (1.3.17) allows Apache to be run as a service. After the install you will find that the Win2K or NT Services applet has an entry for Apache, and that after a reboot the server will be running.
In Linux, assuming you have a standard distribution, there are frequently
a series of scripts in a directory under /etc/. This directory changes places
by distribution, but in RedHat and SuSE it's in
/etc/rc.d/init.d
The file you are looking for goes variously by the name of "apache"
or "httpd." It is a script file; you can view it by using "more
httpd." The script takes three parameters: start, stop and restart. Restart
will stop then start the server. These scripts do all the necessary look
up to find the process IDs when you are trying to stop the server. Use them;
they are very good.
Generally, in Linux, Apache starts up at boot time. You should find a reference to "httpd" in one of the files in the start up directories (rc3.d or rc5.d).
If you get errors on startup it's generally because you failed to set the directory permissions correctly for the server to get at its files, or the configuration file has an error. Note that to start Apache on a port below 1024 (such as the default port 80) you have to be root.
The Log files display various details of the operation of the server. Each log file has its own duties and shows its own style of data.
Log File | Displays
error_log | error messages and things like start up and termination messages
access_log | IP addresses, access time and the page requested
referer_log | The place the client came from and the page requested
agent_log | The type of browser or search engine that accessed the site
transfer_log | This is a clone of the Access log.
Should you prefer, the access_log, referer_log and agent_log can all be integrated into one file. If you look at the configuration files you will see an entry for "combined." Uncomment it and put everything in one bucket. Error data still goes to the error_log, though.
Using the configuration files you can change the names of all these log files. It's probably not worth it to do this as several statistical analysis tools default to these names.
An error_log entry:
[Sun May 07 14:12:03 2000] [error] [client 192.168.1.36] File does not exist: e:/www/phone.gif
An access_log entry:
127.0.0.1 - - [06/Apr/2000:14:10:24 -0700] "GET /default2.htm HTTP/1.1" 200 162
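If you want to pick such an entry apart yourself, a regular expression is enough. The sketch below is one possible way to do it in Python and assumes lines in exactly the access_log layout shown above; the function name parse_access_line and the field names are my own labels, not anything Apache defines.

```python
import re

# One access_log line: host ident user [time] "request" status size
ACCESS_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Split one access_log line into named fields, or return None."""
    match = ACCESS_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means no body was sent; treat it as zero bytes.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


line = '127.0.0.1 - - [06/Apr/2000:14:10:24 -0700] "GET /default2.htm HTTP/1.1" 200 162'
entry = parse_access_line(line)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

Lines that do not fit the pattern come back as None, so a caller can count or skip them rather than crash.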
Having got megabytes of log file, what do you do with it? To begin with you will likely saunter through them with your eyes glazing over at all the IP addresses. You will probably be asked how many pages have been accessed, or whether search engines are getting to the site. These questions and many more can be sorted out using various log analysis tools. Due to the widespread nature of Apache, there are several tools that you can use; some of these cost money and generally do a nice job, others are free and vary in quality of output. I have been using a tool called Analog (www.analog.cx).
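Before reaching for a full analysis tool, the "how many pages have been accessed" question can be roughed out in a few lines of Python. This is a minimal sketch and says nothing about how Analog or any other tool works; the function name hits_per_page is made up, and it assumes access_log lines in the format shown earlier.

```python
from collections import Counter


def hits_per_page(lines):
    """Count requests per requested path in access_log style lines."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')  # the request sits between the first two quotes
        if len(parts) < 3:
            continue  # not an access_log line; skip it
        request = parts[1].split()  # e.g. ["GET", "/default2.htm", "HTTP/1.1"]
        if len(request) >= 2:
            counts[request[1]] += 1
    return counts


sample = [
    '127.0.0.1 - - [06/Apr/2000:14:10:24 -0700] "GET /default2.htm HTTP/1.1" 200 162',
    '127.0.0.1 - - [06/Apr/2000:14:10:25 -0700] "GET /default2.htm HTTP/1.1" 200 162',
    '127.0.0.1 - - [06/Apr/2000:14:10:26 -0700] "GET /favicon.ico HTTP/1.1" 404 0',
]
for path, hits in hits_per_page(sample).most_common():
    print(hits, path)
```

In practice you would pass an open log file instead of the sample list, since a file object also yields one line at a time.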
A word of warning. In the configuration of the server you may be tempted to use the "HostnameLookups" option to cause all those 123.453.231.010 addresses to be resolved to something meaningful (like grinch.northpole.com) and give you a clearer idea of who is calling. This is not too bad for people coming through a proxy that has a DNS entry. Unfortunately, for those accessing you from, say, a cable modem or a DSL connection with a fixed IP not listed on a DNS, the requested page will take forever (a few seconds) to be displayed: the DNS request has to time out before the page is displayed, and you get a log entry that is still just an IP number.
I wrote a small program that looks at the log file and, at your leisure rather than that of the browsing public, goes off and tries to replace the IP addresses with resolvable names.
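That program is not reproduced in these notes, so the following Python sketch is a stand-in of my own, showing one way the same job could be done after the fact. The names lookup and resolve_line are hypothetical, and the cache is there only so each address is looked up once.

```python
import re
import socket

# An IPv4 address at the very start of a log line.
IP_AT_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\b")


def lookup(ip):
    """Reverse-resolve an address; fall back to the address itself."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # no DNS entry, timeout, malformed address, ...
        return ip


def resolve_line(line, resolver=lookup, cache=None):
    """Return the line with its leading IP address replaced by a name."""
    cache = {} if cache is None else cache
    match = IP_AT_START.match(line)
    if not match:
        return line  # nothing to resolve
    ip = match.group(1)
    if ip not in cache:
        cache[ip] = resolver(ip)
    return cache[ip] + line[match.end():]
```

Run over a copy of yesterday's log, with one shared cache dictionary passed to every call, the slow DNS timeouts are paid once per address and never by a visitor.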
If you are lucky, you will get to see a perennial error in your error_log file that involves a file not being found. The file is favicon.ico. It's not your page designers that have gone nuts; it is a person using a Microsoft IE 4.0+ browser who is bookmarking your fabulous site for future reference. To get rid of the error, either create a null file with this name and place it in the root of the web site, or get a 32x32 pixel, 256 color GIF image, name it favicon.ico and place it in the root of the web site. The icon so created will show up as an image in the user's browser when they bookmark your site.
To some people there is nothing more satisfying than to see activity on
their web site through the constantly growing logs. If you are using NT or
Windows 2000, then you need to install one of the Posix toolkits to do this,
in Linux it's just part of the package:
tail -f error_log
It is a simple command, which I prefer to put on the access_log or the referer_log file as these give me more fun; the joyless ones like to know about errors first. This will follow (-f) the growing log files, displaying any new entries almost as they are made.
Log rotation is a maintenance chore that can be automated to an extent. Using cron (or the schedule service and AT command on NT) it is possible to control how the various logs are treated at the end of some time period. Because the logs can become quite large, it is generally advised to rotate and compress them. On Linux this is partially done by a utility called logrotate (in /usr/sbin/), which comes with most distributions rather than with Apache itself. This utility switches off Apache for a moment, whips out the log file, renames it and creates a new empty replacement. It is generally invoked through cron, and there are scripts on the web you can use to modify the rotation process, including compressing the old file to save space.
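The rename, replace and compress steps can be sketched in Python as follows. This is a bare illustration under stated assumptions, not a replacement for logrotate: the function name rotate and the date-style suffix are my own choices, and the sketch does not stop or restart Apache, which still has to happen around it so the server writes to the new file.

```python
import gzip
import os
import shutil


def rotate(log_path, suffix):
    """Move log_path aside as log_path.suffix.gz and leave an empty log.

    Returns the path of the compressed copy.
    """
    rotated = f"{log_path}.{suffix}"
    os.rename(log_path, rotated)      # whip out the old log
    open(log_path, "w").close()       # create the new, empty replacement
    with open(rotated, "rb") as src, gzip.open(rotated + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)  # compress to save space
    os.remove(rotated)                # keep only the compressed copy
    return rotated + ".gz"


# Example call (hypothetical path and suffix), e.g. from a nightly cron job:
# rotate("/var/log/httpd/access_log", "20000507")
```

Putting the date in the suffix keeps successive rotations from overwriting one another.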
What is it & what does it do?
How is it configured?
The "robots.txt" file is a file that web search engines use to limit their searches. I suppose someone could create an unscrupulous search engine that indexed everything on your site, but in general purveyors of search technology have a hard enough time just indexing the sites they can hit. If you don't have a "robots.txt" file then you will start to see error messages in your error_log file.
Robots.txt is an ASCII text file containing a line for each directory you don't want the search engine spiders to look at.
user-agent: *
Disallow: /mydirectory/
Disallow: /my-other-directory/
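One way to check that a robots.txt file says what you meant is to feed it to a parser and ask. The sketch below uses the robots.txt parser from Python's standard library on the example rules above; the spider name "AnySpider" and the page paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example file, held in a string for the test.
rules = """\
User-agent: *
Disallow: /mydirectory/
Disallow: /my-other-directory/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "AnySpider" is a made-up spider name; the rules apply to every agent.
print(parser.can_fetch("AnySpider", "/mydirectory/page.html"))
print(parser.can_fetch("AnySpider", "/index.html"))
```

A well-behaved spider makes the same kind of check before fetching each page; nothing forces a badly behaved one to.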
PHP
Cold Fusion
Perl
CGI
FrontPage Extensions
Security
SSL
port 443
Hiding returns, see compilation
.htaccess
PHP is a scripting language that looks a lot like C. It is implemented as a module that can be either compiled into Apache or, preferably, added dynamically through the APXS interface. PHP is a lightweight engine that runs well with Apache; it is fast and generally seems quite reliable. Other than compiling the PHP module and copying it to the libraries directory your Apache uses, you will need to modify the httpd.conf file to cause PHP pages to be sent to the engine for interpretation. There are generally two modifications to make:
LoadModule Adds the module to the list of loadable modules
AddType Adds a MIME definition that indicates the file extension to be used
You will likely have to compile your PHP installation. The latest version (4.0.6) compiles in a similar manner to Apache, with the same three-step process (configure, make, make install); this produces one file (libphp4.so under Linux). For Windows you will download an already compiled version, simply install it, and set the appropriate parameters in the Apache configuration files. Note also that PHP under Windows has a php.ini file that needs to be tweaked a little.
(in test.php)
<html>
<head>
<title>Test</title>
</head>
<body bgcolor="white">
<?php
for ($i=8; $i<20; $i++)
{
echo "<br><font style=\"font-size:".$i."pt\">Hello</font>";
}
?>
</body>
</html>
Cold Fusion is a commercial product made by Allaire (latterly Macromedia) that fills some of the same space that PHP does. Under Windows, Cold Fusion makes use of IIS, but in Linux it can make use of Apache and a few other web servers. Cold Fusion is an interpreted scripting language; its syntax looks more like HTML on steroids than a traditional programming language.
Even for Linux, the Cold Fusion install is entirely binary. There is no compiling to be done, just make sure that you have a supported distribution or the correct libraries.
(in test.cfm)
<html>
<head>
<title>Test</title>
</head>
<body bgcolor="white">
<cfloop index="i" from="8" to="20">
<cfoutput>
<BR>
<FONT style="font-size:#i#pt">Hello</font>
</cfoutput>
</cfloop>
</body>
</html>
While you can run Perl through CGI scripts, it is not always the most efficient way of doing it, as each CGI process will be forked and the Perl engine invoked for each of these processes. This can lead to resource issues and slow performance. mod_perl is an effort to redress these problems: with it installed there is only one Perl engine running at any time, which reduces resource consumption and leads to faster start up of the mod_perl application. The library has many features, and add-ons to this add-on include the ability to add ASP (Active Server Pages) functionality to your Apache server.
Unlike Perl, mod_perl applications can be directly coded into HTML pages.
The Common Gateway Interface (CGI) was the original means of getting applications to run on web servers; even today you can install most of the additional language features listed here as CGI based tools. CGI makes use of the standard Input/Output channels in the Operating System. Data sent to a CGI application is written out on STDOUT and read by the CGI application on STDIN, and vice-versa. CGI applications are therefore only limited by the ability of the language they are written in to use these standard I/O channels.
To get the most performance possible most CGI applications are written in C or C++. Essentially the CGI application is written as a filter program.
CGI also specifies a means of getting at certain of the server's environment parameters, which allows the CGI application to better understand the environment it is in. Lest we forget: web servers are stateless, meaning they don't remember anything between user actions. Your CGI program will therefore have to be sent much data, and return it, if you envisage building systems using it.
in HTML file
<html>
<head>
<title>Test CGI</title>
</head>
<body bgcolor="white">
<a href="myapp?name=ThisText">Click Me</a>
</body>
</html>
in C module
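#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    /* a query containing "=" arrives in QUERY_STRING, not in argv */
    const char *query = getenv("QUERY_STRING");
    printf("Content-type: text/plain\n\n"); /* CGI header, then a blank line */
    /* Your code goes here */
    printf("%s", query ? query : ""); /* display the input string */
    return 0;
}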
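The way a CGI program reads the server's environment parameters can also be sketched in a few lines of Python, used here only for brevity. The function name cgi_response is made up for illustration; QUERY_STRING is the standard CGI variable that carries everything after the "?" in the link above.

```python
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build a complete CGI reply: header block, blank line, then body."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", [""])[0]  # first value of ?name=..., if any
    return "Content-Type: text/plain\r\n\r\n" + name + "\n"


# A real CGI script would pass os.environ, which the web server fills in
# before starting the program; a plain dict stands in for it here.
print(cgi_response({"QUERY_STRING": "name=ThisText"}), end="")
```

Because the server remembers nothing between requests, everything the program needs has to arrive this way, in the environment or on standard input, every time.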
We live in a world in which
1) Microsoft is a major player
2) Not everyone wants to know how to write HTML
This has led to a product line by Microsoft called FrontPage. It is a
web site creation tool. It's got many fancy bells and whistles, including
the ability to use pre-built functions (hover buttons for instance) and
upload web sites to the web server. This last function is easy to
implement on NT or Win2K but a trifle harder on Linux. Note, I am not sure
if there is a way to set up FrontPage on an Apache server that is hosted on
a Windows machine - I think MS expects it all to be done through IIS - but
you could always try.
There seem to be several ways around allowing your users or clients to make use of FrontPage (other than telling them to use FTP and learning to code the way Real Geeks do). Firstly Microsoft has a web page and various downloads to implement FrontPage on Unix and Linux servers.
http://msdn.microsoft.com/workshop/languages/fp/2000/unixfpse.asp
There is an installation shell script, a patch script, and a 14Mb install file.
Secondly, there is an organization called RtR (http://www.rtr.com/fpsupport/) that seems to have a FrontPage extensions add-in that is independent of MS (but then again it may not be).
Thirdly, there are various links around the web for FrontPage modules for Apache. How these work is left to you to determine.
The Microsoft version of the FrontPage Extensions installs and updates Apache. In older versions of Apache (1.3.6 for instance) the install even modified (replaced) the Apache executables - a feature I am not fond of.
With the extensions installed, a FrontPage user will be able to connect to your Apache site, upload and download their pages and make use of all the special features that FrontPage enables.
SSL is a security standard that Netscape introduced. It has two particular features:
You will use port 443
Traffic between you and the client browser will be (lightly) encrypted
There are tons of information about implementing SSL on Apache, as there are dozens of configuration features. Suffice it to say that there are SSL equivalents to most of the standard configuration settings (there is an <SSLVirtual_Host …> block for instance).
There are many tools and systems on the internet that are used to find out about servers; some are benign, some not so. If you really want a web server to sit on the internet and NEED to have some world wide authority determine that you are running an Apache server version 1.2.3.4 then leave it alone, because that information is readily accessible. On the other hand you may not want to be so brazen. In recent months an Internet worm called Ramen has been doing the rounds, zapping web sites running RedHat distributions using a security hole in wu-ftpd. Consider that Apache, while not directly responsible for this security issue, could be in some future attack, and you realise why paranoia is your friend. In the section on compilation, I mentioned that you could edit the httpd.h file. In there is a constant called SERVER_BASEVERSION that sets the server's version number, and is the value passed back when a client queries the server. By changing this value you can at least confuse for a short time anyone trying to get detailed information about the configuration of your system. Additionally, you can set the configuration file value ServerTokens to limit the information emanating from the server in its response headers.
You might also like to look in the configuration files to see if you have two small blocks enabled that allow server-status and server-info to be requested from the server. If these are not limited to the internal network by using "deny from all" and then "allow from 192.168.1." (or similar) then an outsider can review your server setup, especially with server-info enabled.
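The effect of "deny from all" followed by "allow from 192.168.1." can be pictured with a short Python sketch. It is a model of the rule only, not of how Apache evaluates it; the function name may_view_status is hypothetical, and the network is written as 192.168.1.0/24, which covers the same addresses as the partial address in the allow line.

```python
import ipaddress

# Same range as "allow from 192.168.1." in the configuration file.
INTERNAL = ipaddress.ip_network("192.168.1.0/24")


def may_view_status(client_ip, allowed=INTERNAL):
    """Deny everyone, then allow only clients on the internal network."""
    return ipaddress.ip_address(client_ip) in allowed


print(may_view_status("192.168.1.36"))   # a machine on the internal network
print(may_view_status("123.123.123.10"))  # an outsider
```

Anything that fails the test should never see server-status or server-info at all.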
The Apache security model allows for two ways of establishing individual directory level security. Either you can edit the <Directory …> blocks in the httpd.conf file, or you can create .htaccess files in the directories you wish to secure. While there are operating system level means of achieving directory security, in Unix/Linux it seems to be frowned upon (in NT directory rights are generally the best way of implementing security).
.htaccess files (or any other name you give them, remembering to change the entries in the configuration file)
Apache web site
PHP web site
MySQL web site
Postgres web site
registry.apache.org
books
Apache information sources are important. There are few commercial outlets that provide support (RedHat and IBM might for a price), so most support is by web chat sites and newsgroups. Some parts are better supported than others; PHP has a very good web site, for instance.
For module information the primary source seems to be the module registry at Apache, but I think as you look around you will find other sources.
There are a large number of books on Apache. Many are quite out of date. Be aware that versions 1.3.6 and above are quite different in structure to what went before, and that version 2.0 and above will be different again. There have been annotated code listings published; use the real thing, as the annotated listings are so out of date by publishing time they don't even make good toilet paper dispensers. Several books give large run downs on the configuration options (O'Reilly for one). These options may or may not be ordered as they appear in the configuration files; it's a matter of taste as to whether this is a good thing or not. I prefer to see the parameters listed somewhere in a book in the order they are found in the configuration files. The IDG books also give a run down on some of the modules that can be added, and the parameters these add to the brew.
Lastly, and it should be your first stop, a default installation of Apache includes an htdocs/manual directory that includes a very nice run down on all the parameters that the standard system contains. It is not altogether clear from the pages how to do things, but it is a starting point from which to go.
Topics I have not covered include redirection and running clustered Apache. These are topics that commercial sites might use to build maintainable, scalable Apache installations, but in the limited time available they are beyond what I can tell you about.
Monitoring and log analysis are also subjects that have profound impact and are complex issues in their own rights. In a commercial situation your marketing team might want to know page hits and hit rates to determine if a particular campaign is working. Tools for this are a matter of personal taste, wander over to Google to find your personal poison.
Integration of other server types. We have not touched on how we go about integrating other types of servers (e.g., WAP servers). These may become important additions in coming years; currently they are "maturing" technologies, so tread carefully.