APACHE HTTPD TO APACHE TOMCAT CONNECTION
Ever get confused between Apache, Apache httpd and Apache Tomcat? Do you need just one of them, or both? Can they simply coexist or do you need to connect them together?
APACHE, APACHE HTTPD AND APACHE TOMCAT
The Apache Software Foundation's original 'product' was their web server; it's a fabulous piece of Open Source software and it dominates the web server world with nearly three quarters of domains hosted on it. It was originally known simply as the "Apache Web Server".
One of the tough issues with an Open Source project is managing changes that come in from a disparate range of developers and need to be rolled back into the next product release, and the Apache team had a great mechanism for this - so much so that other projects wanted to come under their wing too. And so the Apache organisation became the "Apache Software Foundation" with their mechanisms in use for both the web server product and for other bits of software too.
When Java grew and server-side Java programs were starting to be developed, the virtual machine environment and servlet specification didn't fit well with the Apache server; a purpose-designed server was needed to implement the proposed standard. And what better mechanism to use for the administration of the new project than the tried-and-tested one provided by the Apache Software Foundation?
So ... the net result is that there are TWO totally different Apache Web Servers. There's Apache httpd, which is the general purpose server that's been in use for many years but has been transformed / developed during that time to meet current needs. And there's Apache Tomcat, which is the server whose main purpose is to provide Java support - that was initially in the form of Servlets, but has been extended to support JSPs (JavaServer Pages) too.
DO I NEED BOTH SERVERS THEN?
Maybe. Sorry for the vague answer.
If you're running Servlets or JSPs or derivatives such as Struts or Spring, then you need Apache Tomcat or something that does the equivalent job (and Apache httpd does NOT - it would simply slow down the other functionality far too much).
Apache Tomcat CAN serve regular web content - .html documents, .jpgs, style sheets, robots.txt files and all the other things for which you need a server which, frankly, is no more than a glorified supplier of files. And Tomcat DOES also provide support for server side programming in languages like Perl or Python through the Common Gateway Interface. But this isn't really what Tomcat's intended for and it's relatively slow and clunky at the task. All right if you've a few flat pages and the occasional Perl script, but not for a busy site.
And so it's more common than not to find both Apache httpd and Apache Tomcat in use powering the same web site.
HOW ARE THE TASKS DIVIDED?
At first glance, logic might suggest that you should put all your flat content (such as HTML pages) on the faster httpd server, and your Java onto Tomcat. That would give rise to a maintenance nightmare, as your files would be grouped by type rather than by their position in your online suite of applications, with files for even a simple task spread between the servers. So there's a better solution.
Tomcat introduces the concept of a "Web Application" or a webapp. That's a bundle of all the files needed for one particular task or area of the web site. So that includes the .jsp files and servlets, the controlling file web.xml which associates a particular class with a URL, and all the internally used classes, in addition to the files that simply need to be served such as the .css, .html, .gif and .jpg files.
For example, our web site at Well House Consultants comprises some 6000 different URLs. About 10% of those (but well under 10% of our traffic) relate to our library of Open Source books, which we keep in a MySQL database and access dynamically as someone searches for a book at a particular level and on a particular subject - perhaps asking only for our 'hot favourites' as a form of recommendation. This is a classic example of a web application written in Java, with all the files served by Tomcat, including the plain form that a new arrival might be presented with.
Carrying on with the example, it's rather more arguable whether the help pages / FAQ for the application - flat pieces of HTML with copious images, but no server side executable content - are best served by Tomcat or httpd ... sometimes the choice is obvious but at other times it's a close call.
CAN APACHE HTTPD AND APACHE TOMCAT COEXIST?
Yes, they can - but there needs to be some sort of differentiator. In other words, if I contact computer number 192.168.200.66 on port number 80 (that's the default port for a web server), I really need to have one, and only one, process there to provide me with a service. It's as if I walk up to a hotel at No. 66 on a street called 192.168.200, and knock at door number 80. I can cope if the door is answered by one person, but if no-one's there I get no service. If two people answer the door and both answer when I ask a question, their answers get mixed up and I can't handle it.
So your first step if you need both httpd and Tomcat is to install them at the same IP address but at different ports - say httpd at the (default) port 80, and Tomcat at port 8080. Any requests to port 80 are then going to be handled by httpd, and requests to 8080 by Tomcat.
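If you want to see the "one process per door" rule for yourself, here is a minimal sketch in Python (chosen purely for illustration - it has nothing to do with how httpd or Tomcat are written). It asks the operating system for a free port on the loopback address, shows that a second listener is refused at that same port, and that a listener at a different port is welcome. The function names are made up for this example.

```python
import socket


def try_listen(port, host="127.0.0.1"):
    """Try to listen on host:port. Return the socket, or None if refused."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError:
        # Typically "address already in use": someone else has that door.
        sock.close()
        return None


def port_clash_demo():
    """Return (second_listener_refused, first_port, other_port)."""
    first = try_listen(0)  # port 0 asks the OS for any free port
    first_port = first.getsockname()[1]
    clash = try_listen(first_port)  # same address, same port
    other = try_listen(0)  # same address, a different port
    other_port = other.getsockname()[1]
    refused = clash is None
    for sock in (first, clash, other):
        if sock is not None:
            sock.close()
    return refused, first_port, other_port
```

Calling port_clash_demo() should report that the second listener was refused and that the two successful listeners ended up on different port numbers, which is exactly the httpd-at-80 and Tomcat-at-8080 arrangement in miniature.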
Your home page is probably a fairly standard greeting to arrivals at your site, so this "httpd first" approach is a good one. A URL that names no port at all will default you to port 80, and a link within that page that names port 8080 explicitly would divert the user off to a webapp served by Tomcat.
This isn't a bad solution for early testing and development, but it does mean that your web site and application management team have to be very clear-minded as to what is where, and ensure that they get all the :8080 and :80s correct in the code as they link in and out of the webapp. It also doesn't look very clever to the web site user, who sees himself being sent off to a different service; whether it's done using a different port number or a different IP address (also possible), he may feel the site's not really a single site. Add to that the possibility of mistakes in your support department when they're emailing a URL to a client and get the wrong server, and you'll see that a single incoming connection (IP address, Port) for both servers is desirable.
Let's carry on with our hotel analogy. I walk up to 192.168.200.66 and knock on door number 80, and I'm answered by Apache httpd. I ask him a question and he gives me an answer. Good - that's what I wanted.
Did Apache httpd know the answer himself, or did he ask someone else? In real life, we would hear him yell across the room if he had to ask and we would certainly notice if he left us standing and went across to another room - say room 8080 - to ask the occupant of that room. But, nevertheless, we would be happy to have asked the question at a single point of contact and got a reply even if we had to hang on for a while. This is a much more desirable state of affairs than being referred on from one person to the next ...
That's how Apache httpd and Tomcat work together. Httpd takes the "lead" role and analyses all the questions ... and passes on those that it can't directly answer to Tomcat. Tomcat's response is then passed back to the original user who, at computer speed and in computer terms, might not ever realise that httpd has referred the question on. (This business of hiding within is often referred to by the term encapsulation).
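The "who answers the question" decision can be sketched in a few lines of Python. This is only an illustration under an assumption of mine - that the questions to pass on are picked out by the start of the URL path - and the /library prefix and the function name are hypothetical, not taken from any real configuration.

```python
# Hypothetical URL prefixes that belong to webapps running under Tomcat.
TOMCAT_PREFIXES = ("/library",)


def who_answers(path, tomcat_prefixes=TOMCAT_PREFIXES):
    """Return "tomcat" if the path lies inside a webapp, otherwise "httpd".

    The path is expected without any query string.
    """
    for prefix in tomcat_prefixes:
        # Match the prefix itself or anything below it, but not a
        # different name that merely starts with the same letters.
        if path == prefix or path.startswith(prefix + "/"):
            return "tomcat"
    return "httpd"
```

With this table, who_answers("/index.html") gives "httpd" and who_answers("/library/search") gives "tomcat"; either way the visitor has only ever knocked on one door.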
HOW IS THE CONNECTION MADE?
There are numerous ways of connecting httpd and Tomcat but this has been a fast-moving field over the last couple of years and at present there are just two fully supported routes. If you see references to jk2, that was deprecated in November 2004 and if you see references to warp, that was deprecated earlier (2003?). Jserv is obsolete too - last updates are dated 2000.
Current connection routes are jk and proxy.
You may well find that some books talk about jk being replaced by jk2 and/or warp and wonder who's got it the right way round. You'll even find some references to jk being deprecated itself. Jk has been around for quite a while, and jk2 and warp were later introductions that to some extent were still-born - they certainly didn't take over as was predicted at one time, the time that some of those books were written. ((The Warp architecture didn't work with Windows, nor with the older Apache releases that are very common, and Jk2 was very complex to set up and really didn't take off; it eventually was deprecated when the development team melted away)).
So ... current connection routes are jk and proxy. Dateline - late 2005. Some of the features of jk2 are being saved / rolled into jk and I do NOT foresee any major change in the short term.
With the proxy route, the Apache httpd server (with mod_proxy enabled) simply calls up another URL internally - that other URL will be on a different port number OR even on a different host. The results of the enquiry are passed back to the original questioner.
In other words, your user's client program (browser) calls up the httpd server. httpd then itself acts as a browser and calls up a Tomcat server.
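To make "httpd acts as a browser" concrete, here is a toy sketch in Python using only the standard library. It is NOT how mod_proxy is implemented; it just starts two tiny servers on free loopback ports, one standing in for Tomcat and one standing in for httpd, and has the front one fetch each answer from the back one. All the class and function names are made up for this example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Talk to the loopback address directly, ignoring any proxy settings
# that may be present in the environment.
DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class QuietHandler(BaseHTTPRequestHandler):
    """Shared helpers: send a plain text reply, and keep the log quiet."""

    def reply(self, body):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


class BackendHandler(QuietHandler):
    """Stands in for Tomcat: reports which path it was asked for."""

    def do_GET(self):
        self.reply(("backend saw " + self.path).encode("utf-8"))


def make_front_handler(backend_port):
    """Build a handler that stands in for httpd with a proxy set up."""

    class FrontHandler(QuietHandler):
        def do_GET(self):
            # Act as a browser: ask the backend the same question ...
            url = "http://127.0.0.1:%d%s" % (backend_port, self.path)
            with DIRECT.open(url) as answer:
                body = answer.read()
            # ... and hand its answer back to the original questioner.
            self.reply(body)

    return FrontHandler


def start(handler_class):
    """Start a server on a free loopback port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler_class)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_via_front(path):
    """Ask the front server for path and return the text it hands back."""
    backend = start(BackendHandler)
    front = start(make_front_handler(backend.server_port))
    try:
        url = "http://127.0.0.1:%d%s" % (front.server_port, path)
        with DIRECT.open(url) as answer:
            return answer.read().decode("utf-8")
    finally:
        for server in (front, backend):
            server.shutdown()
            server.server_close()
```

A call such as fetch_via_front("/library/search") only ever contacts the front server, yet the text that comes back was produced by the backend.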
This is simple to set up and is covered elsewhere in our training notes. Care needs to be taken within the configuration that the in-going and returned URLs are rewritten correctly. In other words - if the browser asks for "x", httpd may need to ask Tomcat for "y". When Tomcat replies "here is y", httpd needs to translate that back to "here is x" before it's returned to the originator.
The most common cause of problems with a proxy connection is errors in this URL rewriting; either the URLs are not rewritten at all on the return path, or they're rewritten twice. Note that this error is NOT always obvious in a first test as browsers will only use the "here is x" message they get back in a limited range of circumstances!
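The "x" to "y" and back again translation can be sketched as a pair of small Python functions. This is an illustration of the idea only (in httpd itself the job is done by the ProxyPass and ProxyPassReverse directives of mod_proxy); both base URLs below are hypothetical, as are the function names.

```python
# Hypothetical addresses: what the browser sees, and where Tomcat lives.
PUBLIC_BASE = "http://www.example.com/books"
BACKEND_BASE = "http://localhost:8080/library"


def _swap_base(url, old_base, new_base):
    """Swap old_base for new_base if url is old_base or lies below it."""
    if (url == old_base
            or url.startswith(old_base + "/")
            or url.startswith(old_base + "?")):
        return new_base + url[len(old_base):]
    # Anything else is left alone, so a URL that has already been
    # rewritten is not rewritten a second time.
    return url


def to_backend(url):
    """In-going path: the browser asked for x, so ask Tomcat for y."""
    return _swap_base(url, PUBLIC_BASE, BACKEND_BASE)


def to_public(url):
    """Return path: Tomcat said 'here is y', so tell the browser 'here is x'."""
    return _swap_base(url, BACKEND_BASE, PUBLIC_BASE)
```

The return path matters mostly when Tomcat answers with a redirect: if to_public is skipped, the browser is handed a localhost:8080 address that it cannot reach from outside, which is the sort of error that a first test with no redirects in it will not show up.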
In order to provide a more flexible and efficient communication route between Apache httpd and Apache Tomcat, you may want to break away from the http protocol ((note - I say MAY - it's equally likely you don't need this flexibility and efficiency)).
The jk connector uses a protocol known as ajp (the Apache JServ Protocol) rather than http.
UPDATE - DECEMBER 2005
Up to and including Apache httpd version 2.0, you had to download the jk connector as a separate operation and to ensure you had exactly the correct sub-version to work with your sub-version of httpd (not so critical on the Tomcat side as it was just a protocol generator).
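The new Apache httpd 2.2 includes mod_proxy_ajp, a proxy module that speaks the ajp protocol, rather than mod_jk itself. If you use that module, this compatibility / separate download issue should be resolved; mod_jk remains a separate download.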
See also jk configuration - a simple example