Clustering on Tomcat Subject: Clustering, using Apache http server (version 2.2.14 in my example) with mod_proxy_balancer as the front load splitter and Apache Tomcat 6.0.20 as the replicated application engine. [[Tip should also work for other recent 2.2.x and 6.0.x versions]] Background This is a follow on article from Load balancing with sticky sessions (httpd / Tomcat), where I looked at sharing out the application work between a number instances of Tomcat from an Apache http server (httpd) that did the bookkeeping. In a nutshell, the Apache http server sent new arrivals to a 'random' Tomcat, and then used sticky sessions so that - when a visitor came back for their subsequent visit in the same series of accesses - they would always talk to the same Tomcat and could continue their conversation with the server having full knowledge of the position to date. The balancer alone is a good solution as far as it goes but: • What happens if the Tomcat that has been stuck to goes out of service? • What happens if you have such a lot of traffic that you need to replicate your httpd front end? • What happens if your httpd fails? • What is you don't actually want to use sessions, but still need what appears to be a single Tomcat? One possible option to addressing some of these is to use the clustering capability of Tomcat, which I'll describe below. But you should first consider if you really need the extra step: (a) can I accept that a session will be lost on the rare occasions that a Tomcat goes offline? (b) is writing to a backend database going to preserve sufficient information anyway? and if the answer to either is "yes", you probably do NOT need to cluster. How does clustering work? You run your web application on a series of identical (or rather "near identical" - the IP address will differ!) servers. With clustering turned on, each of the servers in the cluster is broadcasting (via multicast) any changes made in sessions, cookies, etc to any other listening cluster members on that same multicast address. So that when a visitor comes back for his / her next access, all the machines know what's been going on and can knowledgeably handle the request, even if the original machine isn't available. You can turn clustering on in Apache Tomcat 6.0.20 simply by uncommenting the line in the default server.xml file that relates to it: <Cluster className = "org.apache.catalina.ha.tcp.SimpleTcpCluster"/> and restating your Tomcat. Older versions of Tomcat (such as 5.5) had a long configuration section listing the ports, replication time, IP addresses to use, trigger files all of which are important but none of which actually needs to be changed from default in the current release that's the target of this article. Once you have turned clustering on (yes, it's now that simple), your machines will be communicating ... it's rather like starting a rumor in an office - before you know it, EVERYONE who's around has heard the rumor. Clustering with the balancer If you have already implemented balancing with sticky sessions (as covered in the preceeding article), turning on clustering will cause the data to be shared around. Most of the time the data passed around will not be used - it will ONLY form a backup of the session, to be used if the balancer is unable to reach the sticky machine because it has done down or been taken out of service. With sticky sessions activated, even a second front-end Apache http server won't cause a switch from one Tomcat to another unless a fail-over occurs, as the jvmroute is a part of the cookie so either (any) of the httpd front ends will correctly forward to the original Tomcat. And if you have an intelligent hardware load balancer, that too will be able to forward consistently and the the clustering will remain merely as a backup. If you disable sticky sessions on your balancer, the metrics will change. Forwarding will now be at shared to each of the Tomcats in the balanced group / cluster group (take care that all members of the balance group are included in the cluster!) and so the visitor will get to a differnt back end box each time. But that's now perfectly fine, as they're sharing the data between them so will all know about the originator. Testing if your cluster is working Ironically, clustering and balancing is designed to be transparent, so how do you test whether it's working? My first simple 'trick' is to change the background colour of the pages returned from each cluster member so that "if it's orange it must be Holt" and "if it's blue it must be Chippenham" (our servers are names after local towns and villages!). Going a little further, you can edit your servlet / JSP to return the name of the current host. In Java, the following line: String myname = InetAddress.getLocalHost().getHostName(); will return you the local name of your computer, so that you can then echo the name. On last Tuesday's course, I took our sample "Barman" script that remembers how many drinks you've had in a session (visit counter!) and extended it into a "Pub Watch" script, where each of the barman communicates with his colleagues in neighboring pubs to keep track of who's out on the town, and how much they have had to drink in each establishment. If you click on the links in the previous paragraph, you can download the source code for "Barman" and "PubWatch" and try the code out for yourself. Using the balancer manage that I introduced at the end of yesterday's article, you can open and close individual pubs and see how their customers go elsewhere for their next drink, and you can turn sticky sessions off in the balancer and see how faithful customers will then hit the road and go to a different pub each time for their next drink. Some notes on clustering 1. The machines in the cluster communicate through multicast, so must be on the same subnet. 2. It's a good idea for the subnet you use to have plenty of capacity if your environment is busy, and for it to be firmly behind a strong firewall from your own company's general user traffic, let alone the Internet 3. If you have multiple Tomcat clusters on the same subnet, you'll need to configure one of the clusters away from the default settings - otherwise they'll end up as being one big cluster (you'll find the word 'tribe' creaping in here!) At present, we mention clustering on our public deploying apache httpd and Tomcat course. Only a small proportion of our delegate want to go 'that far', and for newcomers who hadn't done any web server work when they first came along a couple of days earlier, it would be just too much for the one session. An extra day on the end of a Tomcat course, coverage in a private course, or a special session set up for the purpose ... all are possible to help you learn how clustering and balancing work. We'll have a network of computers set aside at our training centre for the purpose of setting up a test case, experimenting with configurations, seeing what happens when machines are switched on and off. Something you wouldn't dare so with your own production environment, and might be reluctant to do even on your development of test networks (that's even assuming that you do HAVE multiple machines at the development or test level). (this article written on 2009-10-30) |
This is http://www.wellho.net/demo/newsletter.php Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY Phone: +44 (0) 1225 708 225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net |