Tomcat 4.1 - java CPU% at 99.9%
Posted by ehuber (ehuber), 11 November 2003
I am running an in-house developed loan servicing system that uses a servlet/JSP framework with database pools attaching to MySQL and MS SQL Server 2000 database servers (with MySQL being the primary database). The server that Tomcat is running on is running Red Hat 9 with 1.5 GB of memory.
Under normal usage, with 90 users hitting it, the CPU utilization hovers around 2-3% for the java process. However, at random times during the day (while users are on the system), something occurs that causes the CPU utilization to jump to 99.9%, and it doesn't come down again until Tomcat is restarted. Initially, the server can run this way for a few hours before a performance hit is noticed. And as other processes running on the box need resources, the java process will drop into the 80%-90% range, but as soon as everything else goes idle, the java process will jump back up to 99.9%.
Now, I know that I might have a loop somewhere in code. That is the first thing that comes to my mind, but I have researched this thing for a couple of months with no luck. I've logged the CPU usage and cross-referenced times to my catalina.out file and I'm not getting any thrown or caught errors that correspond to CPU jump times.
With earlier versions of tomcat, the java processes all used to be separate items (doing a "ps -ef | grep java"). Now there is only one java process with version 4.1. Is there a utility or something out there that would allow me to see inside that one process and try and figure out what is using all the resources? Or does anyone have any better ideas on how I can proceed to figure this out?
Posted by admin (Graham Ellis), 11 November 2003
Hmmmm ... some thought required. Briefest of answers to start you off, as I've only got a couple of minutes at the keyboard ...
An idea. Have you tried creating your own log file and logging entry to and exit from the methods you provide in your servlet(s), making sure that the entries and exits balance? That way you might find any rogue loops, or conversely come to the conclusion that the problem is something within Tomcat.
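To make the entry/exit logging idea concrete, here is a minimal sketch in Java. The class name `CallTracker` and its methods are hypothetical, invented for this illustration; they are not part of Tomcat or the servlet API. It logs each entry and exit to stderr and keeps a count per method, so a method that was entered but never left (a candidate for a rogue loop) shows up as a non-zero balance.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: logs method entry/exit and tracks entries not yet matched by an exit. */
final class CallTracker {
    // Method name -> (entries minus exits). Thread-safe, since servlets run on many threads.
    private static final Map<String, AtomicInteger> OPEN = new ConcurrentHashMap<>();

    static void enter(String method) {
        OPEN.computeIfAbsent(method, k -> new AtomicInteger()).incrementAndGet();
        log("ENTER", method);
    }

    static void exit(String method) {
        OPEN.computeIfAbsent(method, k -> new AtomicInteger()).decrementAndGet();
        log("EXIT ", method);
    }

    /** Returns only the methods whose entries and exits do not balance right now. */
    static Map<String, Integer> stillOpen() {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, AtomicInteger> e : OPEN.entrySet()) {
            int balance = e.getValue().get();
            if (balance != 0) {
                result.put(e.getKey(), balance);
            }
        }
        return result;
    }

    private static void log(String kind, String method) {
        System.err.println(System.currentTimeMillis() + " " + kind + " " + method
                + " [" + Thread.currentThread().getName() + "]");
    }
}
```

In a servlet method you would call `CallTracker.enter("doGet")` first and put `CallTracker.exit("doGet")` in a `finally` block, so that an exception does not leave a false imbalance. When the CPU is pegged, whatever `stillOpen()` reports is where to look first.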
Posted by vsiedt (vsiedt), 31 May 2004
Hello,
Were you able to solve this problem? We've just run into exactly the same problem and have no idea what's going on.
Posted by admin (Graham Ellis), 31 May 2004
Hi, Volker - welcome. Our original poster never followed up, so I don't know whether he / she tried out the logging I suggested, and if so, whether it worked!
Have you tried those ideas out? Any results, feedback? Are you using Tomcat 4.1 too, or have you moved on to 5 yet?
Posted by ehuber (ehuber), 1 June 2004
We solved the problem about 6 months ago. The issue was linked to a coding problem, per se, and MySQL. The 99.9% jump occurred when we supplied an invalid "LIMIT" clause at the end of a SELECT statement. In our case, this happened while paging through a result set. If the total number of rows in the result set was exactly divisible by the number of rows per page, the last page (because of the error in my code) incorrectly displayed a next page link. If it was clicked, the server utilization jumped, which in my mind should not have happened. I would have expected MySQL to throw an error, but it doesn't.
Anyway, the only way we were able to find this problem was to upgrade to Tomcat 5 and use the new "Manager" features. When the server was pegged, the manager showed us exactly what session and URL was taking all the resources. From there, we were able to narrow in on the problem and solve it.
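The original paging code is not shown in this thread, so the following is only a sketch of how the "exactly divisible" case can be handled, not a reconstruction of the actual fix. The class `Paging` and its method names are hypothetical. The idea is to compute the page count with a ceiling division, show the next page link only when the current page is below that count, and refuse to build a LIMIT offset for a page that does not exist.

```java
/** Hypothetical paging helper; pages are numbered from 1. */
final class Paging {
    /** Ceiling division: 20 rows at 10 per page is 2 pages, 21 rows is 3 pages. */
    static int pageCount(int totalRows, int rowsPerPage) {
        if (rowsPerPage <= 0 || totalRows < 0) {
            throw new IllegalArgumentException("rowsPerPage must be > 0 and totalRows >= 0");
        }
        return (totalRows + rowsPerPage - 1) / rowsPerPage;
    }

    /** True only when a later page really has rows to show. */
    static boolean hasNextPage(int page, int totalRows, int rowsPerPage) {
        return page < pageCount(totalRows, rowsPerPage);
    }

    /** Row offset for "LIMIT offset, rowsPerPage"; rejects pages outside the result set. */
    static int offset(int page, int totalRows, int rowsPerPage) {
        if (page < 1 || page > pageCount(totalRows, rowsPerPage)) {
            throw new IllegalArgumentException("no such page: " + page);
        }
        return (page - 1) * rowsPerPage;
    }
}
```

Validating the page number on the server side matters as much as hiding the link, because a user can still request a bad page by editing the URL.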
Posted by admin (Graham Ellis), 1 June 2004
You must have "notify" set or be lurking here a lot.
Many thanks for the follow up ... glad you got it resolved even if it was a pretty dramatic upgrade that did it for you. Volker - hope that helps you; it certainly should help you move forward. Please do follow up if we can be of any further help or if you make any discoveries that are worth adding.
Posted by vsiedt (vsiedt), 4 June 2004
All right, we were able to fix this today.
The problem was related to some regular expressions. We fetch content from a foreign server via HTTPConnection and integrate it into our own site. We need some regular expressions to modify the HTML before integrating it. Since the foreign content had changed (actually, exactly five blank lines were added), one of the regular expressions went nuts. It needed ages to process the content and allocated several hundred megabytes of memory within seconds! The garbage collector therefore kicked in and tried to free some memory, fighting against the RE that was still processing the content. Finally the server ran out of memory. Altering the RE to a slightly more intelligent and robust version fixed the problem.
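The actual regular expression and its fix are not given here. As a separate defensive measure (not what was done above), a regex match can be given a time budget so that a runaway expression fails fast instead of pegging the CPU. A minimal sketch, assuming a hypothetical helper class `RegexDeadline`: `java.util.regex` reads its input through `CharSequence.charAt`, so a wrapper that checks the clock on every read can abort the match.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: aborts a regex match once a time budget is used up. */
final class RegexDeadline {

    /** CharSequence wrapper that throws once the deadline has passed. */
    private static final class Guarded implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        Guarded(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // The regex engine calls this for every character it inspects,
            // including while backtracking, so this is where we check the clock.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new IllegalStateException("regex exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new Guarded(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Like pattern.matcher(input).matches(), but throws IllegalStateException on timeout. */
    static boolean matchesWithin(Pattern pattern, CharSequence input, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        return pattern.matcher(new Guarded(input, deadline)).matches();
    }
}
```

A call such as `RegexDeadline.matchesWithin(pattern, html, 500)` then either returns normally or throws, which can be logged together with the offending input. This limits the damage but does not replace fixing the expression itself.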
Maybe it's of interest how we tracked down the problem: We considered migrating to Tomcat 5 as advised but found another solution:
- We added a filter to all webapps on the server, logging each incoming request
- We put on verbose mode for the garbage collector. As soon as the Java process utilized 99 % CPU time, garbage collections occurred much more frequently than before. Therefore we could identify a small set of requests (say: candidates) of which one must have caused the problem.
- When running on 99% CPU time we sent a SIGQUIT to the java process and analyzed the thread dump. By comparing all "runnable" threads to the candidate requests we were able to find the evil one.
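For reference, verbose garbage collection output is enabled with the `-verbose:gc` JVM option, and `kill -QUIT <pid>` on Linux makes the JVM print a thread dump to its standard output (for Tomcat, usually catalina.out). The filtering step can also be done from inside the JVM. The sketch below uses a hypothetical class `RunnableThreads` and the standard `Thread.getAllStackTraces()` call to list only the threads in the RUNNABLE state, which is roughly the comparison described above; it is an illustration, not the procedure that was actually used.

```java
import java.util.Map;

/** Hypothetical helper: an in-process thread dump restricted to RUNNABLE threads. */
final class RunnableThreads {
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Threads that are waiting or blocked are not burning CPU, so skip them.
            if (t.getState() != Thread.State.RUNNABLE) {
                continue;
            }
            sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Note that RUNNABLE also covers threads sitting in native I/O calls, so the list will usually contain a few idle socket threads; the one stuck in application code, with the same stack trace across several dumps, is the suspect.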
Thanx a lot for your help