TCL Hanging

Posted by Anuj (Anuj), 17 May 2006

For the last 2 weeks I've been facing a problem that I'm unable to resolve. In my case TCL is hanging.
We are developing an automation tool that can do all kinds of testing: functional, configuration, regression, etc. We are using TCL 8.3 to develop it, and the platform is Windows. So far we've written more than 600 scripts in it, which were running perfectly fine until 2 weeks ago. Then our tool suddenly started hanging. After debugging I realized that it's TCL itself that is hanging. I hadn't changed anything that would explain why it suddenly started. After searching the net I learned that Cygwin can sometimes cause TCL to hang on Windows, so I uninstalled Cygwin from the machine. For 2 days it seemed the problem was resolved, but it actually wasn't. I even formatted the machine and did a fresh installation of all the required software, but this didn't help either.
We are using this tool for our production testing, and it has stopped working at a critical point.
Right now I don't know how to get past this problem. Can anybody please help me out?

One more piece of information that might be of use: our tool does socket programming, because we make serial and telnet connections to various devices. For this we've used native TCL and not Expect.

Posted by admin (Graham Ellis), 17 May 2006
Can you reproduce the problem in a short(er) program that you can post here?  The problem is most likely to be a bug in your software; second most likely is a timing issue / feature of the OS and Tcl which has been overlooked in the software design and can be stepped around.

If you don't have a shorter program that hangs, try cutting down your application and seeing how it changes the problem - that will lead you to a clue as to exactly what it is, and from there how to fix it.
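
One way to start narrowing things down, sketched here under assumptions rather than taken from the poster's framework: log a line before and after each script is sourced, and close the log file after every write so the last entry survives a hang. The proc names `logLine` and `runScript` and the log format are hypothetical.

```tcl
# Hypothetical helper: append one timestamped line to a log file.
# The file is opened and closed on every call so the entry is on disk
# even if the interpreter hangs straight afterwards.
proc logLine {logFile msg} {
    set fh [open $logFile a]
    puts $fh "[clock format [clock seconds] -format {%Y-%m-%d %H:%M:%S}] $msg"
    close $fh
}

# Hypothetical helper: source one script at global level and record
# when it starts and how it ends. Returns the catch code (1 = error).
proc runScript {logFile script} {
    logLine $logFile "START $script"
    set code [catch {uplevel #0 [list source $script]} result]
    if {$code == 1} {
        logLine $logFile "ERROR $script: $result"
    } else {
        logLine $logFile "END   $script"
    }
    return $code
}
```

Calling `runScript hang.log $script` for each script in turn (the log file name is an assumption) leaves a START entry with no matching END or ERROR for the script that was running when the hang occurred. The same `logLine` calls can then be moved inside that script to close in on the statement that never returns.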

These problems are never easy, by the way. I remember taking some six *months* to fix a problem with a program that crashed about every 60 minutes of run time.   A single run also took about that long to go through, so sometimes it worked, other times it had to be resubmitted.  And this was on a big mainframe running one job at a time!

Posted by Anuj (Anuj), 17 May 2006
Thanks Graham for such a prompt response.
I haven't yet tried cutting down the framework, but I can try that. It seems an unlikely fix, though: we have more than 600 scripts, but they are sourced only one at a time. TCL is now hanging even in scripts that were passing earlier. Scripts keep getting added to one of the framework's directories, but because they are sourced one at a time I believe this shouldn't be the problem. I'll definitely try it anyway.
Today I've installed the latest version of TCL, 8.4.13, just in case this also solves the problem.

Posted by admin (Graham Ellis), 17 May 2006
If they WERE working and aren't now, what has changed?   I'm thinking timing issues if the failures are unpredictable - perhaps a whole load more tasks, services, spyware, etc. running on the machine that just push things over the edge sometimes?
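
If timing does turn out to be the cause, one common place for a Tcl program to hang is a blocking read on a socket whose peer is slower than expected. Whether that is what happens in the tool discussed here is an assumption; the thread does not show its socket code. As a minimal sketch, a read can be guarded with a timeout using only core commands (`fconfigure`, `fileevent`, `after`, `vwait`). The proc name `readLineWithTimeout` and the global array `rlState` are hypothetical.

```tcl
# Hypothetical helper: read one line from a channel, giving up after
# timeoutMs milliseconds instead of blocking forever.
# Returns a two-element list: {ok line}, {eof {}} or {timeout {}}.
proc readLineWithTimeout {chan timeoutMs} {
    global rlState
    fconfigure $chan -blocking 0
    set rlState($chan) waiting
    # Whichever event fires first writes the variable and wakes up vwait.
    set timer [after $timeoutMs [list set rlState($chan) timeout]]
    fileevent $chan readable [list set rlState($chan) readable]
    set result [list timeout ""]
    while {1} {
        vwait rlState($chan)
        if {[string equal $rlState($chan) timeout]} {
            break
        }
        if {[gets $chan line] >= 0} {
            set result [list ok $line]
            break
        }
        if {[eof $chan]} {
            set result [list eof ""]
            break
        }
        # Only part of a line has arrived so far; keep waiting.
    }
    after cancel $timer
    fileevent $chan readable {}
    unset rlState($chan)
    return $result
}
```

A caller would check the first element of the result, for example logging and retrying on `timeout` rather than waiting indefinitely. Note that `vwait` runs the event loop, so other pending events are serviced while the read is waiting.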

