Running Expect from Windows to Unix - The Tcl programming language

Posted by gwreiddyn (gwreiddyn), 12 June 2006

There are a few articles on opentalk that hint at a similar, but not exact, problem to the one I explain below, but the indicated solutions do not work for me.

I am running Expect 5.21 from a Windows XP box to a Sun Solaris 5.8 server. The Expect script spawns an ssh session within cmd.exe and executes a Solaris command (currently just testing with 'ls -A1'). Unfortunately, the output is garbled with unicode tags. Setting the TERM environment variable to dumb or vt100 does not clean up the output. For lack of a better idea, I am attempting to use regsub to clean up $expect_out(buffer), but I can't seem to identify and remove one particular character.

Here's an example of the garbled output:
snip...
←[0;25HPatches←[0;25H
←[0;26HPkgs←[0;26H
←[0;27HProjects←[0;27H
←[0;28HRPM←[0;28H
←[0;29HRPM.20060313.tar←[0;29H
←[0;30HRPM.20060419.tar←[0;30H
snip...

I understand the "←[0;38H" are unicode characters designating column and row position, but this understanding hasn't helped me come up with a way to exclude them from the output.

Setting "exp_internal 1" reveals the following information:

snip...
expect: does "\x1b[11;4Hset term=dumb\x1b[0;4H\x1b[0;5Hnocfs12:~> ls -A1\x1b[0;5
snip...

In my Expect script I have the following hodge-podge of regsub commands in an attempt to remove the unicode garbage:
snip...
set data $expect_out(buffer)
regsub -all -- {\x1b\[[0-9];[0-9]*H} $data "" data
regsub -all -- {\u001b} $data "" data
regsub -all -- {\[[0-9]*;[0-9]*H} $data "" data
snip...

None of these can remove the "←" character. It seems the Windows cmd shell is producing these characters, but no amount of messing with the settings seems to help (even using "cmd /U ff" and running the expect script from the new shell).

Anyone have any ideas how to either remove the garbage or have another way of cleaning it up?

P.s. I noticed that the unicode character I am seeing in the output is showing up as an ANSI character in this post. The unicode character I am seeing is a left-arrow. This website is showing it as "←["

Posted by admin (Graham Ellis), 13 June 2006

What shell is your SSH running into? Are you sure you have the environment variable TERM set there? I think you may need to export it in certain shells, setenv it in certain others .... worth an experiment in any case, as it doesn't seem to be working and it would be much easier to fix that than to fix the kludge. I think you're getting hung up on problems in your workaround when a workaround should not be necessary.

As regards the crud in the expect_out buffer, I would use a utility to examine exactly what byte sequence I had (since unicode is not my strongest point, to put it mildly!). Simply save the buffer to a file, then look at it through a byte viewer / editor. My own Unix / Linux background would have me upload the saved buffer back up to the server or another Unix / Linux box and run od -x on it ... to see exactly the bytes I needed to clean up. But as I say, this is very much a second option and you would be far better off if you can fix the real issue of the terminal type.
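
As a rough alternative to od -x, the buffer can also be inspected from inside the script itself. The sketch below is only an illustration: the proc name codepoints and the sample string are made up for this example, and it assumes a Tcl recent enough that scan supports %c. It prints the numeric value of every character, so an ASCII escape shows up as 001b, whereas a genuine Unicode left arrow would show up as 2190. (The cmd.exe console typically draws the escape character as a left arrow, which may be all that is happening here.)

```tcl
# Hypothetical helper: list the code point of every character as 4 hex digits.
proc codepoints {data} {
    set out {}
    foreach ch [split $data ""] {
        # %c stores the character's numeric value in "code".
        scan $ch %c code
        lappend out [format %04x $code]
    }
    return [join $out " "]
}

# Sample modelled on the exp_internal output quoted in the thread;
# \033 is the octal escape for ESC.
set sample "\033\[0;4Hls"
puts [codepoints $sample]
# prints: 001b 005b 0030 003b 0034 0048 006c 0073
```

In the real script the argument would be $expect_out(buffer) rather than the sample string.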

Posted by gwreiddyn (gwreiddyn), 13 June 2006

Thanks for the quick response.

I have set the term variable in my rc scripts (I'm using tcsh which sources .tcshrc upon login) on the Solaris box and am 100% certain that the TERM environment variable is getting set and exported upon login. I have even confirmed it via the same Expect script, but for some unknown reason this does not resolve the issue. Hence the kludge solution.

The following regsub expression has so far resolved my question including removing the unicode character, but it isn't pretty:
snip...
set data $expect_out(buffer)
regsub -all -- {.\[[0-9]*;[0-9]*H} $data "!!" data
regsub -all -- "!!!!" $data "\n" data
regsub -all -- "!!" $data "" data
snip...

Fortunately, I haven't come across any command output or Solaris file that just happens to contain the same RE, so this could work, but that could just be luck. I just hate kludgey solutions.

Is there a way to express unicode or ANSI characters in the regsub or regexp expression?
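
One possible way to do this, offered as a sketch rather than a tested fix for Expect 5.21: build the escape character yourself with format and splice it into a double-quoted pattern, so nothing depends on whether the regular expression engine understands \x1b or \u001b. The proc name strip_ansi and the sample data are invented for the example, and the pattern only covers sequences of the form ESC [ digits/semicolons letter, which is what the output quoted in this thread contains.

```tcl
# Hypothetical helper: remove ESC [ ... letter sequences from a string.
proc strip_ansi {data} {
    # Create the ESC character (decimal 27) without relying on regex escapes.
    set esc [format %c 27]
    # After Tcl substitution the pattern is: ESC \[ [0-9;]* [A-Za-z]
    set pattern "${esc}\\\[\[0-9;\]*\[A-Za-z\]"
    regsub -all -- $pattern $data "" clean
    return $clean
}

# Sample modelled on the garbled listing quoted earlier in the thread.
set esc [format %c 27]
set sample "${esc}\[0;25HPatches${esc}\[0;25H\n${esc}\[0;26HPkgs${esc}\[0;26H"
puts [strip_ansi $sample]
# prints:
# Patches
# Pkgs
```

Because the pattern requires the leading ESC, ordinary text that merely looks like "[0;25H" is left alone, which the "." version of the expression cannot guarantee.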

Posted by admin (Graham Ellis), 14 June 2006

See http://wiki.tcl.tk/515 for more on Unicode in Tcl ... you've got a glorious mixture of Unicode and ANSI terminal sequences, mind.

Like you, I don't like the kludge ... except that if you emulate the terminal in how you reverse engineer the sequences, it should be as robust as the initial program you're running. After all, that doesn't corrupt the screen, does it?
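
A very small sketch of what "emulating the terminal" could look like, under some loudly stated assumptions: the proc name rebuild_lines is invented, only the ESC [ n ; m H positioning sequence is handled, and the second number is treated as the row because that is the one that increases from line to line in the sample output above (standard ANSI puts the row first, so swap the two if your output differs). Text is collected per row and the rows are then returned in order.

```tcl
# Hypothetical helper: group text by the row named in ESC [ n ; m H sequences.
proc rebuild_lines {data} {
    set esc [format %c 27]
    # After Tcl substitution: ESC \[ ([0-9]+) ; ([0-9]+) H
    set pattern "${esc}\\\[(\[0-9\]+);(\[0-9\]+)H"
    set row -1   ;# text seen before any positioning sequence
    set pos 0
    while {[regexp -indices -start $pos -- $pattern $data whole colIdx rowIdx]} {
        # Text between the previous sequence and this one belongs to the current row.
        set before [string range $data $pos [expr {[lindex $whole 0] - 1}]]
        set before [string map [list "\r" "" "\n" ""] $before]
        if {[string length $before] > 0} { append rows($row) $before }
        # Assumption: the second parameter is the row (see note above).
        scan [string range $data [lindex $rowIdx 0] [lindex $rowIdx 1]] %d row
        set pos [expr {[lindex $whole 1] + 1}]
    }
    # Anything left after the last sequence also belongs to the current row.
    set tail [string map [list "\r" "" "\n" ""] [string range $data $pos end]]
    if {[string length $tail] > 0} { append rows($row) $tail }
    set lines {}
    foreach r [lsort -integer [array names rows]] { lappend lines $rows($r) }
    return [join $lines "\n"]
}
```

This ignores the column value and would mix lines together if the screen scrolls and row numbers are reused, so it is a starting point rather than a full terminal emulation.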

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.