Using regexp to parse data

Posted by tschnell (tschnell), 2 February 2005
I am trying to use regexp when parsing a data file.  

The data file I am parsing looks like this:

.............................................

"IPADDR=11.24.255.203,IPMASK=255.255.255.192,DEFRTR=11.24.255.193,NTP=11.24.255.193,NAME=\"DDS203\","IIOPPORT=57790"

...................................................

I am using a regexp command to put the values from above into variables:

regexp {"IPADDR=(.*),IPMASK=(.*),DEFRTR=(.*),NTP=(.*),NAME=\\"(.*)\\",IIOPPORT=(.*),"} $aline ignore4 ip_add ip_mask def_rtr ntp host iiop_port
             
puts $out "$host,$ip_add,$ip_mask,$def_rtr"


...........................................................

I am not sure what to do with the \"     \" surrounding the host NAME.  I do not want to capture these (although if that is the only way I can do it, I will).  I cannot get the regexp to find the value for host.  I get the following error:

can't read "host": no such variable
   while executing
"puts $out "$host,$ip_add,$ip_mask,$def_rtr""


Thanks so much for your help.


Posted by admin (Graham Ellis), 2 February 2005
If a regular expression fails to match, then it won't set ANY of the output variables; you can't have it half matching. The reason it reports the error on $host is that this is the first of the variables you try to access after doing the (failed, it would seem) match.
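A minimal sketch of that behaviour, using a shortened sample line and a pattern with a deliberately unmatched trailing comma (both invented for illustration): `regexp` returns 0 and leaves the capture variables unset, so testing its return value before using them avoids the "no such variable" error.

```tcl
# Shortened sample line (illustrative only)
set aline {IPADDR=11.24.255.203,IPMASK=255.255.255.192}

# The pattern demands a trailing comma that the data lacks, so it cannot match
set matched [regexp {IPADDR=(.*),IPMASK=(.*),} $aline -> ip_add ip_mask]
puts "matched=$matched ip_add set? [info exists ip_add]"

# Only use the captures when regexp reports a match
if {$matched} {
    puts "$ip_add,$ip_mask"
} else {
    puts "no match for: $aline"
}
```

Run in a fresh interpreter, this prints `matched=0 ip_add set? 0` followed by the "no match" line.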

I have NOT studied your match in detail as you can use Tcl to do that for yourself.  Duplicate the regex line, comment out the original by putting a # in front of it, then reduce the copy to ONLY match the IPADDR ... and change your puts statement to output only $ip_add.    Then add parts of the regular expression back in until you find the element that's giving problems.

I suspect the \\ is fine, by the way; I suspect that part of your problem is the trailing comma on the end of the regular expression, which isn't in your sample data line.
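The "add parts back in" approach can also be automated. This is a sketch, assuming the sample line from the first post is exactly what is in the file: it splits the original pattern into pieces, appends them one at a time, and reports the first piece after which the match fails.

```tcl
# Sample line as given in the first post
set aline {"IPADDR=11.24.255.203,IPMASK=255.255.255.192,DEFRTR=11.24.255.193,NTP=11.24.255.193,NAME=\"DDS203\","IIOPPORT=57790"}

# The original pattern, broken into pieces to add back one at a time
set pieces {
    {"IPADDR=(.*)}
    {,IPMASK=(.*)}
    {,DEFRTR=(.*)}
    {,NTP=(.*)}
    {,NAME=\\"(.*)\\"}
    {,IIOPPORT=(.*),"}
}

set pattern ""
set failed ""
foreach piece $pieces {
    append pattern $piece
    # With no match variables, regexp just returns 1 (match) or 0 (no match)
    if {![regexp $pattern $aline]} {
        set failed $piece
        break
    }
}

if {$failed eq ""} {
    puts "whole pattern matches"
} else {
    puts "match first fails when adding: $failed"
}
```

Against this sample, everything up to and including the NAME piece matches; the IIOPPORT piece is the one that breaks it.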

I also suspect that I'm the only one who was reminded of the film "Four Weddings and a Funeral" when I wrote several sentences starting "I suspect" in that last paragraph!

Posted by tschnell (tschnell), 2 February 2005
Thanks Graham. I actually have this working when the name field does not have the \" included in the output, which is the case on previous versions of the equipment that I am running the script against. For instance, the following works fine:

regexp {"IPADDR=(.*),IPMASK=(.*),DEFRTR=(.*),NAME=(.*),SWVER=(.*),LOAD=(.*)"} $aline ignore4 ip_add ip_mask def_rtr host swver load
puts $out "$host,$ip_add,$ip_mask,$def_rtr,$swver,$load"


When run against my data file:

"IPADDR=11.24.255.202,IPMASK=255.255.255.192,DEFRTR=11.24.255.193,NAME=DSSS202,SWVER=8.12.01,LOAD=08.51-0044H-2"

The hanging comma was just me leaving a comma in; there were other variables in the original output and I was trying to simplify. My bad.

thanks again.
tricia
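Since the two equipment versions seem to differ only in whether NAME is wrapped in \", one pattern could cover both. This is a sketch, assuming the name never contains a comma, backslash or double quote and is always followed by a comma (true of both sample lines in this thread): the escaped quote is made optional with a non-capturing group, and the name is captured with a negated bracket expression instead of `(.*)`.

```tcl
# The two sample lines from this thread: older format, then newer format
set lines {
    {"IPADDR=11.24.255.202,IPMASK=255.255.255.192,DEFRTR=11.24.255.193,NAME=DSSS202,SWVER=8.12.01,LOAD=08.51-0044H-2"}
    {"IPADDR=11.24.255.203,IPMASK=255.255.255.192,DEFRTR=11.24.255.193,NTP=11.24.255.193,NAME=\"DDS203\","IIOPPORT=57790"}
}

set hosts {}
foreach aline $lines {
    # (?:\\")? optionally consumes an escaped quote; [^,\\"]* stops at a
    # comma, backslash or quote, so the quotes never end up in $host
    if {[regexp {NAME=(?:\\")?([^,\\"]*)(?:\\")?,} $aline -> host]} {
        lappend hosts $host
        puts $host
    }
}
```

This prints `DSSS202` and then `DDS203`. The trailing comma in the pattern is what forces the optional quote group to be used on the newer format rather than matching an empty name.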

Posted by admin (Graham Ellis), 2 February 2005
My advice is still "simplify to tie it down". And within curly braces, the double backslash should be correct.

Looking back at your original post, I note that IIOPPORT and its value in the data are within double quotes, but the regex doesn't allow for them; perhaps a typo, or perhaps the source of your problem? Whichever, simplifying will help you tie it down; it's rather like "looking for a needle in a haystack" for me!
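If the double quote before IIOPPORT in the first post really is in the file (rather than a typo in the post), a sketch of a pattern that allows for it looks like this: the trailing comma is dropped and the quotes around the IIOPPORT field are matched literally.

```tcl
# Sample line as given in the first post (assumed to match the real file)
set aline {"IPADDR=11.24.255.203,IPMASK=255.255.255.192,DEFRTR=11.24.255.193,NTP=11.24.255.193,NAME=\"DDS203\","IIOPPORT=57790"}

# \\" inside braces is a regex escape for a literal backslash followed by "
set pattern {"IPADDR=(.*),IPMASK=(.*),DEFRTR=(.*),NTP=(.*),NAME=\\"(.*)\\","IIOPPORT=(.*)"}

if {[regexp $pattern $aline -> ip_add ip_mask def_rtr ntp host iiop_port]} {
    puts "$host,$ip_add,$ip_mask,$def_rtr"
} else {
    puts "no match"
}
```

This prints `DDS203,11.24.255.203,255.255.255.192,11.24.255.193`.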

Posted by tschnell (tschnell), 2 February 2005
Okay, I got it to work, but I am not really sure what I did. The only problem is that I am using this in my regexp:

regexp {IPADDR=(.*),IPMASK=(.*),DEFRTR=(.*),IIOPPORT=(.*),NTP=(.*),NAME=(.*)} $aline ignore4 ip_add ip_mask def_rtr iiop_port ntp host swver load

As I mentioned earlier, the output in the data file has the name entry as:
"...............,NAME=\"DSS203\",..........................."

So when I write it out with:
puts $out "$host,$ip_add,$ip_mask,$def_rtr,$swver,$load"


I get:
\"DSS203\",..........

The host name is really DSS203, without the \" surrounding it. Is there a way to match those in the regexp so they are not included in the value stored in my host variable?
thanks again.
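A different technique from the single fixed-order regexp used in this thread is to strip the quotes and split the line into KEY=VALUE pairs, which also copes with fields arriving in a different order. This is a sketch, assuming values never contain commas and that Tcl 8.5 or later is available (for `dict`).

```tcl
# Sample line as given in the first post
set aline {"IPADDR=11.24.255.203,IPMASK=255.255.255.192,DEFRTR=11.24.255.193,NTP=11.24.255.193,NAME=\"DDS203\","IIOPPORT=57790"}

# Remove each escaped quote (\") first, then any remaining bare quote (")
set clean [string map [list "\\\"" "" "\"" ""] $aline]

set fields [dict create]
foreach pair [split $clean ,] {
    # Split each KEY=VALUE pair at the first "=" only
    set eq [string first = $pair]
    if {$eq < 0} continue
    dict set fields [string range $pair 0 [expr {$eq - 1}]] \
        [string range $pair [expr {$eq + 1}] end]
}

puts "[dict get $fields NAME],[dict get $fields IPADDR],[dict get $fields IPMASK],[dict get $fields DEFRTR]"
```

This prints `DDS203,11.24.255.203,255.255.255.192,11.24.255.193`; a missing key would raise an error from `dict get`, so `dict exists` is worth checking first on untrusted input.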

Posted by admin (Graham Ellis), 3 February 2005
Does this worked/tested example help? I can't see that it's much different from your original ...

Code:
set mydata {NAME=\"www.sheepbingo.co.uk\"}
puts $mydata

regexp {NAME=\\"(.*)\\"} $mydata all part
puts $part


Quote:
[trainee@buttercup trainee]$ tcl nna
NAME=\"www.sheepbingo.co.uk\"
www.sheepbingo.co.uk
[trainee@buttercup trainee]$
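The greedy `(.*)` in the example above backs off only as far as the last \" on the line, which is fine when NAME is the only quoted field. As a sketch, using a hypothetical line with a second quoted field (ROLE is invented for illustration), a negated bracket expression keeps the capture inside the first pair of escaped quotes:

```tcl
# Hypothetical line: ROLE is an invented second quoted field
set mydata {NAME=\"DDS203\",ROLE=\"edge\"}

# Greedy (.*) runs on to the LAST \" in the line
regexp {NAME=\\"(.*)\\"} $mydata -> greedy

# [^\\]* cannot cross a backslash, so it stops at the first closing \"
regexp {NAME=\\"([^\\]*)\\"} $mydata -> bounded

puts $greedy
puts $bounded
```

The first line printed is `DDS203\",ROLE=\"edge` and the second is `DDS203`.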


Posted by tschnell (tschnell), 3 February 2005
Hi Graham,
You are correct, that does work, and it is what I started with. There must have been some kind of typo in the original run that was causing me to chase my tail. Thanks for your help.




This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.


© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY