regex matching with regexp

Posted by redbanditos1999 (redbanditos1999), 16 October 2002
Hi people !!

I need help with the following problem:

how can I match \%\w in strings like

hallo %hallo = 12U;
12 %12 = 0;

but not in strings like

printf("Today is the %d.%d.1998\n", pstruToday->sDay, pstruToday->sMonth);
printf("%d", xxx_ts_ExternCharArrayWrongType [2]);
printf ("%d >> 2 = %d\n", sSignedShift, sResult);

I need \%\w to match only outside of printf(...) sequences.


Many thanks for quick help !!!

redbanditos

Posted by admin (Graham Ellis), 16 October 2002
OK - you're looking for a percent sign followed by any word character, using the Tcl language, in what looks like a C++ source file, but you want to skip any matches that fall within the format string of a printf?

I think I would do it in three simple steps rather than one complicated one.
a) Find all %-wordchar sequences within printf statements and temporarily replace them with something else
b) Do whatever you want with the remaining %-wordchar sequences (you don't tell us in your question what you want to do once you've found them)
c) Replace the sequences you changed within the printf at the beginning with the original sequence.
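
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name hidePrintfPercents, the \u0001 marker character and the <<...>> stand-in action for step b are all invented for the example, and the format string is assumed to sit on one line with no escaped quotes in it.

```tcl
# Hypothetical helper, not from the thread: hide every % that sits inside the
# first string literal after a printf, so later patterns cannot see it.
# Assumes the format string is on one line and contains no escaped quotes.
proc hidePrintfPercents {src marker} {
    set re {printf\s*\(\s*"[^"\n]*"}
    set out ""
    while {[regexp -indices -- $re $src m]} {
        set first [lindex $m 0]
        set last  [lindex $m 1]
        # Copy the text before the printf unchanged
        append out [string range $src 0 [expr {$first - 1}]]
        # Copy the printf and its format string with % swapped for the marker
        append out [string map [list % $marker] [string range $src $first $last]]
        # Carry on scanning after the closing quote
        set src [string range $src [expr {$last + 1}] end]
    }
    return $out$src
}

set marker "\u0001"    ;# assumed never to occur in the source being checked
set line {printf("%d", x); a = b %c;}

set hidden [hidePrintfPercents $line $marker]       ;# step a
regsub -all {%(\w+)} $hidden {<<\1>>} marked        ;# step b: stand-in action
set result [string map [list $marker %] $marked]    ;# step c
puts $result
```

For the sample line this should print printf("%d", x); a = b <<c>>; with the %d inside the format string left untouched.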



Posted by redbanditos1999 (redbanditos1999), 17 October 2002
Thanks for the answer, Graham!

You're right - I never told you what I want to do with them.
I'm working on a syntax checker for C (Logiscope Tau Studio), where you can define your own rules for checking source files. The rules must be written in Tcl mixed with special methods based on a C-language data model.
In the rule I am writing now I need to match all binary operators that don't have blanks before and after them. So far I have two regexp calls that define regular expressions to find my operators. Now I only need the regular expression that finds "%blahblah" or "%1234" in my source files, except where "%" is used in printf (like "%o" "%d" "%c" "%s" "%x").
I've tried various expressions but couldn't get any of them to work.
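
To make the difficulty concrete, here is a small sketch (the variable names are made up for the illustration) showing that the obvious pattern cannot tell an operator from a printf directive. It uses two of the sample lines from the first post.

```tcl
# The naive pattern: a percent sign followed by word characters.
set re {%\w+}

set lines {
    {hallo %hallo = 12U;}
    {printf("%d", xxx_ts_ExternCharArrayWrongType [2]);}
}

# Collect and print what the pattern finds on each line
set hits {}
foreach line $lines {
    set found [regexp -all -inline -- $re $line]
    lappend hits $found
    puts $found
}
```

The first line yields %hallo, which is wanted, but the second yields %d, which is exactly the kind of match that has to be excluded.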

I hope you're better informed now.

redbanditos


Posted by admin (Graham Ellis), 18 October 2002
Let me express a worry before I get too far into an answer. I'm always concerned about using a regular expression to analyse a programming language; programming languages are structured to be analysed by a compiler or interpreter that tokenises the source - splitting it at each word boundary, and handling it token by token. Although regular expressions are a fabulous capability, when analysing a language (as in your request) they are apt to get things a bit wrong. That may not be a concern, but you're going to have to be very careful if you want to correctly analyse code like:
Code:
     printf  /* What a "fun piece of code this is */
     // with comments in the middle of the printf.
     /* etc */    ("\"There are %dunits in stock\" %s said",
     %numbah, person);

I think that's a valid piece of code - note the hard-to-analyse embedded comments and protected double quotes, the statement spread over a number of lines, and the %dunits which is really a printf %d followed by the literal text units.
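
For comparison, a token-by-token scan can be roughed out in Tcl itself. The sketch below is a different technique from the regular-expression substitution discussed in this thread, and it is not a full C lexer (no preprocessor, trigraph or line-continuation handling). The proc name percentWords is invented for the example. It walks the source once, steps over comments and string or character literals, and collects only the %word sequences found elsewhere.

```tcl
# Sketch: walk the source once, skipping comments and string literals,
# and collect the word after each % found anywhere else.
proc percentWords {src} {
    # Alternatives tried at each position:
    #   /* ... */ comment | // comment | "string" | 'char' | %word (captured)
    # Only greedy quantifiers are used, so the match lengths are unambiguous.
    set re {/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|%(\w+)}
    set found {}
    set pos 0
    while {[regexp -indices -start $pos -- $re $src m w]} {
        # w is {-1 -1} unless the %word alternative was the one that matched
        if {[lindex $w 0] >= 0} {
            lappend found [string range $src [lindex $w 0] [lindex $w 1]]
        }
        # Resume scanning just past whatever was matched
        set pos [expr {[lindex $m 1] + 1}]
    }
    return $found
}

set code {printf  /* What a "fun piece of code this is */
// with comments in the middle of the printf.
/* etc */    ("\"There are %dunits in stock\" %s said",
%numbah, person);}

puts [percentWords $code]
```

On the awkward example above this should report only numbah: the quote inside the comment, the escaped quotes, and the %dunits and %s inside the format string are all stepped over.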

OK - end of my caution.  Perhaps your C++ isn't this nasty and a quick and rough tool using regular expressions will suffice ...

Assume
a) All printfs are at the start of a line
b) There are no embedded "s in the " string in the printf
c) No embedded comments at the start of the printf
d) The print format is a plain text string - you're not using an expression
     to make it up, or calling another method?

Umm ....       I'm starting to worry myself now. Although each of these is an unlikely scenario, put together, chances are that one or other will crop up from time to time ...   Redbanditos ... will a rough tool suffice (in which case I'll spend a few minutes writing a regular expression to demonstrate how I would do this job), or have I persuaded you that you'll need to do a rethink?

Posted by redbanditos1999 (redbanditos1999), 25 October 2002
Hi Graham !!

You're so right with your worries, but the C code that I have to check isn't so nasty.
We develop software for aeroplane devices, and the code is protected by certain rules against cases like the ones you've described.
So I would be very grateful if you could demonstrate your approach to me.

Greeeetz

Redbanditos

Posted by admin (Graham Ellis), 26 October 2002
Here's a file to process:

Code:
// This is s a test file looking to change %xxx sequences
// into something else, except when they happen to appear
// in printf format strings.
//
// Method -
// a) Find all the sequences in printf and replace them with something else
// b) make the changes wanted to all remaining %xxx sequences
// c) Change the printf examples back
//
// Be very careful about using Tcl code to analyse code as there are just
// too many special cases it can go wrong.  Example - these comments!!!
//

float function demo1(int abc, float %def, int ghi) {
       // It's a long time since I did C / C++ ....
       float temp
       if (abc > ghi) { temp = 0.0;
               printf ("Problem with %03d record",abc);        
               printf ("Value is %.2f",%def);  
       } else {
               temp = %def;
       }
       return temp;
       }


Here's a program that does the conversions

Code:
#!/usr/bin/tcl

# Look for %xxx sequences in tcl  but exclude those within printf

set program [read stdin]
# puts $program

# Replace % in printf formats with ~

while {[regexp -indices {printf[^"\n]*"[^"\n]*(%[^a-z]*[a-z])} $program match submatch] > 0} {
       # submatch starts at the % of a format directive inside the printf string
       set start [string range $program 0 [expr {[lindex $submatch 0] - 1}]]
       set end [string range $program [expr {[lindex $submatch 0] + 1}] end]
       set program $start
       append program ~~
       append program $end
       }

# Replace remaining %xxx sequences with doit(xxx)

while {[regexp -indices {%([a-zA-Z0-9]+)} $program match submatch] > 0} {
       # match covers %xxx, submatch covers just xxx
       set start [string range $program 0 [expr {[lindex $match 0] - 1}]]
       set word [string range $program [lindex $submatch 0] [lindex $submatch 1]]
       set end [string range $program [expr {[lindex $submatch 1] + 1}] end]
       set program $start
       append program "doit($word)"
       append program $end
       }

# Turn ~~ sequences back to %

while {[regexp -indices ~~ $program match] > 0} {
       set start [string range $program 0 [expr {[lindex $match 0] - 1}]]
       set end [string range $program [expr {[lindex $match 1] + 1}] end]
       set program $start
       append program %
       append program $end
       }

puts $program


and here's the output

Code:
// This is s a test file looking to change doit(xxx) sequences
// into something else, except when they happen to appear
// in printf format strings.
//
// Method -
// a) Find all the sequences in printf and replace them with something else
// b) make the changes wanted to all remaining doit(xxx) sequences
// c) Change the printf examples back
//
// Be very careful about using Tcl code to analyse code as there are just
// too many special cases it can go wrong.  Example - these comments!!!
//

float function demo1(int abc, float doit(def), int ghi) {
       // It's a long time since I did C / C++ ....
       float temp
       if (abc > ghi) { temp = 0.0;
               printf ("Problem with %03d record",abc);        
               printf ("Value is %.2f",doit(def));    
       } else {
               temp = doit(def);
       }
       return temp;
       }


Notes ....

a) See cautions on previous postings.   Note that my sample code makes changes within comments which you might not want!

b) My code is screaming out for a proc to be written to do this sort of global substitution - I'll leave that to you once you've fine tuned the code.

c) I wondered about using -all on regsub but there's a number of problems with that, starting with the fact that my matches can overlap if there is more than one format directive in any printf.
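
One possible shape for the proc mentioned in note b) is sketched below. The name subRepeat is made up for the example, and it swaps the index arithmetic for a plain regsub (without -all) that is repeated until nothing more matches. Because every pass rescans from the start of the text, it should also cope with the overlapping matches described in note c). It assumes that the replacement never recreates something the pattern matches (otherwise the loop would not end) and that the source contains no ~~ of its own.

```tcl
# Hypothetical helper: apply one substitution at a time until the
# pattern no longer matches. regsub with a variable name returns the
# number of replacements made (0 or 1 here, as -all is not used).
proc subRepeat {re subspec text} {
    while {[regsub -- $re $text $subspec text]} {}
    return $text
}

set src {printf ("Value is %.2f",%def);}

# a) hide each % inside a printf format string as ~~
set src [subRepeat {(printf[^"\n]*"[^"\n]*)%([^a-z]*[a-z])} {\1~~\2} $src]
# b) act on the remaining %xxx sequences
set src [subRepeat {%([a-zA-Z0-9]+)} {doit(\1)} $src]
# c) turn ~~ back into %
set src [subRepeat {~~} % $src]

puts $src
```

For the sample line this should give printf ("Value is %.2f",doit(def)); which matches the corresponding line of the output shown earlier.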



This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.

