Posted by abdi (abdi), 3 June 2003
Hi,
I am thinking of translating a Tcl procedure into C++ or Python. My problem is that I am not familiar with Tcl, so I am looking for someone to explain each step.
The program takes a message and calls a function called generateProbability for each word in it, which returns the probability of that word. It then combines the probabilities of the 16 most interesting words and returns true if the combined probability is above 90%.
Here is the program:
Code:
# isSpam --
#
#   Guess whether a given message is spam. This is done by finding
#   the 16 most interesting words in the message (i.e. those whose
#   probabilities deviate most strongly from being neutral) and
#   combining those words' probabilities to give an overall
#   probability that a particular message is spam. This is then
#   converted into a boolean value with a trivial threshold
#   function, so that messages are only found to be spam when the
#   code is better than 90% sure of it.

proc isSpam {message} {
    global WordRE reasons
    # Start searching at the beginning of the message
    set i 0
    while {[regexp -indices -start $i $WordRE $message match]} {
        foreach {j i} $match {}
        set t([string range $message $j $i]) {}
        # Continue searching after the end of this match
        incr i
    }
    foreach word [array names t] {
        set p [generateProbability $word]
        lappend magic [list [expr {abs($p-0.5)}] $p $word]
    }
    foreach l [lrange [lsort -decreasing -real -index 0 $magic] 0 15] {
        append reasons "[lindex $l 2] (score=[lindex $l 1]) "
        lappend interesting [lindex $l 1]
    }
    set score [combine $interesting]
    append reasons "=> Overall Score $score"
    return [expr {$score > 0.9}]
}
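As a starting point for the Python version, here is a rough counterpart of the first loop (the one using regexp -indices -start). It is written step by step so that each Tcl operation has a visible equivalent. WORD_RE and distinct_words are hypothetical names, and the word pattern is an assumption, because the thread does not show how WordRE is defined:

```python
import re

# Hypothetical stand-in for the Tcl global WordRE; the real pattern is not shown.
WORD_RE = re.compile(r"[A-Za-z']+")


def distinct_words(message, word_re=WORD_RE):
    """Mirror the Tcl regexp loop step by step and return the distinct words."""
    t = {}  # plays the role of the Tcl array t (only the keys matter)
    i = 0   # start index for the next search
    while True:
        m = word_re.search(message, i)  # regexp -indices -start $i $WordRE $message match
        if m is None:
            break
        j, end = m.start(), m.end()
        # Tcl -indices reports an inclusive end index, but Python's m.end() is
        # exclusive, so [string range $message $j $i] corresponds to message[j:end]
        t[message[j:end]] = None
        i = end  # resume after the match (the Tcl equivalent is "incr i")
    return list(t)


print(distinct_words("buy cheap pills, buy now"))  # ['buy', 'cheap', 'pills', 'now']
```

In practice re.finditer(message) does the same start-index bookkeeping automatically, so a finished translation would normally use it instead of the explicit loop.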
Posted by admin (Graham Ellis), 3 June 2003
I'll add a few comments; hopefully they will point you in the right direction. Remember that in Tcl anything written in square brackets is a command: the command is run and its result is substituted into the line in place of the brackets. Thus
set score [combine $interesting]
in Tcl would be written in Python as
score = combine(interesting)
or something similar in other languages.
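Nested brackets are evaluated from the inside out. In the line foreach l [lrange [lsort -decreasing -real -index 0 $magic] 0 15] {...}, Tcl first sorts magic, then takes elements 0 to 15 of the sorted result, and only then runs the loop. A rough Python equivalent, using made-up sample entries in place of the real magic list:

```python
# Each entry mirrors the Tcl list {abs(p-0.5) p word}; the values are made up.
magic = [
    [0.49, 0.99, "viagra"],
    [0.01, 0.51, "the"],
    [0.45, 0.05, "meeting"],
]

# Innermost bracket first: [lsort -decreasing -real -index 0 $magic]
ranked = sorted(magic, key=lambda entry: entry[0], reverse=True)

# Then [lrange ... 0 15]: elements 0 to 15 inclusive, i.e. at most 16 entries
top = ranked[0:16]

# Finally the loop body: lappend interesting [lindex $l 1]
interesting = [entry[1] for entry in top]
print(interesting)  # [0.99, 0.05, 0.51]
```

Note that the end index given to lrange is inclusive, so 0 15 yields 16 elements, whereas a Python slice excludes its end, hence [0:16].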
Code:
proc isSpam {message} {
    # The variables WordRE and reasons are shared with the main program
    global WordRE reasons
    # Look for regular expression matches in the message, starting at index 0
    set i 0
    while {[regexp -indices -start $i $WordRE $message match]} {
        # match holds the start and end indices of the matched word.
        # foreach {j i} $match {} just unpacks them into j and i (the loop
        # body is empty). The word then becomes a key of an array called t
        # (the equivalent of a dictionary in Python), so each distinct word
        # is recorded once.
        foreach {j i} $match {}
        set t([string range $message $j $i]) {}
        # Resume the search just after the end of this match
        incr i
    }
    # Take each of the words found and see how likely it is to be spam,
    # using a command (proc) called generateProbability.
    foreach word [array names t] {
        set p [generateProbability $word]
        # Make up a list of lists: {distance from 0.5, probability, word}
        lappend magic [list [expr {abs($p-0.5)}] $p $word]
    }
    # Sort the list by distance from 0.5, largest first, and loop through the
    # 16 most interesting entries (indices 0 to 15), putting their
    # probabilities into a new list called interesting
    foreach l [lrange [lsort -decreasing -real -index 0 $magic] 0 15] {
        append reasons "[lindex $l 2] (score=[lindex $l 1]) "
        lappend interesting [lindex $l 1]
    }
    # Combine the interesting values (I think combine is another proc of yours?)
    set score [combine $interesting]
    append reasons "=> Overall Score $score"
    # Return 1 (spam) if the combined score is above 0.9, otherwise 0
    return [expr {$score > 0.9}]
}
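Putting the steps together, a minimal Python sketch of the whole proc might look like the following. It assumes generate_probability and combine behave like the Tcl procs of the same names (neither is shown in the thread), so they are passed in as arguments. WORD_RE is a hypothetical word pattern, and instead of appending to a global reasons variable the sketch returns the reasons text:

```python
import re

# Hypothetical default for the Tcl global WordRE; the real pattern is not shown.
WORD_RE = re.compile(r"[A-Za-z']+")


def is_spam(message, generate_probability, combine, word_re=WORD_RE):
    """Sketch of the Tcl isSpam proc; returns (is_spam, reasons)."""
    # The regexp loop: collect each distinct word once (the Tcl array t).
    words = {m.group(0) for m in word_re.finditer(message)}

    # magic: one (distance from 0.5, probability, word) tuple per word
    magic = []
    for word in words:
        p = generate_probability(word)
        magic.append((abs(p - 0.5), p, word))

    # lrange [lsort -decreasing -real -index 0 $magic] 0 15 -> at most 16 entries
    top = sorted(magic, key=lambda entry: entry[0], reverse=True)[:16]

    reasons = "".join(f"{word} (score={p}) " for _, p, word in top)
    interesting = [p for _, p, _ in top]

    score = combine(interesting)
    reasons += f"=> Overall Score {score}"
    # expr {$score > 0.9} gives 1 or 0 in Tcl; here it is a bool
    return score > 0.9, reasons


# Usage with made-up probabilities and a simple averaging stand-in for combine
probabilities = {"cheap": 0.95, "pills": 0.99, "meeting": 0.2}
spam, reasons = is_spam(
    "cheap pills",
    lambda w: probabilities.get(w, 0.4),
    lambda ps: sum(ps) / len(ps),
)
print(spam, reasons)
```

Passing the two helpers in as arguments keeps the sketch independent of however they are implemented; in a real translation they would simply be ordinary module-level functions.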
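The combine proc is not shown in the thread. The header comment describes combining word probabilities into an overall spam probability, tested against a 0.9 threshold. One plausible candidate, and only an assumption about what combine does, is the naive-Bayes style combination used in Paul Graham's "A Plan for Spam": the product of the probabilities divided by that same product plus the product of their complements:

```python
import math


def combine(probabilities):
    """Hypothetical combine: naive-Bayes style combination of word probabilities.

    Assumes every probability is strictly between 0 and 1 (math.prod needs Python 3.8+).
    """
    spam = math.prod(probabilities)                  # p1 * p2 * ... * pn
    ham = math.prod(1.0 - p for p in probabilities)  # (1-p1) * (1-p2) * ... * (1-pn)
    return spam / (spam + ham)


print(combine([0.99, 0.95, 0.2]))  # roughly 0.998
```

With at most 16 probabilities the plain products are fine. For much longer lists, summing logarithms would avoid floating-point underflow.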
P.S. In Tcl, you use a $ when you are just reading the existing value of a variable, but not when you are setting it: in set score [combine $interesting], score is being set and interesting is being read.
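Python has no such sigil: a name on the left of = is being set, and a bare name anywhere else is being read. A few lines from the proc side by side, with a made-up message and made-up match indices:

```python
# Tcl: set i 0                   (i is being set, so no $)
i = 0

# Tcl: foreach {j i} {4 8} {}    (j and i are being set by unpacking, so no $)
j, i = (4, 8)

# Tcl: set word [string range $message $j $i]   (message, j and i are read, so $)
message = "buy cheap pills"
word = message[j:i + 1]  # Tcl's end index is inclusive, Python's slice end is not
print(word)  # cheap
```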
This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.