Stop Words

Posted by TedH (TedH), 1 June 2008
Hi all, I have a search facility that works and can cut out words less than 4 characters (so I don't get: if, it, the... etc). But if I enter "then" or "there", then I get an awful lot of results. Obviously I need to stop the search when any of the "stopwords" are input.

This is as far as I have got.
Code:
print "Content-type: text/html\n\n";
$keyword = "there then";  ## real word(s) from form input

## The stopword list
@stopwords = ('then','there','etc-etc');

## Read stopword list
foreach $stopword (@stopwords) {
 if ($keyword =~ m/\b$stopword\b/i) {&wordStop;}
 else {&wordOk;}
}

## The do somethings - Works but I get multiples?
sub wordStop {print "NotOkay<br>\n";}
sub wordOk {print "Okay<br>\n";}


The object is that when the code sees a "stop word", it stops and goes on to the "wordStop" sub. If it's okay, then the "Okay" sub is used. It picks up the words fine, even phrases like "there then", but it gives me a multiple reading, like:
NotOkay
NotOkay
Okay

Or:
Okay
Okay
Okay

Depending on input.

What do I do?

Thanks for any help - Ted

Posted by KevinAD (KevinAD), 1 June 2008
Your question is confusing but your code is not. You have three words in the array:

@stopwords = ('then','there','etc-etc');

so obviously you will get three evaluations/results returned: one for each element of the array. In this case two NotOkay (then and there) and one Okay (etc-etc).
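
To get a single verdict instead of one per array element, the loop can stop at the first stop word it finds and report once. The following is a minimal sketch of that idea, not code from the thread: the helper name `find_stopword` is hypothetical, and `\Q...\E` is added so that characters in a stop word are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): returns the first stop
# word that occurs in $keyword as a whole word, or undef if none do.
sub find_stopword {
    my ($keyword, @stopwords) = @_;
    foreach my $stopword (@stopwords) {
        # \Q...\E quotes any regex metacharacters in the stop word
        return $stopword if $keyword =~ /\b\Q$stopword\E\b/i;
    }
    return;
}

my @stopwords = ('then', 'there', 'etc-etc');
my $keyword   = "there then";   # stands in for the form input

# One test, one message, however many stop words are in the list.
my $hit = find_stopword($keyword, @stopwords);
print defined $hit ? "NotOkay<br>\n" : "Okay<br>\n";
```

Because the decision is made after the loop rather than inside it, the if/else runs only once.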



Posted by TedH (TedH), 1 June 2008
Hi Kevin, I may be tackling this the wrong way (dunno).

I figured if I want the search to stop and display a message like say: "then" is a common word, please be more specific. I would take the keyword, scan thru a file with "stop words" in and if the keyword matches a stop word, the entire search is abandoned right there. Sort of like interrupting the process because I've given it limitations (in this case certain words).

So I used if/else - 'cuz I couldn't think of anything other to use. Make sense?

I've tried this with a file, plus the way I posted here and both yield the same result. I assume it must be how I'm handling the array. I can see what it's doing, just don't know how to stop it and may even be doing it the wrong way.

Posted by TedH (TedH), 2 June 2008
Well, I've got somewhere.

Code:
$keyword="then";

$thefile="words.txt";
# contains one word per line: then there this

open(TFL,"$thefile");
if (grep{/$keyword/} <TFL>){&aaa;}else{&bbb;}
close TFL;

sub aaa {print "End the search\n";}
sub bbb {print "Continue the search\n";}


Didn't realize I could use grep <dumb grin>.

This works when just one word is used.
If I type in say: "then I can see this there", it will pass the test.
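
The phrase passes because `/$keyword/` is matched against each line of the file, so the whole input would have to appear on a single line of words.txt. One possible way round this, sketched here under assumptions, is to split the input into words and test each word against the list. The helper name `stopwords_in` is hypothetical, and the list is given inline rather than read from the file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words of $keyword that are in the stop
# word list (whole words, case-insensitive).
sub stopwords_in {
    my ($keyword, @stopwords) = @_;
    my %is_stop;
    $is_stop{ lc($_) } = 1 for @stopwords;
    my @words = $keyword =~ /\w+/g;      # pull out each word of the input
    return grep { $is_stop{ lc($_) } } @words;
}

my @stopwords = qw(then there this);     # stands in for words.txt
my @found = stopwords_in("then I can see this there", @stopwords);

if (@found) { print "End the search: @found\n"; }
else        { print "Continue the search\n"; }
```

Returning the matching words also makes it easy to name the offending word in a "please be more specific" message.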

Posted by KevinAD (KevinAD), 2 June 2008
grep is the wrong tool for this job because it must search the entire file. What you want is a "while" loop and the "last" keyword.


while (<FILE>) {
  if (/$word/) {
         print "Naughty boy!\n";
         last;
  }
}


that should be much more efficient unless the word is at or near the end of the file.
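
A fuller, runnable sketch of the same loop follows. It makes several assumptions that are not in the post above: the helper name `first_stopword_in_file` is hypothetical, an in-memory filehandle stands in for words.txt, and the match is turned round so that each stop word from the file is looked for inside the input, which lets multi-word input be caught as well.

```perl
use strict;
use warnings;

# Hypothetical helper: reads stop words (one per line) from $fh and returns
# the first one found in $keyword as a whole word, or undef if none match.
sub first_stopword_in_file {
    my ($fh, $keyword) = @_;
    my $hit;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;        # skip blank lines
        if ($keyword =~ /\b\Q$line\E\b/i) {
            $hit = $line;
            last;                        # stop reading at the first match
        }
    }
    return $hit;
}

# In-memory filehandle used so the sketch runs without a real words.txt.
my $list = "then\nthere\nthis\n";
open(my $fh, '<', \$list) or die "Cannot open stop word list: $!";
my $hit = first_stopword_in_file($fh, "then I can see this there");
close $fh;

print defined $hit ? "Naughty boy!\n" : "Continue the search\n";
```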

Posted by TedH (TedH), 2 June 2008
Thanks Kevin, I'll give that a try.

There's only going to be about 30 words max in the file (begins at 'about' and ends at 'with'), so I may not see much difference timewise.

I see what you mean by the loop and last being more efficient. I would assume nothing is cached as the loop is just looping for a word until it finds it and once found it becomes the 'last' (?). Whereas the grep must cache stuff in order to find it (yeah?). Though both must go through the whole file if the word "with" is used.

Posted by KevinAD (KevinAD), 3 June 2008
It depends. Since you are reading a file, grep will cause perl to read the entire file before searching. If you were searching an array/list that was already defined, I don't think grep saves a copy of the array; it searches the original list, but of course it all depends on how you code it. For a small file it probably will not be noticeable. But if you are searching the file over and over, read the file into an array once and search the array instead of opening and closing the file each time.
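
As a rough sketch of that last suggestion, the list can be read into an array once and then searched as often as needed. The helper name `is_stopword` is hypothetical, and an in-memory filehandle stands in for the real file so the example runs on its own.

```perl
use strict;
use warnings;

# Read the stop word list once. The in-memory filehandle is an assumption
# made for this example; a real script would open the file by name.
my $list = "then\nthere\nthis\n";
open(my $fh, '<', \$list) or die "Cannot open stop word list: $!";
chomp(my @stopwords = <$fh>);
close $fh;

# Hypothetical helper: true if $word is in the cached list. grep in scalar
# context returns the number of matching elements.
sub is_stopword {
    my ($word) = @_;
    return scalar grep { lc($_) eq lc($word) } @stopwords;
}

# Repeated lookups now search the array, not the file.
foreach my $keyword ('then', 'perl', 'There') {
    print "$keyword: ",
        (is_stopword($keyword) ? "End the search" : "Continue the search"),
        "\n";
}
```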

Posted by TedH (TedH), 3 June 2008
Kevin, I've implemented the loop. There is an almost imperceptible difference, barely noticeable (but I noticed it after a few tries) - tad faster.

I'll give you credit for that in the blog script. Don't know if PM works on this forum, but I'd like to put a link to you on the download page.

Thanks - Ted

Posted by KevinAD (KevinAD), 3 June 2008
Ted,

That is considerate of you, but it is not necessary.

Thanks,
Kevin



This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.

