Stop Words

Posted by TedH (TedH), 1 June 2008
Hi all, I have a search facility that works and can cut out words of fewer than 4 characters (so I don't get: if, it, the... etc). But if I enter "then" or "there", I get an awful lot of results. Obviously I need to stop the search when any of the "stop words" is input. This is as far as I have got. Code:
The object is that when the code sees a "stop word", it stops and goes on to the "wordStop" sub. If it's okay, then the "Okay" sub is used. It picks up the words fine, even phrases like "there then", but it gives me multiple readings, like "NotOkay NotOkay Okay" or "Okay Okay Okay", depending on input. What do I do? Thanks for any help - Ted

Posted by KevinAD (KevinAD), 1 June 2008
on 06/01/08 at 20:27:32, TedH wrote:
Your question is confusing but your code is not. You have three words in the array: @stopwords = ('then','there','etc-etc'); so obviously you will get three evaluations/results returned, one for each element of the array. In this case, two NotOkay (then and there) and one Okay (etc-etc).

Posted by TedH (TedH), 1 June 2008
Hi Kevin, I may be tackling this the wrong way (dunno). I figured that if I want the search to stop and display a message like, say: "then" is a common word, please be more specific, I would take the keyword, scan through a file with "stop words" in it and, if the keyword matches a stop word, abandon the entire search right there. Sort of like interrupting the process because I've given it limitations (in this case certain words). So I used if/else - 'cuz I couldn't think of anything else to use. Make sense? I've tried this with a file, plus the way I posted here, and both yield the same result. I assume it must be how I'm handling the array. I can see what it's doing; I just don't know how to stop it, and I may even be doing it the wrong way.

Posted by TedH (TedH), 2 June 2008
Well, I've got somewhere. Code:
Didn't realize I could use grep <dumb grin>. This works when just one word is used. If I type in, say: "then I can see this there", it will pass the test.

Posted by KevinAD (KevinAD), 2 June 2008
grep is the wrong tool for this job because it must search the entire file. What you want is a "while" loop and the "last" keyword.

    while (<FILE>) {
        if (/$word/) {
            print "Naughty boy!\n";
            last;
        }
    }

That should be much more efficient unless the word is at or near the end of the file.

Posted by TedH (TedH), 2 June 2008
Thanks Kevin, I'll give that a try. There's only going to be about 30 words max in the file (it begins at 'about' and ends at 'with'), so I may not see much difference timewise. I see what you mean by the loop and last being more efficient. I would assume nothing is cached, as the loop is just looping for a word until it finds it, and once found it becomes the 'last' (?). Whereas the grep must cache stuff in order to find it (yeah?). Though both must go through the whole file if the word "with" is used.

Posted by KevinAD (KevinAD), 3 June 2008
It depends. Since you are reading a file, grep will cause Perl to read the entire file before searching. If you were searching an array/list that was already defined, I don't think grep saves a copy of the array to search; it searches the original list. But of course it all depends on how you code it. For a small file it probably will not be noticeable. But if you are searching the file over and over, save the file into an array and search the array instead of opening and closing the file over and over.

Posted by TedH (TedH), 3 June 2008
Kevin, I've implemented the loop. There is an almost imperceptible difference, barely noticeable (but I noticed it after a few tries) - a tad faster. I'll give you credit for that in the blog script. Don't know if PM works on this forum, but I'd like to put a link to you on the download page. Thanks - Ted

Posted by KevinAD (KevinAD), 3 June 2008
on 06/03/08 at 11:19:37, TedH wrote:
Ted, that is considerate of you, but it is not necessary. Thanks, Kevin

This page is a thread posted to the opentalk forum
at www.opentalk.org.uk and
archived here for reference.
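
Editor's sketch: the code Ted posted did not survive in this archive, so the following is not his script. It is a minimal illustration, under stated assumptions, of the idea the thread converges on: keep the stop words in memory, stop at the first match, and check every word of a multi-word query (the case Ted reported as still passing the test). It swaps Kevin's while/last regex loop for an exact-match hash lookup, so that a stop word such as "the" would not also match inside "there". The word list, the sub name first_stop_word and the sample queries are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stop-word list. The thread only says the real file holds
# about 30 words, beginning at 'about' and ending at 'with'.
my @stopwords = qw(about then there with);

# Build a lookup hash once, so the list is not re-read for every search.
my %is_stop = map { lc($_) => 1 } @stopwords;

# Return the first stop word found in the query, or undef if there is none.
sub first_stop_word {
    my ($query) = @_;
    for my $word (split ' ', lc $query) {
        # Remove any character that is not a letter, digit, apostrophe or hyphen.
        $word =~ s/[^a-z0-9'-]//g;
        return $word if $is_stop{$word};   # stop at the first hit
    }
    return;
}

# Sample queries (assumed for illustration only).
for my $query ('then', 'then I can see this there', 'perl regular expressions') {
    my $hit = first_stop_word($query);
    if (defined $hit) {
        print qq{"$hit" is a common word, please be more specific.\n};
    }
    else {
        print "Okay: $query\n";
    }
}
```

Returning from the sub on the first hit plays the same role as "last" in Kevin's loop: the search is abandoned as soon as one stop word is seen, so only one message is produced per query instead of one per entry in the list.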