
This is page http://www.wellho.net/forum/Perl-Programming/Field-comparison.html

Field comparison

Posted by TedH (TedH), 21 January 2009
I've been messing about rewriting my old forum script and managed to get quite a lot redone. Works fine, but I really would like to have the latest reply at the top of the topic list (new topics always get posted there so they are no problem).

I spent hours looking around the web to find out if there is a way to do what I want - to no avail.

The code below reads in the lists of posts:
Code:
my($filename) = @_;
$filename =~ s/[\^<>'\$!#;\*\?\&\|\`\/\~\\\(\)\{\}\"\n\r]//go;
$filename =~ s/\.\.//go;
if (open(DATA,"$forumdir/$filename.txt")) {
    flock(DATA, 2);
    @msglines = <DATA>;
    close(DATA);

    ## get the most recent and put at top - somehow??

    foreach $msgline (@msglines) {
        if ($msgline ne '') {
            @info = split (/_/, $msgline);
            $num = $info[0];
            $subject = $info[1];
            $name = $info[2];
            $date = $info[3];
            $responses = $info[4];
            $replytime = $info[5];
            $replyname = $info[6];

            print qq ~
<TR valign="top" class="tablesr"><td id="topicsa" class="items">
<A HREF="$ENV{'SCRIPT_NAME'}?msg=$num"><b>$subject</b></A>
<div class="small">Posted on: $date</div> </td>
<td id="topicsa" class="items">$name</td>
<td id="topicsa" align="center" class="items">$responses</td>
<td align="right" id="topicsb" class="small">$replytime<br>by $replyname  &nbsp;<A HREF="$ENV{'SCRIPT_NAME'}?msg=$num#$responses">$viewpost</a></td>
</TR>
~;
        }
    }
}


I'm guessing that I would need to compare all the $replytime fields with $todaysdate (my variable for the current date/time). Then find the one nearest to it, extract that record from all the rest that are read in and when displayed the latest one would be at the top with all the others below - minus the extracted one ('cuz it's at the beginning of the readout).

Is there a way to do that? Or am I barking up the wrong tree?

Date/time format = 01/20/2009-16:43 (if needed).

Thanks in advance - Ted

Posted by admin (Graham Ellis), 21 January 2009
Hi, Ted

If you're currently outputting the posts in the order that they're stored, then you'll need to loop through them all and - rather than print them - store the string for each into a hash.  Key the hash based on the timestamp of the post (are you able to discard the possibility of 2 posts in the same second?) and then sort the keys, writing a loop to output all the posts from the hash based on the order of the sorted keys.

As you won't be outputting the key, it can be just in the bare timestamp format (seconds from 1.1.70) which is easy to sort.

If you are already storing the posts in a list, consider a hash ... and if they're already in a hash, you may just need to update your keys / sort algorithm.  Looking at your code, I THINK you have the first case, not this one, though.
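
A minimal sketch of the hash-keyed-by-timestamp idea, assuming the MM/DD/YYYY-HH:MM date format Ted gave (seconds optional) and the reply time in the 6th field. The sample lines and the helper name stamp_to_epoch are made up for illustration, and the stored value could just as well be the formatted HTML row instead of the raw line.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert "MM/DD/YYYY-HH:MM" or "MM/DD/YYYY-HH:MM:SS" to seconds since 1.1.70.
# (stamp_to_epoch is a hypothetical helper, not part of Ted's script.)
sub stamp_to_epoch {
    my ($stamp) = @_;
    my ($mon, $mday, $year, $hour, $min, $sec) = split m{[/:-]}, $stamp;
    $sec = 0 unless defined $sec;    # Ted's original stamps stop at the minute
    return timegm($sec, $min, $hour, $mday, $mon - 1, $year);
}

# Hypothetical sample lines in the 7-field layout
my @msglines = (
    "1_First topic_Ann_01/21/2009-09:59_1_01/21/2009-10:02_Bob\n",
    "2_Second topic_Bob_01/21/2009-10:14_0_01/21/2009-10:14_Bob\n",
    "3_Third topic_Cy_01/21/2009-18:17_1_02/03/2009-08:30_Ann\n",
);

my %by_time;    # epoch seconds of the latest reply => record line
for my $line (@msglines) {
    chomp(my $rec = $line);
    next if $rec eq '';
    my @info = split /_/, $rec;
    # Two replies with the same timestamp would overwrite each other here
    $by_time{ stamp_to_epoch($info[5]) } = $rec;
}

# Numeric, descending sort of the keys gives newest reply first
my @ordered = map { $by_time{$_} } sort { $b <=> $a } keys %by_time;
print "$_\n" for @ordered;
```

The keys are plain numbers, so the sort has to use <=> rather than the default string comparison.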

Graham

Posted by TedH (TedH), 21 January 2009
Thanks Graham.
Quote:
are you able to discard the possibility of 2 posts in the same second?

Should be able to. I noticed that I have the timestamp to the minute, I'll have to alter that to the second - not sure how to go lower than that.

I'll try out what you've said (it'll take a bit while I learn about hashes).

Posted by KevinAD (KevinAD), 22 January 2009
It's hard to say without knowing how the replies are stored. But I assume they are appended to the end of a file and each reply is a line of its own in the file. So the last element of @msglines might be the latest reply. All you have to do is use $msglines[-1] to get the last line from the file after loading it into an array.

Posted by TedH (TedH), 22 January 2009
Hi Kevin, if only it were that easy. There is a sort of key file for each month that can contain anywhere from one to dozens of 7-field records. Bit like this:

4_Jellystone Blues_Yogi Bear_01/21/2009-18:50:53_1_01/21/2009-22:34:13_Homer
3_Multi Play_Ted_01/21/2009-18:17:08_1_01/21/2009-18:32:29_Homer
2_More topics_Pecos Bill_01/21/2009-10:14:48_0_01/21/2009-10:14:48_Pecos Bill
1_Timestamps_Ted_01/21/2009-09:59:58_1_01/21/2009-10:02:14_Daffy

So if record 2 is replied to, Pecos Bill would change to say Harry. A new time would be in the 6th field.

The more the forum is used, the more month files there will be. All of these are read in by the code I posted and stay in order of the record number (first field). The latest topic posted would be number 4 in the case above.

As the forum is aimed at small users, the actual total will be okay without needing an RDB. Some older versions of this are still in use (inside protected folders) and have a couple megabytes of entries and work quite fast (yes, thank you Cyberman for your dual-core processors).

The monthly files have other small files that are referenced to a final output of the actual topic and its replies - one file per topic. I'm not an RDB person, but I'd almost say that it's quasi RDB using data from 3 files to get to the topic file (you'd have to see it). I didn't write the original; Dan Steinman did, years back.

My logic (if one can call it that) was to grab the most recent time in field 6 of all the files read in and pop that to the top of the list.
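
That "pull the latest one to the top" logic can be sketched without a hash, in one pass over the records. This assumes the timestamps include seconds as in the sample above; the sub names sort_key and latest_first are made up, and record 2 has been given a hypothetical later reply from Harry so that something actually moves.

```perl
use strict;
use warnings;

# "MM/DD/YYYY-HH:MM:SS" -> "YYYYMMDDHHMMSS", which compares correctly as a string
sub sort_key {
    my ($stamp) = @_;
    return join '', (split m{[/:-]}, $stamp)[2, 0, 1, 3, 4, 5];
}

# Return a copy of the list with the most recently replied-to record moved
# to the front; all other records keep their existing order.
sub latest_first {
    my @lines = @_;
    return @lines unless @lines;
    my ($best_idx, $best_key) = (0, '');
    for my $i (0 .. $#lines) {
        my $key = sort_key((split /_/, $lines[$i])[5]);
        ($best_idx, $best_key) = ($i, $key) if $key gt $best_key;
    }
    my ($latest) = splice @lines, $best_idx, 1;
    return ($latest, @lines);
}

# Records from the sample above, except record 2 (hypothetical reply by Harry)
my @msglines = (
    '4_Jellystone Blues_Yogi Bear_01/21/2009-18:50:53_1_01/21/2009-22:34:13_Homer',
    '3_Multi Play_Ted_01/21/2009-18:17:08_1_01/21/2009-18:32:29_Homer',
    '2_More topics_Pecos Bill_01/21/2009-10:14:48_1_01/22/2009-08:05:00_Harry',
    '1_Timestamps_Ted_01/21/2009-09:59:58_1_01/21/2009-10:02:14_Daffy',
);

print "$_\n" for latest_first(@msglines);
```

This only promotes the single newest record. If the whole list should be ordered by latest reply, a full sort on the same key is needed instead.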

I've downloaded a bunch of stuff about hashes and am going through them to learn more. I was thinking to also grab the actual time and find the field 6 time closest to it to pull it out then (when I understand hashes better) hash the rest behind it.

- Ted

Posted by KevinAD (KevinAD), 22 January 2009
A general idea. I mixed the data up just to show that it does sort properly. The one problem is if there are two dates the same, which is why I used "push" to store duplicate records in an array. If there can never be duplicate dates you can just use a simple hash instead of a hash of arrays.


my %records;
while (<DATA>) {
  chomp;
  my @data = split(/_/);
  # Field 6 is MM/DD/YYYY-HH:MM:SS; reorder the pieces to YYYYMMDDHHMMSS
  my $date = join '', (split(/[\/:-]/, $data[5]))[2,0,1,3,4,5];
  push @{$records{$date}},$_;
}
# Newest first; records sharing a timestamp are printed one per line
foreach my $key (sort {$b cmp $a} keys %records) {
  print "$_\n" for @{$records{$key}};
}

__DATA__
3_Multi Play_Ted_01/21/2009-18:17:08_1_01/21/2009-18:32:29_Homer
4_Jellystone Blues_Yogi Bear_01/21/2009-18:50:53_1_01/21/2009-22:34:13_Homer
1_Timestamps_Ted_01/21/2009-09:59:58_1_01/21/2009-10:02:14_Daffy
2_More topics_Pecos Bill_01/21/2009-10:14:48_0_01/21/2009-10:14:48_Pecos Bill


This line:

  my $date = join '', (split(/[\/:-]/, $data[5]))[2,0,1,3,4,5];

Converts the MM/DD/YYYY-HH:MM:SS date stamp into an easily sortable ascii string instead of trying to sort it numerically. The slice picks out year, month and day, then hours, minutes and seconds, because the key has to be in YYYYMMDDHHMMSS order to sort in ascii-betical order.

In the future, if you ever write a script that stores a date in a text file make sure to store the date in epoch seconds as well as any  human readable format.  Then you can easily sort dates by sorting the epoch seconds record.
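
As a rough illustration of storing both forms, here is a sketch that adds the epoch seconds as an extra field at the end of each record. The 8th field, the sub name make_record and the sample topics are all assumptions for the example (the existing forum files have 7 fields), and gmtime is used only so the output does not depend on the server's time zone.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build a record for a brand-new topic: the usual 7 fields plus a hypothetical
# 8th field holding the same moment as epoch seconds.
sub make_record {
    my ($num, $subject, $name, $responses, $replyname, $epoch) = @_;
    my $human = strftime('%m/%d/%Y-%H:%M:%S', gmtime($epoch));
    # For a new topic the posted date and the latest reply date are the same
    return join '_', $num, $subject, $name, $human, $responses,
                     $human, $replyname, $epoch;
}

my $now = time;
my @lines = (
    make_record(1, 'Older topic', 'Ann', 0, 'Ann', $now - 3600),
    make_record(2, 'Newer topic', 'Bob', 0, 'Bob', $now),
);

# Newest first: compare the last field numerically, no date parsing needed
my @sorted = sort { (split /_/, $b)[-1] <=> (split /_/, $a)[-1] } @lines;
print "$_\n" for @sorted;
```

The human-readable fields are then only for display, and the sort never has to pick a date apart.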


Posted by TedH (TedH), 22 January 2009
I see what's happening Kevin.

I turned the print into a new array, did a foreach, split it up and printed it out fine.

I understand about the date. The script is one I haven't touched for about 5 years, so it still had Dan's original date format that you see. All my other stuff uses the YYYYMMDDHHMMSS, which then gets redefined depending on what I want.

Also I can see a slow down on larger amounts of data if I don't have a new field with YYYYMMDDHHMMSS in. At least it seems that way to me. If I do that, then I won't have to go through the "date rewrite" for each record, just redefine with the YYYYMMDDHHMMSS format. Is that right?

I'm gonna play around with it to see what happens. Looks like hashes can save a lot of messing about and in some cases do stuff I couldn't otherwise do - never used them before so I'll probably get "gripped" by them for a while.

Many thanks guys - Ted

Posted by KevinAD (KevinAD), 23 January 2009
In general, the more data you have to "munge", the more likely you are to see a slow down. If you are having to convert a lot of date stamps in the file into a sortable key, depending on how much data there is, it could be slow.
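
One common way to keep that conversion cost down when the stored format can't be changed is to build each sortable key exactly once and sort on the stored keys (often called a Schwartzian transform). This is only a sketch with made-up records and a hypothetical sort_key helper, not something measured on the forum's data.

```perl
use strict;
use warnings;

# "MM/DD/YYYY-HH:MM:SS" -> "YYYYMMDDHHMMSS"
sub sort_key {
    my ($stamp) = @_;
    return join '', (split m{[/:-]}, $stamp)[2, 0, 1, 3, 4, 5];
}

# Hypothetical records in the 7-field layout, spanning more than one month
my @lines = (
    '1_First_Ann_12/30/2008-09:00:00_1_01/02/2009-08:00:00_Bob',
    '2_Second_Bob_12/31/2008-10:00:00_0_12/31/2008-10:00:00_Bob',
    '3_Third_Cy_01/01/2009-11:00:00_2_02/01/2009-07:30:00_Ann',
);

my @sorted =
    map  { $_->[1] }                                # 3. unwrap the original line
    sort { $b->[0] cmp $a->[0] }                    # 2. newest key first
    map  { [ sort_key((split /_/, $_)[5]), $_ ] }   # 1. compute each key once
    @lines;

print "$_\n" for @sorted;
```

Each record is split and converted once, however many comparisons the sort makes.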

Also, that all appears to be a lot of work to just get the most recent reply to the top of the topic list. You have to go through every record in the file. It may not be worth the effort.

If you are just learning how to use hashes, you'll wonder how you ever got along without them. And from there using references to create complex data, like a hash of arrays, will really open up new doors.

PS.....

Hi Graham



This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.

© WELL HOUSE CONSULTANTS LTD., 2024: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho