Training, Open Source computer languages
PerlPHPPythonMySQLApache / TomcatTclRubyJavaC and C++LinuxCSS 
Search for:
Home Accessibility Courses Diary The Mouth Forum Resources Site Map About Us Contact
counts ans summary the duplicates

Posted by yamaha102 (yamaha102), 20 January 2008
Hi,

I would like to ask  help from you all.

I have a  csv as below

INfile.txt
_____________
Date,Product,Price
20070301,Car,A
20070415,Car,A
20070415,Car,B
20070416,Car,B
20070422,Boat,B
20071201,Boat,B
20071204,Car,A
20080101,Car,A
20080102,Boat,B
20080103,Boat,B
20080104,Car,A
________________



and i wish to get the output as following format:


Outfile.txt
________________
2007,Car,A -3
2007,Car,B -2
2007,Boat,B -2
2008,Car,A -2
2008,Boat,B -2


Please guide.
Thanks.


Posted by KevinAD (KevinAD), 20 January 2008
Well, what have you tried so far?

Posted by yamaha102 (yamaha102), 21 January 2008
thanks for replying and guiding.
at the moment i could able to produce such result:
         CarB     2
         CarA     5
        BoatB     4
Please cntinue to guide me. thanks

$in_file = "INfile.txt";
$out_file = "OutFile.txt";
open (IN, "$in_file") or die "Can't open $in_file: $!\n";
open (OUT, ">$out_file") or die "Can't open $out_file: $!\n";

$header = <IN>; # skip over first line which is a header


while ( $line = <IN> ) {
chomp($line);
($Date, $Product, $Price) = split(',',$line);

$Date= substr($Date,0);
$Product= substr($Product,0);
$Price=substr($Price,0);

       ($hostname,$hostname1,@rest)=($Product,$Price,);  
   $counter{$hostname,$hostname1}++;

}

foreach $host (keys %counter) {
     printf OUT "%16s %4d\n",$host,$counter{$host};
       }


close IN;
close OUT;

Posted by admin (Graham Ellis), 21 January 2008
Quote:
($Date, $Product, $Price) = split(',',$line);

$Date= substr($Date,0);
$Product= substr($Product,0);
$Price=substr($Price,0);

  ($hostname,$hostname1,@rest)=($Product,$Price,);  
   $counter{$hostname,$hostname1}++;


That's a bit odd - the Substrs do nothing, and the list assignment just copies variables.

Try giving the substr on $Date and extra parameter to tell it you want just four characters, then make up the compposite key in quotes and you should be close:

($Date, $Product, $Price) = split(',',$line);
$Date= substr($Date,0,4);
$counter{"$Date,$hostname,$hostname1"}++;




Posted by yamaha102 (yamaha102), 21 January 2008
on 01/21/08 at 05:57:44, Graham Ellis wrote:
That's a bit odd - the Substrs do nothing, and the list assignment just copies variables.

Try giving the substr on $Date and extra parameter to tell it you want just four characters, then make up the compposite key in quotes and you should be close:

($Date, $Product, $Price) = split(',',$line);
$Date= substr($Date,0,4);
$counter{"$Date,$hostname,$hostname1"}++;



ok. i tried it..but it give me a silly character " "after the date and after the product as below
200801CarA 5
Thanks
How to eliminate it?
by the way do you have better and easier solution to solve the question?

Posted by yamaha102 (yamaha102), 21 January 2008
besides,
i have no idea on how to rearrange the above output file to become the final output as below

Date,Car A, Car B, Boat B
2007,3,2,2
2008,2,0,2
       
Please guide me.

Posted by KevinAD (KevinAD), 21 January 2008
yamaha102, Is this some type of school work?

Graham, does this site have a policy about students posting school/class assignments?

Posted by yamaha102 (yamaha102), 21 January 2008
on 01/21/08 at 08:28:11, KevinAD wrote:
yamaha102, Is this some type of school work?

Graham, does this site have a policy about students posting school/class assignments?

Hi, it's no a school work. I just trying to extract data from a file and then display into a desired format.
I think most of the newbie learning programming by example. It does apply to myself too.
please continue to guide me on above questions.
thanks

Posted by admin (Graham Ellis), 21 January 2008
on 01/21/08 at 08:28:11, KevinAD wrote:
Graham, does this site have a policy about students posting school/class assignments?


We are told that this isn't a class assigment ... and from reading the earlier posts I had alreday wondered about it, and decided it probably wasn't. But nice to have the confirmation. So the following comments do NOT apply to the current question.

If a student posts up a complete assignment and says "I have to do this - will you write it for me" then the answer should be no.  The student learns nothing from having the work done for them except to take advantage of other people, and will find the next assignment even harder.  And it's probably that the question posted here should be posted anyway because it's someone else's copyright.

Where students attempt a question and need help with some elements, then I'm very much more inclined to help and give them pointers, but still very much aware their objective is to learn how to write a program rather than to end up with a complete program for production use.   I tend to suggest that their first source of assistance should be their tutor or other college support mechanism, as we're liable on this board to suggest a practical answer, but one that goes ahead and uses techniques and structures they haven't yet learned.



Posted by yamaha102 (yamaha102), 21 January 2008
Hi Graham,
Thanks for your input. i appreciate your intention.

I am alone in perl programming,no mentor, nor guider... that's why i use forumers as my mentors. I wish to use perl to extract some useful format of my raw data for the analysis purpose.

I do have a question...  what for that we have the CPAN module out there? Don't it mean that we are taking
advantage of other people by using their module?

Sorry and thanks.


Posted by KevinAD (KevinAD), 21 January 2008
on 01/21/08 at 12:00:02, Graham Ellis wrote:
We are told that this isn't a class assigment ... and from reading the earlier posts I had alreday wondered about it, and decided it probably wasn't. But nice to have the confirmation. So the following comments do NOT apply to the current question.

If a student posts up a complete assignment and says "I have to do this - will you write it for me" then the answer should be no.  The student learns nothing from having the work done for them except to take advantage of other people, and will find the next assignment even harder.  And it's probably that the question posted here should be posted anyway because it's someone else's copyright.

Where students attempt a question and need help with some elements, then I'm very much more inclined to help and give them pointers, but still very much aware their objective is to learn how to write a program rather than to end up with a complete program for production use.   I tend to suggest that their first source of assistance should be their tutor or other college support mechanism, as we're liable on this board to suggest a practical answer, but one that goes ahead and uses techniques and structures they haven't yet learned.




I agree with all you said. Some forums just do not allow any student posting at all. I did not think that would be your policy and now I  know it's not.

Posted by admin (Graham Ellis), 21 January 2008
I am getting very confused as to how the text suddenly changed into numbers; I think you have changed something else other than those items we have suggested.   As we have established that you have a requirement to get the code working rather than to learn as part of an assignment, I have put together a sample and tested it.

Code:
while (<DATA>) { chop;
       ($date,$veh,$ab) = split (/,/);
       ($yr) = ($date =~ /(....)/);
       $counter{"$yr,$veh,$ab"}++;
       }
foreach $k (keys %counter) {
       print ("$k -$counter{$k}\n");
       }
__END__
20070301,Car,A
20070415,Car,A
20070415,Car,B
20070416,Car,B
20070422,Boat,B
20071201,Boat,B
20071204,Car,A
20080101,Car,A
20080102,Boat,B
20080103,Boat,B
20080104,Car,A




Posted by yamaha102 (yamaha102), 22 January 2008
thanks to be patient with me. well.. my goal is to produce a
finalOutput.txt (which is jz an different view of orientation from the outputfile.txt)
(jz assume that there is no comma between product and price car, A===> car A)
for the text suddenly changed into numbers because ---> look at the output file which carA in 2007 is 3 and so on
Date,Car A, Car B, Boat B
2007,3,2,2
2008,2,0,2
I donno whether my approach is correct or not(from raw ->output ->finalOutput)

The second post from me is the code to produce the output.txt.
The Substrs I used to eliminate some of the string for my usage purpose.
with ur guideline by adding $counter{"$Date,$hostname,$hostname1"}++; i managed
to produce the output.txt
basically the problem i am facing is \how am i going to "rotate - grouping the first column and put it at the left side and put the second column on the top.." the output to finalOutput


Thanks.

Posted by yamaha102 (yamaha102), 22 January 2008
it's similar to Data::Xtab
http://search.cpan.org/dist/Data-Xtab/Xtab.pm

but for the module i have the problem to get the desired output: (suppose i could get the output as below..)

         JAN     FEB     MAR     APR     MAY     JUN
 A       4       0       0       0       0       107
 B       3       39      0       0       0       0
 C       0       0       0       0       8       0


my output: (missing A B C on the most left)
JAN     FEB     MAR     APR     MAY     JUN
4       0       0       0       0       107
3       39      0       0       0       0
0       0       0       0       8       0

Posted by admin (Graham Ellis), 22 January 2008
This looks like a change in specification. I'm sorry, but I don't understand how "just rotating the data" suddenly brings months in as well ... and it is very frustrating for those of us who are trying to help you to solve the original question as posted just to be told that the goalposts have moved.

At the risk of writing what sounds very obvious, can I suggest that you work out a specification (description) of what you need to do as a whole and give an overview of that if you need any further help.  By all means ask questions to help you step along the way as well, but - PLEASE - the full picture as well.  It may turn out that the steps along the way can be missed out, and knowing the whole picture will certainly help avoid us (the people trying to help) feeling that out time is being wasted.

Posted by yamaha102 (yamaha102), 22 January 2008
I am sorry.
I will try to do my best in my explanation from now onwards
Basically there are two goal in the task...
1st ->raw.txt change to output.txt (This one Completed... thanks for guiding)
2nd ->output.txt to finaloutput.txt (This is the problem now)

for the 2nd, i try to use DATA::Xtab to solve it, but i face some difficulties. the output is different than desired .

raw.txt
Date,Product, Price
20070101,Car, A
20070102, Car, A
20080102, Car, B
20080103, Boat, A
20080104, Boat, B
20080103, Boat, B

output.txt
2007,Car, A, 2
2008,Car, B, 1
2008,Boat, A, 1
2008,Boat, B, 2

finaloutput.txt
BLANK,Car,Boat
2007,2,0
2008,1,3
TOTAL,3,3

thank you

Posted by KevinAD (KevinAD), 22 January 2008
This is a bit contrived to work with your data. The hope is that it helps you in some way, not that it really solves your problem. I added an extra product to the data file, Plane, just so you can sort of get a better feel for how it works.

Code:
use strict;
use warnings;
my %data = ();
my @products = qw(car plane boat);
open(IN, 'c:/perl_test/data.txt') or die "$!";
readline IN; #skip header
while(<IN>){
  chomp;
  my ($date, $prod, $price) = split(/,/);
  my ($y, $m, $d) = unpack("a4a2a2",$date);
  $data{$y}{lc $prod}++;
}
close IN;

print join(',','year',@products), "\n";
foreach my $key (sort keys %data) {
  print join(',',$key, map{ $data{$key}{lc($_)}||'0' } @products),"\n";
}


Note that this approach requires you know all the product names beforehand. That can be changed though to make the @product list dynamic.

The input file (data.txt):

Date,Product,Price
20070301,Car,A
20070415,Car,A
20070415,Car,B
20070416,Car,B
20070422,Boat,B
20091228,Plane,D
20071201,Boat,B
20071204,Car,A
20080101,Car,A
20080102,Boat,B
20080103,Boat,B
20080104,Car,A

Posted by KevinAD (KevinAD), 22 January 2008
When learning data structures like I used, it's very helpful to use the Data:umper module, which will print a "picture" of the data in a manner that makes it easy to see how the data is structured and to catch problems.

Like in my code, originally the quantities were not printing. That was because the @products array used all lower-case words, but the data file had upper-case characters. When I used Data:umper to print out the data I noticed the discrepancy and used the lc() function to make sure all hash keys and product names match case.

Posted by yamaha102 (yamaha102), 23 January 2008
Thanks.

From your code, I somehow learnt on the map and unpack terms.
Thank you.

With my limitation of knowledge about perl, I tried to modify your code  in order to sum all the columns and the ouput file as below:

Year,car,plane,boat,TOTAL
2007,5,0,2,0
2008,2,0,2,0
2009,0,1,0,0
ALL,7,1,4,0,12

However the silly code i added to your code was not practical enough as there's alot of "variable" like $count4 += @Raw[3]...

I have a few questions:
1. Mind to suggest to me how to simply the steps, please?
2. While i was trying to play around with your code, i do have the difficulty to sum up all the number for row. If you have the time, please suggest to me which step i need to take to counter the problem.
3. Is it more difficult for me to directly convert the output.txt to finaloutput.txt compare to the raw.txt --> finaloutput?


use warnings;
my %data = ();
my @products = qw(car plane boat TOTAL);
$products="@products\n";
print $products;
open(IN, 'c:/perl/data.txt') or die "$!";
readline IN; #skip header
while(<IN>){
  chomp;
  my ($date, $prod, $price) = split(/,/);
  my ($y, $m, $d) = unpack("a4a2a2",$date);
  $data{$y}{lc $prod}++;

  $counter{"$y"}++;


}
close IN;

print join(',','Year',@products), "\n";
foreach my $key (sort keys %data) {

print join(',',$key, map{ $data{$key}{lc($_)}||'0'} @products),"\n";

@Raw=map{ $data{$key}{lc($_)}||'0'} @products;
#print "@Raw[2]\n";
$count1 += @Raw[0];
$count2 += @Raw[1];
$count3 += @Raw[2];
$count4 += @Raw[3];

}

$countAll =($count1 + $count2 + $count3 + $count4);
print "ALL,$count1,$count2,$count3,$count4,$countAll\n,";


Thank you.

Posted by KevinAD (KevinAD), 23 January 2008
I think what you may need to doin order to better undestand my code is read some tutorials on perls complex data capabilities. The first three tutorial on this page cover references (complex data):

http://perldoc.perl.org/index-tutorials.html

the seond one is the most indepth:

perldsc - data structure complex data structure struct

Your attempt to add the total for each row shows a complete lack of understanding of what my code is doing so try and read the tutorials I linked you to. Complex data is a little confusing at first, but once you learn how to use references a whole new world of possibilities opens up. Here is some revised code:

use strict;
use warnings;
use Data:umper;
my %data = ();
my @products = qw(car plane boat);
open(IN, 'c:/perl_test/data.txt') or die "$!";
readline IN; #skip header
while(<IN>){
  chomp;
  my ($date, $prod, $price) = split(/,/);
  my ($y, $m, $d) = unpack("a4a2a2",$date);
  $data{$y}{lc $prod}++;
}
close IN;

print join(',','year',@products,'TOTAL'), "\n";
foreach my $key (sort keys %data) {
  print $key;
  my $total = 0;
  foreach my $p (@products) {
     print ',', $data{$key}{lc($p)} || '0';
     $total += $data{$key}{lc($p)} || 0;
  }
  print ",$total\n";
}


Of course this is not the only way to go about doing this. 10 people will write this 10 differnt ways, using various methods to arrive at the same output.


I have a few questions:  
1. Mind to suggest to me how to simply the steps, please?
 I don't understand what you are asking.
2. While i was trying to play around with your code, i do have the difficulty to sum up all the number for row. If you have the time, please suggest to me which step i need to take to counter the problem.
 See the code I posted for one possible solution
3. Is it more difficult for me to directly convert the output.txt to finaloutput.txt compare to the raw.txt --> finaloutput?
  You do not need any intermediary output file. From one data set you can produce output.txt and finaloutput.txt.

Posted by yamaha102 (yamaha102), 24 January 2008
Thanks.
I will try to digest it.
Thank you.

Posted by yamaha102 (yamaha102), 5 March 2008
Hi, sorry to trouble you guys again.

i got 2 files as below:
companyA.txt
_____________
Date,Product,Price
20060301,Car,A
20070415,Car,A
20070415,Car,B
20070416,Plane,B
20080422,Boat,B


companyB.txt
_____________
Date,Product,Price
20060301,Car,A
20070415,Car,A
20070415,Car,B

_____________
use strict;
use warnings;

my %data = ();
my %data=();
my @products = qw(car plane boat);
open(IN, 'c:/perl_test/campanyA.txt') or die "$!";
readline IN; #skip header
while(<IN>){
  chomp;
  my ($date, $prod, $price) = split(/,/);
  my ($y, $m, $d) = unpack("a4a2a2",$date);
  $data{$y}{lc $prod}++;
}
close IN;
print join(',','year','TOTALA'), "\n";
foreach my $key (sort keys %data) {
  print $key;
  my $total = 0;
  foreach my $p (@products) {
$total += $data{$key}{lc($p)} || 0;
  }
  print ",$total\n";
}

open(IN, 'c:/perl_test/companyB.txt') or die "$!";
readline IN; #skip header
while(<IN>){
  chomp;
  my ($date, $prod, $price) = split(/,/);
  my ($y, $m, $d) = unpack("a4a2a2",$date);
  $data{$y}{lc $prod}++;
}
close IN;

print join(',','year','TOTALB'), "\n";
foreach my $key (sort keys %data1) {
  print $key;
  my $total = 0;
  foreach my $p (@products) {
$total += $data1{$key}{lc($p)} || 0;
  }
  print ",$total\n";
}


__Output1__
Year,TOTALA
2006,1
2007,3
2008,1

Year,TOTALB
2006,1
2007,2

______________

emm... Actually i wish to get another output file from the above outputs (comparing the outputs -->TOTALA -TOTALB) where i could get

__OUTPUT2___
Year,TOTAL_NEW
2006,0
2007,1
2008,1


please guide me on how to process the desire output(OUTPUT2) without process the OUTPUT1.
Thanks.




This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference. To jump to the archive index please follow this link.

You can Add a comment or ranking to this page

© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho