This is page http://www.wellho.net/forum/Perl-Programming/removing ... -file.html

Our email: info@wellho.net • Phone: 01144 1225 708225

 
removing and counting duplicates in file

Posted by hjortur (hjortur), 4 November 2007
Hi. I'm new to Perl and I am trying to write code that takes a file and removes any duplicates, then writes out the number of times each line occurred.
I am trying to combine my knowledge of C with Perl to do this, but I am unfamiliar with the format and have been unsuccessful in finding any clues on how to do this.

The file to be processed would look something like this:

192.168.1.1
192.168.1.1
200.1.1.2
201.43.43.1
10.0.0.1
200.1.1.2
192.168.1.1

The resulting output would be...

3    192.168.1.1
2    200.1.1.2
1    201.43.43.1
1    10.0.0.1

Tks for any assistance...

Hjortur..

Posted by admin (Graham Ellis), 4 November 2007
You probably want to go "beyond" C and look at Perl's hashes.  

Have a look at:

http://www.wellho.net/resources/ex.php4?item=p211/web_count

which is the answer to a practical exercise on our Perl Programming Course.  We ask the delegates to read an access log file of many thousand lines, and report on the number of times each client (identified by a unique host name in the first column) has visited us ...
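
As a rough illustration of that hash-counting idea (a minimal sketch, not the course answer linked above): the sub name count_hosts and the sample log lines are made up for the example, and it assumes the host is the first whitespace-separated field on each line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how often each host (first field) appears.
sub count_hosts {
    my @lines = @_;
    my %visits;
    foreach my $line (@lines) {
        my ($host) = split ' ', $line;   # first whitespace-separated field
        next unless defined $host;       # skip blank lines
        $visits{$host}++;                # the hash key is the host name
    }
    return %visits;
}

# Made-up sample data standing in for an access log
my @log = (
    'alpha.example.com - - "GET /index.html"',
    'beta.example.com - - "GET /about.html"',
    'alpha.example.com - - "GET /contact.html"',
);

my %visits = count_hosts(@log);
foreach my $host (sort keys %visits) {
    print "$host visited $visits{$host} time(s)\n";
}
```

The hash does the bookkeeping: there is no need to search a list to find out whether a host has been seen before.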

Posted by KevinAD (KevinAD), 5 November 2007
This is a very simple task for perl.

Code:
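#!/usr/bin/perl

use strict;
use warnings;

my $file = '/path/to/file.txt';
my %seen = ();
{
  local @ARGV = ($file);    # make <> read from $file
  local $^I = '.bac';       # edit $file in place, keeping a backup copy with a .bac suffix
  while(<>){
     chomp;                 # keep the trailing newline out of the hash key
     $seen{$_}++;           # count every occurrence of this line
     next if $seen{$_} > 1; # seen before: do not write it out again
     print "$_\n";          # written back into $file, not to the screen
  }
}
# STDOUT is the default output again here; report the counts, highest first
foreach my $keys  ( sort {$seen{$b} <=> $seen{$a}} keys %seen) {
  print "$keys = $seen{$keys}\n";
}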


Posted by george_Ball (george), 5 November 2007
Are you running on Unix/Linux? If so, then I'd suggest you don't bother with Perl, and use straight Unix:

 sort file | uniq -c

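which will generate the counts you want, one line per address. The lines come out sorted by address rather than by count, so pipe the result through  sort -rn  if you want the most frequent first.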

Of course if you aren't using *x then... well, sorry for you!!
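
For anyone without a Unix shell, roughly the same report can be built in Perl. This is only a sketch: the sub name count_lines is invented for the example, the data is the sample from the first post, and the output keeps first-seen order rather than the sorted order that sort | uniq -c gives.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns [count, line] pairs in first-seen order.
sub count_lines {
    my @lines = @_;
    my (%count, @order);
    foreach my $line (@lines) {
        # post-increment yields the old (false) value the first time only
        push @order, $line unless $count{$line}++;
    }
    return map { [ $count{$_}, $_ ] } @order;
}

# The sample addresses from the first post
my @sample = qw(
    192.168.1.1 192.168.1.1 200.1.1.2 201.43.43.1
    10.0.0.1 200.1.1.2 192.168.1.1
);

foreach my $pair (count_lines(@sample)) {
    my ($n, $line) = @$pair;
    print "$n    $line\n";
}
```

To read from a real file instead, the lines would need to be read and chomped first; that part is left out here to keep the sketch short.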


Posted by KevinAD (KevinAD), 5 November 2007
But if a person is trying to learn Perl, using *nix commands is not going to help them learn it. On the other hand, if all they want is to accomplish the task, that is a good suggestion.

Posted by hjortur (hjortur), 5 November 2007
Tks George

This worked...! I was trying to write the code using basically the same techniques as with C so the code was rather ugly....with nested for loops...
As I am new to perl, I am not exactly sure how this works..
for example :  local $^I = '.bac';
I have been reading up on perl and was trying to program associative arrays...I guess this is related...
Tks again..
Hjortur

Posted by george_Ball (george), 5 November 2007
Sorry, I didn't think when I posted and just assumed you had a problem to solve by whatever means...

The solution that Kevin has posted is, as you have probably worked out, the best way to do this from Perl - one of the things I find continually when I am teaching Perl is that people don't appreciate just how much hashes can do for you, with problems like this being the perfect example of how they can save you work.

Happy hacking!!

Posted by KevinAD (KevinAD), 5 November 2007
on 11/05/07 at 17:23:57, hjortur wrote:
Tks George

This worked...! I was trying to write the code using basically the same techniques as with C so the code was rather ugly....with nested for loops...
As I am new to perl, I am not exactly sure how this works..
for example :  local $^I = '.bac';
I have been reading up on perl and was trying to program associative arrays...I guess this is related...
Tks again..
Hjortur


"$^I" is a perl variable. Perl has many predefined variables that affect the way perl works. Here is the list of perl 5.8.8 variables:

http://perldoc.perl.org/perlvar.html

"$^I" holds the backup extension for in-place editing (it is the variable behind the -i command line switch). Giving it a value such as '.bac' tells perl to edit the files read through <> in place, so that "print" writes back into the file, and to keep a copy of the original with that extension. It has nothing to do with associative arrays though. "local" gives the variable a temporary value that lasts only until the enclosing block is exited, at which point perl restores the old value, so the change does not affect the rest of your program. Using "local" is just a good habit to get into when using perl's predefined variables in your perl programs.

The associative array (hash) is %seen which is used to find duplicates and remove them from the file. Because hash keys must be unique, they are well suited for finding duplicate data in files as well as other things.
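
To see the save-and-restore behaviour of "local" in isolation, here is a minimal sketch. It uses $/ (the input record separator) instead of $^I purely because its value is easy to inspect, and show_local is a name made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical demo: report the value of $/ before, inside and after
# a block that changes it with "local".
sub show_local {
    my $before = $/;
    my $inside;
    {
        local $/ = ';';      # temporary value, only for this block
        $inside = $/;
    }
    my $after = $/;          # the old value is back once the block ends
    return ($before, $inside, $after);
}

my ($before, $inside, $after) = show_local();
print "inside the block the separator was '$inside'\n";
print "restored after the block\n" if $before eq $after;
```

If the value after the block equals the value before it, perl has put the original back without any explicit cleanup code.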

Posted by hjortur (hjortur), 5 November 2007
Tks KevinAD

This code is very short and straight to the point...looking forward to learning Perl... it seems simple...but..
What does the "next if $seen{$_} > 1;" do by the way?
and how are the duplicates deleted? It is a bit hard to grasp...

H

Posted by KevinAD (KevinAD), 6 November 2007
The hash %seen counts how many times each line is found in the file:

$seen{$_}++;

If the count is greater than one, the line is skipped:

next if $seen{$_} > 1;

"next" is a loop control. It tells perl to jump to the next iteration of the loop immediately. In this case it's the "while" loop.

So the "print" line is never reached when $seen{$_} is greater than 1. Because of the in-place editing, only the lines that are printed get written back to the file, so skipping the "print" effectively deletes that line from the file.
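
The flow through the loop can be traced on an in-memory list, with no file involved. A sketch only: split_kept_skipped is a made-up name, the data is the sample from the first post, and the lines that "next" jumps past are collected in a second array so they can be inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same counting logic as the while loop above,
# but collecting the kept and skipped lines instead of printing to a file.
sub split_kept_skipped {
    my @lines = @_;
    my (%seen, @kept, @skipped);
    foreach (@lines) {
        $seen{$_}++;
        if ($seen{$_} > 1) {
            push @skipped, $_;
            next;            # jump straight to the next line
        }
        push @kept, $_;      # only reached the first time a line is seen
    }
    return (\@kept, \@skipped);
}

my @sample = qw(
    192.168.1.1 192.168.1.1 200.1.1.2 201.43.43.1
    10.0.0.1 200.1.1.2 192.168.1.1
);

my ($kept, $skipped) = split_kept_skipped(@sample);
print "kept:    @$kept\n";      # 192.168.1.1 200.1.1.2 201.43.43.1 10.0.0.1
print "skipped: @$skipped\n";   # 192.168.1.1 200.1.1.2 192.168.1.1
```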





Posted by hjortur (hjortur), 6 November 2007
Great, Tks a lot..!

Now I can continue experimenting....

Good to know I can get such great help here if I get stuck...!

H



This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.

© WELL HOUSE CONSULTANTS LTD., 2024: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY