A question - how do I process all the unique lines from a file in Python? Asked by a delegate today, and solved neatly and easily using a generator, which means that there's no need to build up the complete list of results first - each unique value can be passed back and processed onwards as it's found (only the record of values already seen is kept). This is fantastic news if the input isn't really a file, but is some other, slower reporting data source and you would like to get answers even while the data's still flowing in.
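def unique(source):
    sofar = {}                    # values already passed back
    for val in open(source):
        val = val.strip()         # compare the same text that is yielded
        if val not in sofar:
            sofar[val] = 1
            yield val

for lyne in unique("info.txt"):
    print(lyne)                   # this form runs under Python 2 and Python 3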
[complete source]. Neat, isn't it? I love Python! And to test that love, I thought I would answer the same question in Perl:
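sub unique {
    open(FH, $_[0]) or die "Cannot open $_[0]: $!";
    my %sofar;                    # lines already seen
    my @uvals;                    # unique lines, in order of first appearance
    while (my $line = <FH>) {
        if (! $sofar{$line}) {
            $sofar{$line} = 1;
            push @uvals,$line;
        }
    }
    close FH;
    return @uvals;
}
foreach my $lyne (unique("info.txt")) {
    print $lyne;
}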
[complete source]. A little longer, and as Perl doesn't have a generator as such, I was tempted to write the code to only return the unique list once the whole incoming data flow had been received. But a little more thought let me produce a generator-like alternative:
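sub unique {
    # $static and %sofar are globals - they hold the state between calls
    $static or open FH,$_[0];
    $static = 1;
    while (my $line = <FH>) {
        if (! $sofar{$line}) {
            $sofar{$line} = 1;
            return $line;         # hand back one unique line per call
        }
    }
    return "";                    # flag the end of the data
}
while ($lyne = unique("info.txt")) {
    print $lyne;
}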
[complete source]. Actually rather neat, but it relies on global variables to hold the state of the "generator" routine, and on care being taken to flag the end of the data. Careful code examination will show you that the return ""; is redundant in practice, as a Perl sub without an explicit return hands back the value of the last expression evaluated, which is the false loop condition when the while loop exits. Note, though, that the perlsub documentation describes the return value as unspecified when the last statement is a loop, so this is not something to rely on. Start applying tricks like this and you're getting into code that's going to be hard to maintain.
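For comparison, here is a minimal Python sketch of the two points that the Perl version has to handle by hand: where the state lives, and how the end of the data is flagged. It is a variation for illustration, not the listing from earlier in this post. It assumes a version of unique that takes any iterable of lines rather than a file name, keeps a set rather than a dictionary, and removes only the trailing newline; slow_source is a made-up stand-in for a feed that delivers its data gradually.

```python
def unique(lines):
    """Yield each distinct line the first time it is seen."""
    seen = set()                  # state is local to this generator object
    for line in lines:
        key = line.rstrip("\n")   # assumption: only the newline is removed
        if key not in seen:
            seen.add(key)
            yield key


def slow_source():
    """Hypothetical feed that hands over its lines one at a time."""
    for line in ["red\n", "green\n", "red\n", "blue\n", "green\n"]:
        yield line


first = unique(slow_source())
second = unique(["red\n", "red\n"])   # a separate generator, separate state

print(next(first))        # red
print(next(second))       # red - not affected by what first has seen
print(list(first))        # ['green', 'blue']
print(next(first, None))  # None - the finished generator flags the end itself
```

Each call to unique gives a generator with its own seen set, so there is no global variable to reset between runs, and no special empty-string value is needed to mark the end of the data.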
Truth be known - I love Perl too. See our Perl Courses and Python Courses. Happy to teach you either - to help you use their strengths and write good maintainable code in either.
(written 2012-03-20, updated 2012-03-24)