Finding all the unique lines in a file, using Python or Perl

A question - how do I process all the unique lines from a file in Python? It was asked by a delegate today, and solved neatly and easily using a generator. That means there's no need to build the complete list of results before doing anything with it (only the record of lines already seen is kept) - unique values can be passed back and processed onwards as they're found. This is fantastic news if the input isn't really a file, but is some other, slower reporting data source and you would like to get answers while the data is still flowing in.

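  def unique(source):
    sofar = set()                # lines already seen
    with open(source) as fh:     # closed once the generator finishes
      for val in fh:
        if val not in sofar:
          sofar.add(val)
          yield val.strip()      # pass each new line back as soon as it is found
  
  for lyne in unique("info.txt"):
    print(lyne)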


[complete source]. Neat, isn't it? I love Python! And to test that love, I thought I would answer the same question in Perl:

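  sub unique {
    open(my $fh, "<", $_[0]) or die "Cannot open $_[0]: $!";
    my %sofar;                   # lines already seen
    my @uvals;                   # unique lines, in order of first appearance
    while (my $line = <$fh>) {
      if (! $sofar{$line}) {
        $sofar{$line} = 1;
        push @uvals,$line;
      }
    }
    return @uvals;
  }
  
  foreach my $lyne (unique("info.txt")) {
    print $lyne;
  }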


[complete source]. A little longer, and as Perl doesn't have a generator as such, I was tempted to leave the code returning the unique list only once the whole incoming data flow had been received. But a little more thought let me produce a generator-like alternative:

  sub unique {
    # $static, %sofar and FH are globals that hold state between calls
    $static or open FH,$_[0];
    $static = 1;
    while (my $line = <FH>) {
      if (! $sofar{$line}) {
        $sofar{$line} = 1;
        return $line;
      }
    }
    return "";
  }
  
  while ($lyne = unique("info.txt")) {
    print $lyne;
  }


[complete source]. Actually rather neat, but it relies on global variables to hold the state of the "generator" routine, and care is needed to flag the end of the data. In practice the return ""; could be left out, as Perl hands back the result of the last expression evaluated, which is false when the loop exits. The perlsub documentation, though, describes the return value of a sub whose last statement is a loop as unspecified, so the explicit return is the safer choice. Start applying tricks like this and you're getting into code that's going to be hard to maintain.
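
For comparison, here is a minimal Python sketch (not taken from the article's source files) of how a generator keeps that state inside itself rather than in globals, and hands values back one at a time. The names unique_items, noisy_source and pulled are hypothetical, made up for this illustration. noisy_source simply stands in for a slow data feed and records what has been read from it.

```python
def unique_items(items):
    """Yield each distinct item the first time it is seen, in input order."""
    seen = set()                  # state is private to this generator instance
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


pulled = []                       # hypothetical log of what the source has delivered


def noisy_source():
    """Hypothetical stand-in for a slow data feed."""
    for value in ["a", "b", "a", "c", "b"]:
        pulled.append(value)      # note that this value has been read
        yield value


stream = unique_items(noisy_source())
first = next(stream)              # asks for just one unique value
pulled_after_first = list(pulled) # snapshot: how much input was consumed so far
rest = list(stream)               # drain the remaining unique values

# A second generator gets its own 'seen' set, so the two do not interfere.
other = list(unique_items([1, 1, 2]))
```

Because the set of values already seen belongs to each generator, there is nothing to reset between runs and no end-of-data flag to return. The loop simply finishes when the input does.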

Truth be known - I love Perl too. See our Perl Courses and Python Courses. Happy to teach you either - to help you use their strengths and write good, maintainable code in either.
(written 2012-03-20, updated 2012-03-24)

 


This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2017: 404 The Spa • Melksham, Wiltshire • United Kingdom • SN12 6QL
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/3662_Fin ... -Perl.html • PAGE BUILT: Sat May 27 16:49:10 2017 • BUILD SYSTEM: WomanWithCat