Training, Open Source Programming Languages

This is page http://www.wellho.net/mouth/3766_Pyt ... rator.html

Our email: info@wellho.net • Phone: 01144 1225 708225

 
Python timing - when to use a list, and when to use a generator

If you're going to cook dinner for the family, it's much more efficient to cook everyone's meal together. If you're a family of four, you won't want to cook four separate pans of carrots, but rather you'll want to cook them all together. But now imagine you're holding a carrot street party, and providing carrots through the day for four hundred rather than four people. You could hire in (or purchase) special resources, but you're much more likely to adopt some sort of batch system. You won't cook all the carrots at once - your pans aren't big enough, and even if they were, you've not got the capacity to store all of the cooked carrots until they're needed for serving.

The same thing applies when you're programming. In a program with a small amount of data, it's quicker and easier to handle it all via a list, but as the data size grows so does the amount of memory you need for a list, and it becomes quicker to handle each piece of data as it becomes available - using a production line type approach known as an iterator or a generator.

Grand theory - but how does that work in practice? In a language like Python, you can use a generator function. I've written about those before (examples HERE and HERE if you want to refer back). In Lua, you would use coroutines, and in any other language you could roll your own. But the question remains "Does it really make a difference?", and I thought I would try it out to see.
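
As a reminder of the shape of a generator function, here is a minimal Python 3 sketch. The name cooked_portions and the sample values are made up for illustration and are not taken from the benchmark programs.

```python
def cooked_portions(count):
    """Generator function: hands back one portion at a time, on demand."""
    for number in range(1, count + 1):
        # Execution pauses at the yield until the caller asks for the next value.
        yield "portion %d" % number


# A list builds every portion up front and holds them all in memory ...
all_at_once = ["portion %d" % n for n in range(1, 5)]

# ... whereas the generator produces the same values one by one.
one_by_one = cooked_portions(4)
first = next(one_by_one)    # only the first portion has been produced so far
rest = list(one_by_one)     # drain the remaining portions

print(first)
print(rest)
```

The values come out the same either way; what differs is when they are produced and how many of them exist at once.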

Let's look at the tools I'll use:

• Seven log files off our server, for periods from a minute (43 accesses) to a year (nearly 30 million accesses):

  munchkin:pyjun12 grahamellis$ wc -l ac[a-z]* | sort -bn
         43 acminute
       2671 achour
      79015 acday
     533818 acweek
     992112 acfortnight
    2481275 acmonth
   29775300 acyear


• Two programs: dave, which stores a list, and jenny, which processes each record as it reads it, using a generator. Both programs produce identical output files when run on the same input; I've checked this using the Unix diff command.

• Python's resource module, which is part of the standard library on Unix / Linux systems, but needs to be imported into your program. See the Python documentation for full details.
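
The two approaches might look something like the sketch below. This is not the code of dave and jenny themselves: the function names, the sample records and the choice of counting accesses by the first field of each line are all assumptions made purely to show the two shapes side by side.

```python
from collections import Counter


def count_with_list(lines):
    """List style: store every record first, then process the list."""
    records = [line.split() for line in lines]    # whole data set held in memory
    counts = Counter()
    for fields in records:
        if fields:
            counts[fields[0]] += 1
    return counts


def records_from(lines):
    """Generator: yield one split record at a time."""
    for line in lines:
        yield line.split()


def count_with_generator(lines):
    """Generator style: process each record as it is read."""
    counts = Counter()
    for fields in records_from(lines):            # only the current record is held
        if fields:
            counts[fields[0]] += 1
    return counts


# Hypothetical sample records standing in for a log file
sample = [
    "host1 GET /index.html",
    "host2 GET /about.html",
    "host1 GET /contact.html",
]
print(count_with_list(sample) == count_with_generator(sample))
```

Both functions accept any iterable of lines, so an open file object can be passed in place of the sample list; a Python file object hands over its lines one at a time when looped over.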

The code I have added at the end of both dave and jenny, to report on resources, is as follows:

  import resource

  # Resource usage of this process up to this point
  used = resource.getrusage(resource.RUSAGE_SELF)
  print("Maximum resident size:", used.ru_maxrss)
  print("Run time:", used.ru_utime + used.ru_stime)    # user + system CPU seconds
  print("Page faults:", used.ru_minflt + used.ru_majflt)
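
One thing to watch if you rerun this elsewhere: ru_maxrss is reported in bytes on macOS but in kilobytes on Linux. The sketch below wraps the same three measurements in a function and scales the memory figure to bytes; the function name resource_report and the dictionary keys are my own choices for illustration.

```python
import resource
import sys


def resource_report():
    """Return peak memory, CPU time and page faults for this process."""
    used = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is in bytes on macOS ("darwin") and in kilobytes on Linux.
    scale = 1 if sys.platform == "darwin" else 1024
    return {
        "max_rss_bytes": used.ru_maxrss * scale,
        "cpu_seconds": used.ru_utime + used.ru_stime,
        "page_faults": used.ru_minflt + used.ru_majflt,
    }


report = resource_report()
for name, value in report.items():
    print(name, value)
```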


So - how does that run?

Maximum memory size while running, in bytes:

  Duration         list   generator
  Minute        4116480     4112384
  Hour          4837376     4108288
  Day          30355456     4108288
  Week        181055488     4108288
  Fortnight   333402112     4112384
  Month       837042176     4108288
  Year       3351420928     4096000


Run time in seconds (sum of system and user time):

  Duration         list   generator
  Minute       0.055345    0.056894
  Hour         0.064015    0.062101
  Day          0.355556    0.296621
  Week         2.088976    1.718086
  Fortnight    3.833132    3.145695
  Month        9.516904    7.700457
  Year       353.116987  103.592527


Page faults (sum of minor and major):

  Duration         list   generator
  Minute           1479        1479
  Hour             1653        1476
  Day              8044        1476
  Week            45638        1476
  Fortnight       83796        1478
  Month          209181        1476
  Year          9143068        1491


With a generator, the amount of memory used stays at around 4 Mbytes no matter how much data is thrown at the program, but reading all the data into a list before processing any of it starts at around the same 4 Mbytes and rises to over 3 Gbytes - not really a surprise, since the data file for the year is well over 8 Gbytes in size. You'll notice too that the run time was roughly the same when the data set was small, but with a list the application became significantly slower than the generator equivalent as the data set got large - for the year, about 353 seconds against about 104. So the best advice is to always use a generator if the data set may get large. If the data set is small, a generator is almost as fast and the process is quick either way. But if the data set is large, using a list will be much slower - and that's a slowing down of what's already a long process.
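
You can see the same memory pattern on a small scale without any log files. The sketch below uses Python's tracemalloc module to compare the peak memory of summing squares via a list and via a generator expression. The helper name peak_bytes and the value of n are arbitrary choices, and tracemalloc only counts memory allocated by Python itself, so its figures are not directly comparable with the ru_maxrss numbers in the table.

```python
import tracemalloc


def peak_bytes(work):
    """Run work() and return the peak memory traced while it ran."""
    tracemalloc.start()
    try:
        work()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def total_via_list(n):
    return sum([i * i for i in range(n)])    # builds the full list first


def total_via_generator(n):
    return sum(i * i for i in range(n))      # one value at a time


n = 200000
list_peak = peak_bytes(lambda: total_via_list(n))
gen_peak = peak_bytes(lambda: total_via_generator(n))
print("list peak bytes:     ", list_peak)
print("generator peak bytes:", gen_peak)
```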

As a final thought ... look at the "page fault" table. Time-shared operating systems such as Linux manage each program's memory in "pages", moving pages in and out of physical memory to allow for effective concurrency. The page fault figure that I've quoted tells me how many times my program has said "oops - I'll have to go off and find that page", and when I'm storing the whole list that's a very large number. With a generator, the figure's virtually unchanging - once the program's in memory it's ticking over sweetly and can carry on even if we throw a decade's worth of data at it. I would expect severe problems if I threw a decade of data at my list example ...





One of the very first Python Courses I delivered was to a company handling a huge data set, and by changing their lists to generators in existing code we made a huge difference for them. I had delegates from the same organisation on last week's Python course, and they still have very large data sets indeed. Thus this follow up blog, to help them put figures on - at least - a benchmark piece of code and data.


(written 2012-06-16)

 
Associated topics are indexed as below, or enter http://melksh.am/nnnn for individual articles
Y305 - Optimising Python
  [2277] Python classes / courses - what version do we train on? - (2009-07-10)
  [2369] Using a cache for efficiency. Python and PHP examples - (2009-08-21)
  [2462] Python - how it saves on compile time - (2009-10-20)
  [4088] Some tips and techniques for huge data handling in Python - (2013-05-15)

Y105 - Python - Functions, Modules and Packages
  [96] Variable Scope - (2004-10-22)
  [105] Distance Learning - (2004-10-31)
  [294] Python generator functions, lambdas, and iterators - (2005-04-28)
  [303] Lambdas in Python - (2005-05-06)
  [308] Call by name v call by value - (2005-05-11)
  [340] Code and code maintainance efficiency - (2005-06-08)
  [386] What is a callback? - (2005-07-22)
  [418] Difference between import and from in Python - (2005-08-18)
  [561] Python's Generator functions - (2006-01-11)
  [668] Python - block insets help with documentation - (2006-04-04)
  [745] Python modules. The distribution, The Cheese Shop and the Vaults of Parnassus. - (2006-06-05)
  [749] Cottage industry or production line data handling methods - (2006-06-07)
  [775] Do not duplicate your code - (2006-06-23)
  [821] Dynamic functions and names - Python - (2006-08-03)
  [900] Python - function v method - (2006-10-20)
  [912] Recursion in Python - (2006-11-02)
  [913] Python - A list of methods - (2006-11-03)
  [949] Sludge off the mountain, and Python and PHP - (2006-11-27)
  [959] It's the 1st, not the 1nd 1rd or 1th. - (2006-12-01)
  [1134] Function / method parameters with * and ** in Python - (2007-04-04)
  [1163] A better alternative to cutting and pasting code - (2007-04-26)
  [1202] Returning multiple values from a function (Perl, PHP, Python) - (2007-05-24)
  [1464] Python Script - easy examples of lots of basics - (2007-12-08)
  [1784] Global - Tcl, PHP, Python - (2008-09-03)
  [1790] Sharing variables with functions, but keeping them local too - Python - (2008-09-09)
  [1869] Anonymous functions (lambdas) and map in Python - (2008-11-04)
  [1870] What to do with a huge crop of apples - (2008-11-04)
  [1871] Optional and named parameters in Python - (2008-11-05)
  [1879] Dynamic code - Python - (2008-11-11)
  [2011] Conversion of OSI grid references to Eastings and Northings - (2009-01-28)
  [2439] Multiple returns from a function in Python - (2009-10-06)
  [2440] Optional parameters to Python functions - (2009-10-07)
  [2481] Sample code with errors in it on our web site - (2009-10-29)
  [2506] Good example of recursion in Python - analyse an RSS feed - (2009-11-18)
  [2520] Global and Enable - two misused words! - (2009-11-30)
  [2718] Python - access to variables in the outer scope - (2010-04-12)
  [2766] Optional and named parameters to Python functions/methods - (2010-05-15)
  [2878] Program for reliability and efficiency - do not duplicate, but rather share and re-use - (2010-07-19)
  [2929] Passing a variable number of parameters in to a function / method - (2010-08-20)
  [2994] Python - some common questions answered in code examples - (2010-10-10)
  [2998] Using an exception to initialise a static variable in a Python function / method - (2010-10-13)
  [3159] Returning multiple values from a function call in various languages - a comparison - (2011-02-06)
  [3280] Passing parameters to Python functions - the options you have - (2011-05-07)
  [3459] Catching the fishes first? - (2011-09-27)
  [3464] Passing optional and named parameters to python methods - (2011-10-04)
  [3472] Static variables in functions - and better ways using objects - (2011-10-10)
  [3474] Python Packages - groupings of modules. An introduction - (2011-10-11)
  [3662] Finding all the unique lines in a file, using Python or Perl - (2012-03-20)
  [3695] Functions are first class variables in Lua and Python - (2012-04-13)
  [3852] Static variables in Python? - (2012-08-29)
  [3885] Default local - a good choice by the author of Python - (2012-10-08)
  [3931] Optional positional and named parameters in Python - (2012-11-23)
  [3945] vargs in Python - how to call a method with unknown number of parameters - (2012-12-06)
  [4029] Exception, Lambda, Generator, Slice, Dict - examples in one Python program - (2013-03-04)
  [4161] Python varables - checking existance, and call by name or by value? - (2013-08-27)
  [4212] Python functions - an introduction to how they work - (2013-11-16)
  [4361] Multiple yields and no loops in a Python generator? - (2014-12-22)
  [4407] Python - even named code blocks are objects - (2015-01-28)
  [4410] A good example of recursion - a real use in Python - (2015-02-01)
  [4441] Reading command line parameters in Python - (2015-02-23)
  [4448] What is the difference between a function and a method? - (2015-03-04)
  [4645] What are callbacks? Why use them? An example in Python - (2016-02-11)
  [4662] Recursion in Python - the classic example - (2016-03-07)
  [4719] Nesting decorators - (2016-11-02)
  [4722] Embedding more complex code into a named block - (2016-11-04)
  [4724] From and Import in Python - where is the module loaded from? - (2016-11-06)




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2021: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY

PAGE: http://www.wellho.net/mouth/3766_Pyt ... rator.html • PAGE BUILT: Sun Oct 11 16:07:41 2020 • BUILD SYSTEM: JelliaJamb