Reporting on the 10 largest files or 10 top scores

What are the biggest 10 files in or below this directory?

What are the 20 'worst' spams I have received in the last month?

What are the five top scores recorded for a popular game on my web site?

It's a very common requirement indeed to provide a program to answer questions like these. If you've only got a handful of files / spams / score records, it's easy to write a program that reads them all into an array (PHP) or list (Perl, Python), sorts that array or list once everything has been read, and prints out the first however-many. But that approach becomes impractically slow and memory greedy if you have a big log file ... as, for example, the quarter of a million records I have in my spam record file at the moment.
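As a rough illustration of that simple read-everything-then-sort approach, here is a minimal Python sketch. The function name `top_n_by_sorting` and the sample score records are made up for this example; they are not from the original program.

```python
def top_n_by_sorting(records, n, key):
    """Simple approach: hold every record in memory, sort, keep the first n."""
    everything = list(records)               # every record is stored
    everything.sort(key=key, reverse=True)   # largest comparison value first
    return everything[:n]


# Hypothetical sample data: (player, score) pairs
scores = [("ann", 310), ("bob", 120), ("cat", 450), ("dan", 75), ("eve", 290)]
print(top_n_by_sorting(scores, 3, key=lambda record: record[1]))
```

This is fine for a handful of records, but memory use grows with the whole input rather than with the number of results wanted.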

Here's the technique you can use to find the top 20 records from several million in a log file - quickly and efficiently ...

1) Set up an empty list to contain the top 20 (so far) as you discover them.

2) Pass through the records one by one ...
2a) Work out the comparison factor (size, score) for the record just read
2b) If you have already read and stored 20 records, and the new record is below the 20th one stored, reject it, OTHERWISE ...
2c) Step through the records retained so far and insert the new one at the appropriate place in the list
2d) If the list now contains more than 20 records, truncate it to 20

3) Print out your results from the list you now have.
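The steps above can be sketched in Python as follows. This is an illustrative sketch only, not the PHP implementation mentioned in this article; the function name `top_n_streaming`, the `key` parameter and the sample data are assumptions made for the example.

```python
def top_n_streaming(records, n, key):
    """Keep only the best n records seen so far, in descending order."""
    top = []                                  # step 1: empty list of top records

    for record in records:                    # step 2: one record at a time
        value = key(record)                   # 2a: comparison factor

        # 2b: list is full and the new record is below the last one kept
        if len(top) >= n and value < key(top[-1]):
            continue

        # 2c: step through the retained records to find the insertion point
        position = 0
        while position < len(top) and key(top[position]) >= value:
            position += 1
        top.insert(position, record)

        # 2d: never keep more than n records
        if len(top) > n:
            del top[n:]

    return top                                # step 3: results ready to print


# Hypothetical sample data: (player, score) pairs
scores = [("ann", 310), ("bob", 120), ("cat", 450), ("dan", 75), ("eve", 290)]
for player, score in top_n_streaming(scores, 3, key=lambda record: record[1]):
    print(player, score)
```

Because `records` can be any iterable, the input could be a file read line by line, so no more than n + 1 records are held in the list at any time.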

There is a PHP implementation of this algorithm that you can read and run (linked from the original page). It's not the simplest of code, but it should always work no matter how large or how small the gap between the 20th and 21st record is (as opposed to alternative algorithms that set a fixed threshold), and it should stay fast even on a data set that's pretty huge; most of the data will be rejected summarily and won't need to be stored at all.

You might suggest that my data should be stored in a MySQL database and not a plain text file ... that's not the problem I was given, and is worthy of an entry here another day!
(written 2006-08-20, updated 2006-08-19)

 

This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.

© WELL HOUSE CONSULTANTS LTD., 2021: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/839_Repo ... cores.html • PAGE BUILT: Sun Oct 11 16:07:41 2020 • BUILD SYSTEM: JelliaJamb