Reporting on the 10 largest files or 10 top scores

What are the biggest 10 files in or below this directory?

What are the 20 'worst' spams I have received in the last month?

What are the five top scores recorded for a popular game on my web site?

It's a very common requirement to provide a program that answers questions like these. If you've only got a handful of files, spams or score records, it's easy to write a program that reads them all into an array (PHP, Perl) or a list (Python), sorts that array or list once everything has been read, and prints out the first however-many. But that approach becomes impractically slow and memory-hungry if you have a big log file, such as the quarter of a million records I have in my spam record file at the moment.
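For comparison, here is a minimal Python sketch of the simple read-everything-and-sort approach described above. The function name `top_n_naive` and its `key` parameter are illustrative choices, not taken from the author's program.

```python
def top_n_naive(records, n, key):
    """Read every record into a list, sort the whole list, keep the first n.

    Memory use grows with the total number of records, and the sort costs
    O(N log N) for N records, however small n is.
    """
    all_records = list(records)               # everything is held in memory at once
    all_records.sort(key=key, reverse=True)   # largest comparison value first
    return all_records[:n]                    # keep only the first n


if __name__ == "__main__":
    scores = [("ann", 42), ("bob", 97), ("cat", 15), ("dan", 63), ("eve", 88)]
    print(top_n_naive(scores, 3, key=lambda r: r[1]))
    # prints [('bob', 97), ('eve', 88), ('dan', 63)]
```

This is fine for a few hundred records; with millions of lines, the whole file ends up in memory before a single result can be reported.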

Here's a technique you can use to find the top 20 records from several million in a log file, quickly and efficiently:

1) Set up an empty list to contain the top 20 (so far) as you discover them.

2) Pass through the records one by one:
2a) Work out the comparison factor (size, score) for the record just read.
2b) If you have already read and stored 20 records, and the new record is below the 20th one stored, reject it; OTHERWISE ...
2c) Step through the records retained so far and insert the new one at the appropriate place in the list.
2d) If the list now contains more than 20 records, truncate it to 20.

3) Print out your results from the list you now have.
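The steps above can be sketched in Python as follows. This is an illustrative translation, not the author's PHP program: the function name `top_n`, the `key` parameter, and the choice to store (value, record) pairs so that each comparison factor is worked out only once are all assumptions of this sketch.

```python
def top_n(records, n, key):
    """Return the n records with the largest key, best first, in one pass.

    Only a short sorted list of at most n (value, record) pairs is held,
    so memory use does not grow with the number of records read.
    """
    if n <= 0:
        return []
    kept = []  # step 1: empty list of (value, record) pairs, best first
    for record in records:  # step 2: pass through the records one by one
        value = key(record)  # step 2a: the comparison factor (size, score ...)
        # step 2b: list already full and new record no better than the last kept?
        # Reject it straight away (ties go to the record that was read first).
        if len(kept) >= n and value <= kept[-1][0]:
            continue
        # step 2c: step through the kept records to find the insertion point
        position = 0
        while position < len(kept) and kept[position][0] >= value:
            position += 1
        kept.insert(position, (value, record))
        # step 2d: if the list has grown past n entries, drop the last one
        if len(kept) > n:
            kept.pop()
    return [record for _, record in kept]  # step 3: report these


if __name__ == "__main__":
    scores = [("ann", 42), ("bob", 97), ("cat", 15),
              ("dan", 63), ("eve", 88), ("fay", 63)]
    print(top_n(scores, 3, key=lambda r: r[1]))
    # prints [('bob', 97), ('eve', 88), ('dan', 63)]
```

In the usage example, "fay 63" is rejected at step 2b because it only equals the third record kept, so the earlier "dan 63" stays. The result matches a full stable sort followed by taking the first n, but the full list is never built.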

You can see this algorithm implemented in PHP here, and you can run it here. It's not the simplest of code, but it should always work no matter how large or how small the gap between the 20th and 21st record is (unlike alternative algorithms that set a fixed threshold), and it should always run quite fast even on a huge data set: most of the data will be rejected summarily at step 2b and won't need to be stored at all.
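Python programmers can also reach for a different technique that gives the same answer: the standard library's `heapq.nlargest`, which keeps a heap of n items rather than a sorted list while it works through the input, so each new record costs O(log n) instead of a walk along the list. Its documentation describes the result as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`. The "name score" line format and the `parse_scores` and `top_n_heap` helpers below are hypothetical, for illustration only.

```python
import heapq


def parse_scores(lines):
    """Yield (name, score) pairs from lines like 'ann 42'.

    The 'name score' format is a hypothetical example, not the format of
    any particular log file. Lines that don't have two fields are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            yield fields[0], int(fields[1])


def top_n_heap(records, n, key):
    # heapq.nlargest consumes the iterable lazily, holding only n items,
    # so records can be a generator over a very large file.
    return heapq.nlargest(n, records, key=key)


if __name__ == "__main__":
    sample = ["ann 42", "bob 97", "cat 15", "dan 63", "eve 88"]
    print(top_n_heap(parse_scores(sample), 2, key=lambda r: r[1]))
    # prints [('bob', 97), ('eve', 88)]
```

To process a real log, pass the open file object itself, for example `with open(path) as log: best = top_n_heap(parse_scores(log), 20, key=lambda r: r[1])`, so lines are read one at a time rather than all at once.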

You might suggest that my data should be stored in a MySQL database rather than a plain text file; that's not the problem I was given, and it's worthy of an entry here another day!
(written 2006-08-20, updated 2006-08-19)

 
Associated topics are indexed as below, or enter http://melksh.am/nnnn for individual articles
H115 - Designing PHP-Based Solutions: Best Practice
  [4691] Real life PHP application using our course training MVC example - (2016-06-05)
  [4641] Using an MVC structure - even without a formal framework - (2016-02-07)
  [4326] Learning to program - comments, documentation and test code - (2014-11-22)
  [4118] We not only teach PHP and Python - we teach good PHP and Python Practice! - (2013-06-18)
  [4069] Even early on, separate out your program from your HTML! - (2013-04-25)
  [3926] Filtering PHP form inputs - three ways, but which should you use? - (2012-11-18)
  [3820] PHP sessions - a best practice teaching example - (2012-07-27)
  [3813] Injection Attacks - PHP, SQL, HTML, Javascript - and how to neutralise them - (2012-07-22)
  [3539] Separating program and artwork in PHP - easier maintainance, and better for the user - (2011-12-05)
  [2679] How to build a test harness into your PHP - (2010-03-16)
  [2430] Not just a PHP program - a good web application - (2009-09-29)
  [2221] Adding a newsfeed for your users to a multipage PHP application - (2009-06-06)
  [2199] Improving the structure of your early PHP programs - (2009-05-25)
  [1794] Refactoring - a PHP demo becomes a production page - (2008-09-12)
  [1694] Defensive coding techniques in PHP? - (2008-07-02)
  [1623] PHP Techniques - a workshop - (2008-04-26)
  [1533] Short and sweet and sticky - PHP form input - (2008-02-06)
  [1490] Software to record day to day events and keep an action list - (2007-12-31)
  [1487] Efficient PHP applications - framework and example - (2007-12-28)
  [1482] A story about benchmarking PHP - (2007-12-23)
  [1391] Ordnance Survey Grid Reference to Latitude / Longitude - (2007-10-14)
  [1390] Converting from postal address to latitude / longitude - (2007-10-13)
  [1389] Controlling and labelling Google maps via PHP - (2007-10-13)
  [1381] Using a MySQL database to control mod_rewrite via PHP - (2007-10-06)
  [1323] Easy handling of errors in PHP - (2007-08-27)
  [1321] Resetting session based tests in PHP - (2007-08-26)
  [1194] Drawing hands on a clock face - PHP - (2007-05-19)
  [1182] Painting a masterpiece in PHP - (2007-05-10)
  [1181] Good Programming practise - where to initialise variables - (2007-05-09)
  [1166] Back button - ensuring order are not submitted twice (PHP) - (2007-04-28)
  [1052] Learning to write secure, maintainable PHP - (2007-01-25)
  [1047] Maintainable code - some positive advice - (2007-01-21)
  [945] Code quality counts - (2006-11-26)
  [936] Global, Superglobal, Session variables - scope and persistance in PHP - (2006-11-21)
  [896] PHP - good coding practise and sticky radio buttons - (2006-10-17)
  [572] Giving the researcher power over database analysis - (2006-01-22)
  [563] Merging pictures using PHP and GD - (2006-01-13)
  [426] Robust checking of data entered by users - (2005-08-27)
  [394] A year on - should we offer certified PHP courses - (2005-07-28)
  [340] Code and code maintainance efficiency - (2005-06-08)
  [261] Putting a form online - (2005-03-29)
  [237] Crossfertilisation, PHP to Python - (2005-03-06)
  [123] Short underground journeys and a PHP book - (2004-11-19)

H999 - Additional PHP Material
  [4655] Image indexer / thumbnail display scripts in PHP - (2016-02-25)
  [3210] Catchable fatal error in PHP ... How to catch, and alternative solutions such as JSON - (2011-03-22)
  [3118] Arrays of arrays - or 2D arrays. How to program tables. - (2011-01-02)
  [2684] Exception handling in PHP - (2010-03-18)
  [2215] If nothing, make it nothing. - (2009-06-02)
  [2073] Extra PHP Examples - (2009-03-09)
  [1519] Flipping images on your web page - (2008-01-26)
  [1505] Script to present commonly used images - PHP - (2008-01-13)
  [1485] Copyright and theft of images, bandwidth and members. - (2007-12-26)
  [1451] More PHP sample and demonstration programs - (2007-12-01)
  [1270] PHP Standalone - keyboard to screen - (2007-07-18)
  [1104] Drawing dynamic graphs in PHP - (2007-03-09)
  [1053] Sorting people by name in PHP - (2007-01-26)
  [1020] Parallel processing in PHP - (2007-01-03)
  [1010] Dates, times, clickable diarys in PHP - (2006-12-28)
  [937] Display an image from a MySQL database in a web page via PHP - (2006-11-22)
  [917] Syntax checking in PHP - (2006-11-07)
  [822] PHP - a team member leaves - (2006-08-04)
  [806] Check your user is human. Have him retype a word in a graphic - (2006-07-17)
  [789] Hot answers in PHP - (2006-07-02)
  [687] Presentation, Business and Persistence layers in Perl and PHP - (2006-04-17)
  [665] PHP Image viewing application - (2006-04-01)
  [603] PHP - setting sort order with an associative array - (2006-02-13)
  [493] Running a Perl script within a PHP page - (2005-11-12)
  [483] Double Dollars in PHP - (2005-11-02)
  [468] Stand alone PHP programs - (2005-10-18)
  [372] Time calculation in PHP - (2005-07-08)
  [337] the array returned by preg_match_all - (2005-06-06)
  [322] More maps - (2005-05-23)
  [320] Ordnance Survey - using a 'Get a map' - (2005-05-22)
  [239] What and why for the epoch - (2005-03-08)
  [54] PHP and natural sorting - (2004-09-19)

P602 - Perl - Advanced File and Directory Handling
  [3429] Searching through all the files in or below a directory - Ruby, Tcl, Perl - (2011-09-09)
  [3412] Handling binary data in Perl is easy! - (2011-08-30)
  [2876] Different perl examples - some corners I rarely explore - (2010-07-18)
  [1861] Reactive (dynamic) formatting in Perl - (2008-10-31)
  [1832] Processing all files in a directory - Perl - (2008-10-11)
  [1709] There is more that one way - Perl - (2008-07-14)
  [1225] Perl - functions for directory handling - (2007-06-09)
  [975] Answering ALL the delegate's Perl questions - (2006-12-09)


Back to
Talking about other training companies.
Previous and next
or
Horse's mouth home
Forward to
Computers, Brides and Cream Teas
Some other Articles
To join an organisation?
Dramatic Skys at Longleat
Forum help - a push in the right direction
Computers, Brides and Cream Teas
Reporting on the 10 largest files or 10 top scores
Talking about other training companies.
Tomcat - Shutdown port
Build on what you already have with OO
Python - when to use the in operator
Python makes University Challenge
4759 posts, page by page
Link to page ... 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96 at 50 posts per page


This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2019: 404 The Spa • Melksham, Wiltshire • United Kingdom • SN12 6QL
PH: 01225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/839_Repo ... cores.html • PAGE BUILT: Sat May 27 16:49:10 2017 • BUILD SYSTEM: WomanWithCat