Related technical and longer articles
Data Monging
Tips and short articles on this subject
Preventing ^C stopping / killing a program - Perl
Progress Bar Techniques - Perl
Perl - progress bar, suppressing ^C and coping with huge data flows
Answering ALL the delegate's Perl questions
Huge data files - what happened earlier?
Progress bars and other dynamic reports
Well House Consultants
You are on the site of Well House Consultants who provide Open Source Training Courses and business hotel accommodation. You are welcome to browse and use our resources subject to our copyright statement and to add in links from your pages to ours.
Perl module P667
Handling Huge Data
Exercises, examples and other material relating to training module P667. This topic is presented on public course Perl for Larger Projects

Perl for Larger Projects - Objects, huge data, SQL databases, XML, efficiency and other topics. This advanced course takes the Perl programmer through ...
http://www.wellho.net/course/plfull.html  [course]
During courses, questions arise. "I'll get back to that" could make people feel that I'm brushing something off ... except that I explain, early on, ...
http://www.wellho.net/mouth/975_Answ ... tions.html  [short article]
PROCESSING LARGE QUANTITIES OF DATA. Data Monging is a term that has come to be used for processing quantities of ...
http://www.wellho.net/solutions/perl-data-monging.html  [longer article]
Searching for warrior These are complete results Searched through 3904403 sites, matched 794 started at Fri Sep 19 16:07:18 2003 url: http://www.boxofficemojo.com/13thwarrior.html ...
http://www.wellho.net/resources/ex.php4?item=p667/out.txt  [code sample]
Have you ever sat there and wondered "is this program nearly done ... is it still running ... how is it getting on" and wished you had a progress bar. ...
http://www.wellho.net/mouth/1920_Pro ... -Perl.html  [short article]
If you're handling a huge amount of data (gigabytes!) in a Perl program, memory won't allow you to slurp it all into a list and you'll traverse the data ...
http://www.wellho.net/mouth/1397_Per ... flows.html  [short article]
When I'm programming a log file analysis in Perl, I'll often "slurp" the whole file into a list which I can then traverse efficiently as many times as ...
http://www.wellho.net/mouth/762_Huge ... lier-.html  [short article]

If you have so much data that it won't fit into memory all at once, you may not be able to use conventional programming techniques, such as reading the whole file into a list, to complete your task. We call a data set like this "huge data"; it's impossible to handle in some languages, but very practical in Perl. This module doesn't introduce many new language features; instead, it shows you how to use what you already know to handle huge data in practice.
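As a minimal sketch of the idea (not taken from the course material), the program below reads its input one line at a time instead of slurping it into a list, so memory use grows with the number of distinct keys rather than with the size of the file. The subroutine name count_hosts, the sample data and the assumption that the host is the first whitespace-separated field on each line are all hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count hits per host, reading one line at a time.
# Assumption: the host is the first whitespace-separated field of each line.
sub count_hosts {
    my ($fh, $report_every) = @_;
    my %hits;
    my $lines = 0;
    while (my $line = <$fh>) {
        $lines++;
        my ($host) = split ' ', $line;
        next unless defined $host;          # skip blank lines
        $hits{$host}++;
        # Progress goes to STDERR so that it does not mix with the results
        if ($report_every && $lines % $report_every == 0) {
            print STDERR "\r$lines lines read";
        }
    }
    print STDERR "\n" if $report_every && $lines >= $report_every;
    return (\%hits, $lines);
}

# Demonstration on an in-memory filehandle; a real run would open a file
my $sample = "alpha /index.html\nbeta /a.html\n\nalpha /b.html\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my ($hits, $lines) = count_hosts($fh, 0);   # 0 = no progress reports
close $fh;

printf "%s: %d\n", $_, $hits->{$_} for sort keys %$hits;
```

On a real log file you would open the file by name and pass a reporting interval such as 100000, so that a long run shows it is still making progress.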

Examples from our training material
behind   Looking behind in huge data files
huge1   A program to test handling a small part of a huge data set
huge2   Providing user feedback while handling huge data
huge3   Asking a long running application for intermediate reports
huge3.pid   Example of the huge.pid file
hugehunter   Long log file analysis, with progress and intermediate reporting
makedirs   Preprocessing a huge data file to set up indexes
makeindex   Generating a list of markers to a huge sorted data set
opt2   Sorting and data filtering efficiency
opt3   Improving sort efficiency
opt4   Improving sort efficiency further - caching record analysis
optim   Optimising code to avoid repeating calculations
out.txt   Example of search results written to file
paws   Progress Bar Techniques
readtime   Efficiency - reading a file in large blocks
reg_opt   Regular expression match - inefficient example
reg_opt1   Regular expression match - don't save $' $` and $&
reg_opt2   Regular expression match - use of "o" modifier
reg_opt3   Regular expression match - more specific and faster
reg_opt4   Regular expression match - a start assertion speeds it up!
rt2   Handling data in chunks - chunk overlap issue solved
site.pm   Class used in other examples in this module
useindex   Grab first ten sites on a topic area - QUICKLY via index
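The indexing examples listed above are not reproduced on this page. As a rough illustration of the general technique, and not the code of makeindex or useindex, the sketch below records the byte offset at which each initial letter first appears in a sorted data set, then uses seek to jump straight to a topic without reading everything before it. The subroutine names build_index and fetch, the choice of the first letter as the key and the sample data are all assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Record the byte offset of the first record for each initial letter.
# Assumption: the data is sorted, so records sharing a key are contiguous.
sub build_index {
    my ($fh) = @_;
    my %offset;
    seek $fh, 0, 0;
    while (1) {
        my $pos  = tell $fh;                # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        my $key = lc(substr($line, 0, 1));
        $offset{$key} = $pos unless exists $offset{$key};
    }
    return \%offset;
}

# Jump to the first record for $key and return up to $max matching records.
sub fetch {
    my ($fh, $index, $key, $max) = @_;
    return () unless exists $index->{$key};
    seek $fh, $index->{$key}, 0;
    my @found;
    while (defined(my $line = <$fh>)) {
        last if lc(substr($line, 0, 1)) ne $key;   # left this key's block
        last if @found >= $max;
        chomp $line;
        push @found, $line;
    }
    return @found;
}

# Demonstration on an in-memory filehandle; a real run would open a file
my $data = "apple\navocado\nbanana\ncherry\ncranberry\n";
open my $fh, '<', \$data or die "Cannot open sample: $!";
my $index = build_index($fh);
my @c = fetch($fh, $index, 'c', 10);
print "@c\n";
```

With a genuinely huge file the index would normally be built once and saved, so that later lookups only pay for the seek and the few records they actually read.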
Background information
Some modules are available for download as a sample of our material or under an Open Training Notes License for free download from http://www.training-notes.co.uk.
Topics covered in this module
Planning.
General techniques for large and huge data.
Code Optimization.
Regular Expressions.
Sorting.
Large Data.
Avoid loops.
Store data in memory.
Huge Data.
Hello HUGE world.
User feedback.
Controlling a long-running process.
Reading the data.
Arranging and storing the data.
Using a directory structure.
Indexing.
For Reference.
Complete learning
If you are looking for a complete course and not just information on a single subject, visit our Listing and schedule page.

Well House Consultants specialise in training courses in Python, Perl, PHP, and MySQL. We run Private Courses throughout the UK (and beyond for longer courses), and Public Courses at our training centre in Melksham, Wiltshire, England. It's surprisingly cost effective to come on our public courses - even if you live in a different country or continent to us.

We have a technical library of over 700 books on the subjects on which we teach. These books are available for reference at our training centre. Also available is the Opentalk Forum for discussion of technical questions.


© WELL HOUSE CONSULTANTS LTD., 2009: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 707126 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho