Well House Consultants
You are on the site of Well House Consultants who provide Open Source Training Courses and business hotel accommodation. You are welcome to browse and use our resources subject to our copyright statement and to add in links from your pages to ours.
Perl module P667
Handling Huge Data
Exercises, examples and other material relating to training module P667. This topic is presented on the public course Perl for Larger Projects.

If you have so much data that it won't fit into memory all at once, you may not be able to use conventional programming techniques to complete your task. We define a data set such as this as "huge data"; it's impossible to handle in some languages, but very practical in Perl. This module doesn't introduce many new language features; instead, it shows you how to use what you already know to handle huge data practically.
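
As a minimal sketch of the basic idea (not taken from the course material), the loop below reads a file one line at a time instead of slurping it into a list, so memory use stays roughly constant however large the file is. The sub name count_matches and the search word are illustrative assumptions.

```perl
use strict;
use warnings;

# Count the lines in a file that contain a given word.
# count_matches is a hypothetical helper name, not part of the module.
sub count_matches {
    my ($path, $word) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my ($lines, $hits) = (0, 0);
    while (my $line = <$fh>) {                  # only the current line is held in memory
        $lines++;
        $hits++ if index($line, $word) >= 0;    # plain substring test, no regex needed
    }
    close $fh;
    return ($lines, $hits);
}

# Run as: perl script.pl some_log_file   (the file name is up to you)
if (@ARGV) {
    my ($lines, $hits) = count_matches($ARGV[0], 'melksham');
    print "$hits of $lines lines matched\n";
}
```

The same loop shape works when the records come from a database cursor rather than a file: fetch one record, deal with it, and let it go before fetching the next.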

Related technical and longer articles
Data Monging

Articles and tips on this subject
3375 How to interact with a Perl program while it is processing data
If you have a long running program, how do you monitor its progress? You could use tail -f on an output file ... but there are other options too. a) You can output a progress line; there's an example of this from the recent Perl for Larger Projects course ... [here]. b) You can trap ^C ... and even ...
3374 Speeding up your Perl code
On Friday morning - our Perl for Larger Projects course - I was looking at coding efficiency / run speed with delegates. As an example, we took a data file from our web server logs - some 23 Mbytes of data, comprising about 121,000 lines, of which 1099 contained the word "melksham" in lower case. The ...
2834 Teaching examples in Perl - third and final part
Three part article ... this is part 3. Jump back to part [1] [2] Following on from two earlier posts, here is the final third of the new examples that I wrote during last week's Perl course, and to which I have added extra documentation over the last couple of days. P212 More on Character Strings "Does ...
2805 How are you getting on?
Have you ever asked someone to do something for you ... a long task, and you would like a progress report? "How are you getting on?" you'll ask ... and they'll give you an update - "I'm 75% of the way through" they'll say or - perhaps even more helpfully - "I'm nearly there, and I have some good results ...
2806 Macho matching - do not do it!
There's something vaguely macho about doing a grand regular expression match to do all your filtering in a single line of code - but being macho may be less than efficient. It may be far better to do two shorter matches, with the first quickly rejecting records which don't need to be handled in detail, ...
2376 Long job - progress bar techniques (Perl)
Here's a "Perl for Larger Projects" example --- for use in illustrating the "advanced file and directory handling" and "handling huge data set" modules. Scenario ... I want to go through all the files and directories on a big drive, and find the largest file(s). It will take a while, so I want progress ...
1920 Progress Bar Techniques - Perl
Have you ever sat there and wondered "is this program nearly done ... is it still running ... how is it getting on" and wished you had a progress bar. But then have you ever watched a jerky progress bar and felt that it's more fiction than fact? We were discussing these aspects on today's private Perl ...
1924 Preventing ^C stopping / killing a program - Perl
Here's a demonstration - in Perl - that shows you how to avoid a ^C (Control C) dropping you straight out of a program. Have you ever accidentally hit ^C in the wrong window and terminated a long-running process just before it finished ... well, by setting $SIG{INT} to the address of a sub you want ...
1397 Perl - progress bar, suppressing ^C and coping with huge data flows
If you're handling a huge amount of data (gigabytes!) in a Perl program, memory won't allow you to slurp it all into a list and you'll traverse the data with a loop from file or from database. And because of the sheer volume of data, it may take a while to process. During such processing, you may wish ...
975 Answering ALL the delegate's Perl questions
During courses, questions arise. "I'll get back to that" could make people feel that I'm brushing something off ... except that I explain, early on, that some questions require a great deal of background knowledge to be answered sensibly. And I keep a list of topics that I'll be getting back to ...
762 Huge data files - what happened earlier?
When I'm programming a log file analysis in Perl, I'll often "slurp" the whole file into a list which I can then traverse efficiently as many times as I need. If I need to look backwards from some interesting event to see what happened in the immediate lead up to it, I can do so simply by looking at ...
639 Progress bars and other dynamic reports
If you've got a program that runs for a long time, your users will wish to be kept informed of progress and how much longer there is to go. Now that's not always easy to predict (and I'm sure that most of you have made fun of such forecasts in the past) but it's much much much better than sitting ...
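
Several of the articles above describe trapping ^C through $SIG{INT}, printing progress lines, and rejecting records cheaply before a detailed match. The sketch below combines the three under stated assumptions: the sub name scan, its parameters and the wording of the report are illustrative, and this is not the code from those articles.

```perl
use strict;
use warnings;

my $report_wanted = 0;

# Trap ^C: rather than killing the program, just note that a report is wanted.
# (Note that ^C no longer stops the program at all once this is in place.)
$SIG{INT} = sub { $report_wanted = 1 };

# scan (hypothetical name) reads a filehandle line by line.
#   $cheap  - a plain substring that any interesting line must contain
#   $detail - the fuller, slower pattern, applied only to lines that survive
#   $every  - print a progress line every $every lines (0 for never)
sub scan {
    my ($fh, $cheap, $detail, $every) = @_;
    my ($seen, $hits) = (0, 0);
    while (my $line = <$fh>) {
        $seen++;
        if ($report_wanted or ($every and $seen % $every == 0)) {
            print STDERR "... $seen lines read, $hits matches so far\n";
            $report_wanted = 0;
        }
        next if index($line, $cheap) < 0;   # quick reject, no regex engine involved
        $hits++ if $line =~ $detail;        # detailed match on the few survivors
    }
    return ($seen, $hits);
}
```

Keeping the signal handler down to setting a flag means the report is always printed from a known point in the loop, where the counters are consistent.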
Examples from our training material
behind   looking behind in huge data files
big.start   Finding largest file, with intermediate status reports
huge1   A program to test handling a small part of a huge data set
huge2   Providing user feedback while handling huge data
huge3   Asking a long running application for intermediate reports
huge3.pid   Example of the huge.pid file
hugehunter   Long log file analysis, with progress and intermediate reporting
makedirs   Preprocessing a huge data file to set up indexes
makeindex   Generating a list of markers to a huge sorted data set
mtx   Merging two huge files
opt2   Sorting and data filtering efficiency
opt3   Improving sort efficiency
opt4   Improving sort efficiency further - caching record analysis
optim   Optimising code to avoid repeating calculations
out.txt   Example of search results written to file
paws   Progress Bar Techniques
readtime   Efficiency - reading a file in large blocks
reg_opt   Regular expression match - inefficient example
reg_opt1   Regular expression match - don't save $' $` and $&
reg_opt2   Regular expression match - use of "o" modifier
reg_opt3   Regular expression match - more specific and faster
reg_opt4   Regular expression match - a start assertion speeds it up!
rt2   Handling data in chunks - chunk overlap issue solved
site.pm   Class used in other examples in this module
slurp   slurping and sampling
useindex   Grab first ten sites on a topic area - QUICKLY via index
Background information
Some modules are available for download as a sample of our material or under an Open Training Notes License for free download from [here].
Topics covered in this module
What is a huge amount of data?
General techniques.
Code optimisation.
Regular expressions.
Avoiding loops.
Storing data in memory.
"Hello Huge World".
User feedback.
Signals and tails to monitor and control a long process.
Reading the data by line or by block.
Arranging and storing the data.
Using a directory structure.
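
To illustrate the "by line or by block" topic, here is a minimal sketch of reading in fixed-size blocks while still handing complete lines to the processing code. The partial line at the end of each block is carried over to the next one, which is one way of dealing with records that straddle a block boundary. The sub name each_line_by_block and the block size are assumptions for illustration, not the module's own example.

```perl
use strict;
use warnings;

# Read $fh in blocks of $block_size bytes and call $handler once per line.
# each_line_by_block is a hypothetical helper name.
sub each_line_by_block {
    my ($fh, $block_size, $handler) = @_;
    my $carry = '';                             # partial line left over from the last block
    while (read($fh, my $block, $block_size)) {
        $carry .= $block;
        my @lines = split /\n/, $carry, -1;     # -1 keeps a trailing empty field
        $carry = pop @lines;                    # last piece may be incomplete: keep it
        $handler->($_) for @lines;
    }
    $handler->($carry) if length $carry;        # final line with no trailing newline
}

# Example use on a named file, counting lines with 64 Kbyte reads:
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my $count = 0;
    each_line_by_block($fh, 65536, sub { $count++ });
    close $fh;
    print "$count lines\n";
}
```

Whether block reads beat a plain line-by-line loop depends on the system and the data, so it is worth timing both on a sample before committing to either.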
Complete learning
If you are looking for a complete course and not just information on a single subject, visit our Listing and schedule page.

Well House Consultants specialise in training courses in Ruby, Lua, Python, Perl, PHP, and MySQL. We run Private Courses throughout the UK (and beyond for longer courses), and Public Courses at our training centre in Melksham, Wiltshire, England. It's surprisingly cost effective to come on our public courses - even if you live in a different country or continent to us.

We have a technical library of over 700 books on the subjects on which we teach. These books are available for reference at our training centre.


© WELL HOUSE CONSULTANTS LTD., 2018: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01225 708225 • FAX: 01225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/resources/P667.html • PAGE BUILT: Mon Feb 8 18:55:24 2016 • BUILD SYSTEM: WomanWithCat