If you're using big data sets in Python, you're probably using the
numpy module, which gives you data handling that runs at C speed while you still code at Python speed. But how do you load that data in? Numpy also provides a number of data handlers, data setup routines, and a save and restore capability.
There's a very basic example at
[link] where I've generated a numpy object from text (I could have used a file ...) - each row and column in the incoming text string has been placed into a row or column in the numpy array.
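As a minimal sketch of that idea (not the linked example itself), here is one way to turn a block of text into a numpy array. The sample numbers are made up for illustration, and I'm assuming whitespace-separated values, which is what np.loadtxt expects by default.

```python
import io
import numpy as np

# Hypothetical sample data: three rows of four whitespace-separated numbers.
text = """1 2 3 4
5 6 7 8
9 10 11 12"""

# np.loadtxt accepts a file name or a file-like object, so a StringIO
# stands in here for a file on disk.
table = np.loadtxt(io.StringIO(text))

print(table.shape)   # (3, 4)
print(table[1, 2])   # 7.0
```

Each line of text becomes a row and each field a column; the values come back as floats unless you pass a dtype.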
I've added a further example too ...

Our daily server log file comprises about 150,000 access records (so it's 30MB to 40MB in size) and I wanted to see, via a graph, how the traffic varies in each hour through the week. That meant going through a week's worth of logs and picking one piece of information out of each of around a million records, spread over around a quarter of a gigabyte of data, to get the results shown on the right. Python's quite impressive even without numpy - that analysis took less than 10 seconds on my laptop - but later I'll be doing the same exercise to average out the data for a whole six months, and the time will start to get serious.
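The counting step might look something like the sketch below. It is not my production code: the sample lines are invented, the function name count_by_hour is hypothetical, and I'm assuming the Apache common log format, where the timestamp sits in square brackets.

```python
from datetime import datetime

# Hypothetical log lines in Apache common log format (an assumption).
sample_lines = [
    '10.0.0.1 - - [04/Oct/2010:09:15:02 +0100] "GET / HTTP/1.1" 200 5120',
    '10.0.0.2 - - [04/Oct/2010:09:47:31 +0100] "GET /a.html HTTP/1.1" 200 812',
    '10.0.0.1 - - [09/Oct/2010:23:01:44 +0100] "GET /b.html HTTP/1.1" 404 210',
]

def count_by_hour(lines):
    """Return a 7 x 24 list of lists: counter[weekday][hour], Monday = 0."""
    counter = [[0] * 24 for _ in range(7)]
    for line in lines:
        # The timestamp sits between the first "[" and the following space.
        stamp = line.split("[", 1)[1].split(" ", 1)[0]
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S")
        counter[when.weekday()][when.hour] += 1
    return counter

counter = count_by_hour(sample_lines)
print(counter[0][9])    # 2  (two hits on Monday between 09:00 and 10:00)
print(counter[5][23])   # 1  (one hit on Saturday between 23:00 and midnight)
```

With a real log you would pass an open file in place of sample_lines, as iterating over a file yields its lines one at a time.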
Numpy's save and load functions allowed me to dump out my array to a file, and to load it back in again - my 10 seconds drops to less than 1 second if I do this for a week of data (and for six months it would drop me from about four minutes down to 1 second!).
The code to convert the Python list in which I did the counting into a numpy array (that conversion is another numpy extra feature) is:
info = np.asarray(counter)
and the code to save the data to file is:
np.save("logweek.npy",info)
When I came to run the program (again), I simply had it check if the file existed and if it did, I loaded it:
if os.path.exists("logweek.npy"):
    info = np.load("logweek.npy")
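Put together, the check, load and save steps form a simple cache. The sketch below wraps them in a helper; the names load_or_build and slow_count are hypothetical, slow_count is only a stand-in for the real pass through the logs, and I've used a temporary directory so the example doesn't leave files lying around.

```python
import os
import tempfile
import numpy as np

def load_or_build(cache_file, build):
    """Load the array from cache_file if it exists, else build and save it."""
    if os.path.exists(cache_file):
        return np.load(cache_file)
    info = np.asarray(build())
    np.save(cache_file, info)
    return info

def slow_count():
    # Hypothetical stand-in for the slow pass through the log files.
    return [[0] * 24 for _ in range(7)]

cache_file = os.path.join(tempfile.mkdtemp(), "logweek.npy")
first = load_or_build(cache_file, slow_count)    # builds the array and saves it
second = load_or_build(cache_file, slow_count)   # loads it back from the file
print(np.array_equal(first, second))  # True
```

One thing to watch with this approach: the saved file knows nothing about the logs it came from, so you need to delete it (or check its date) when the underlying data changes.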
The complete source code example is
[here] ... note that it also uses
matplotlib - a plotting library that's often used in association with numpy and scipy.
If you're looking to save pure Python data, have a look at the pickle and marshal modules that are a part of the standard distribution ... or, in Python 2, the cPickle module which is implemented in C and much quicker; in Python 3 that C implementation is folded into the standard pickle module, which uses it automatically. We have various examples around -
[marshall example] and a
[post on pickling].
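For comparison with the numpy code above, here is a minimal pickle round trip in Python 3. The dictionary is invented purely for illustration.

```python
import pickle

# Hypothetical data: a dictionary mixing strings, numbers and a nested list.
settings = {"site": "example", "hits": [120, 98, 143], "ratio": 0.75}

# dumps / loads work on bytes in memory; dump / load take an open binary file.
blob = pickle.dumps(settings)
restored = pickle.loads(blob)

print(restored == settings)  # True
```

Only unpickle data you trust, as loading a pickle can run arbitrary code.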
(written 2010-10-09)
Associated topics are indexed under
Y118 - Python - numpy, scipy and matplotlib
[3554] Learning more about our web site - and learning how to learn about yours - (2011-12-17)
[2997] 3D graphics - web site usage - simple matplotlib and python example - (2010-10-12)
[2993] Arrays v Lists - what is the difference, why use one or the other - (2010-10-10)
[2992] Matplotlib - graphing in Python - teaching examples - (2010-10-10)
[2990] What are numpy and scipy? - (2010-10-09)