Huge files in Python - over 4 Gbytes

Posted by admin (Graham Ellis), 19 August 2004
Looking back just a few years, a file of more than 4 Gbytes was unthinkable, and files (and even whole file systems) were limited to 2^32 (2 to the power 32) bytes. These days, though, a file of more than 4 Gbytes is perfectly possible on most (but not all) file systems and can be handled by most (but not all) languages.
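The 4 Gbyte ceiling is plain arithmetic. The following is a minimal Python 3 sketch that compares a file size against 2 to the power 32. The helper names are hypothetical and made up for illustration. `os.path.getsize` is the standard library call that returns a file's size in bytes.

```python
import os

# 2 to the power 32 = 4,294,967,296: the number of distinct values an
# unsigned 32-bit integer can hold, hence the old "4 Gbyte" ceiling
LIMIT_32_BIT = 2 ** 32


def needs_large_file_support(size_in_bytes):
    """Hypothetical helper: True if the size cannot be stored in 32 unsigned bits."""
    return size_in_bytes >= LIMIT_32_BIT


def file_needs_large_file_support(path):
    """Hypothetical helper: the same check on the size the OS reports for a file."""
    return needs_large_file_support(os.path.getsize(path))


if __name__ == "__main__":
    print(LIMIT_32_BIT)                              # 4294967296
    print(needs_large_file_support(6_900_000_000))   # True: roughly 6.9 Gbytes
```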

In the last couple of days I was asked about huge files in Python (there were rumours of problems), so I wrote the following script and tested it successfully, using it to extract every millionth line from a 6.9 Gbyte file.

Code:
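#!/usr/bin/python
# Python 2: print every millionth line of a very large text file

huge = open("huge.txt")
count = 0

# xreadlines() hands back one line at a time rather than loading
# the whole file, so memory use stays small however big the file is
for line in huge.xreadlines():
    count += 1
    if not (count % 1000000):
        print str(count) + " " + line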
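`file.xreadlines()` was deprecated in Python 2.3 and is gone from Python 3. In Python 3, iterating over the file object itself gives the same one-line-at-a-time behaviour. The following is a minimal Python 3 sketch of the same loop. The function names `every_nth` and `print_every_nth_line` are made up for this example, and "huge.txt" is simply the file name used in the post.

```python
import sys


def every_nth(lines, n=1000000):
    """Yield (line_number, line) for every n-th item of an iterable of lines.

    Works lazily: only the current line is held, never the whole input.
    """
    for count, line in enumerate(lines, start=1):
        if count % n == 0:
            yield count, line


def print_every_nth_line(path, n=1000000):
    # Iterating over the file object reads one line at a time; in Python 3
    # this replaces the old xreadlines() method.
    with open(path) as huge:
        for count, line in every_nth(huge, n):
            print(count, line, end="")  # line normally keeps its own "\n"


if __name__ == "__main__":
    # Usage: python every_nth.py huge.txt
    if len(sys.argv) > 1:
        print_every_nth_line(sys.argv[1])
```

Keeping the counting separate from the file handling means `every_nth` works on any iterable of lines. You can check it on a small list before pointing it at a multi-gigabyte file.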


Note: any construct that reads the whole file into memory in one go, such as read() or readlines(), needs as much memory as the file is big and is likely to fail on a file this size ... that's why I chose xreadlines.
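Line-by-line reading still holds one complete line in memory. A file with no newlines, or with a few enormous lines, can therefore cause the same trouble. The following sketch shows the usual alternative, which reads fixed-size binary chunks. It assumes you only need to work with raw bytes, here counting newlines, rather than decoded text. The names `count_newlines` and `count_lines_in_file` and the 1 Mbyte default chunk size are illustrative choices, not anything from the original post.

```python
import io


def count_newlines(stream, chunk_size=1024 * 1024):
    """Count b"\\n" bytes in a binary stream, reading at most chunk_size bytes at a time."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object means end of file
            return total
        # b"\n" is a single byte, so it can never be split across two
        # chunks; counting chunk by chunk is therefore exact
        total += chunk.count(b"\n")


def count_lines_in_file(path, chunk_size=1024 * 1024):
    # "rb": raw bytes, so no decoding and no line splitting takes place
    with open(path, "rb") as huge:
        return count_newlines(huge, chunk_size)


if __name__ == "__main__":
    # Tiny in-memory example; for a real file use count_lines_in_file("huge.txt")
    sample = io.BytesIO(b"first\nsecond\nthird\n")
    print(count_newlines(sample, chunk_size=4))  # 3
```

Memory use is bounded by `chunk_size` whatever the file's shape. The cost is that you handle raw bytes rather than ready-made lines.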



This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.


© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY