Detecting if file is Macintosh, Windows or Unix
Posted by airborne (airborne), 29 July 2005
I have a problem with text files that are not Windows files. Users create text files and drop them on a fileserver. My script opens these files and loops through the contents. The problem is that if the file was created on a Macintosh or Unix system, my script doesn't see the EOL character, because Windows uses CR/LF where Macintosh uses CR and Unix uses LF.
Is there a way to detect which of the three OSes created the file and split the lines accordingly?
Hope this makes sense.
Posted by John_Moylan (jfp), 29 July 2005
You should not need to know which OS the file came from. I would have thought the only thing you need to do is convert the line endings.
You can do this with the substitution operator (s///):
\r\n is Windows
\r is Mac
\n is Unix
So look for \r\n or \r and replace it with \n. All line endings are now Unix.
This does depend on you slurping the file into one string, though, and that may be a problem if the file is really large.
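A minimal sketch of that substitution, assuming the whole file is already in one string. The helper name normalise_eol is made up for illustration. Matching \r\n before a lone \r matters: otherwise a single Windows ending could turn into two newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: convert Windows (\r\n) and Mac (\r) endings to Unix (\n).
sub normalise_eol {
    my ($text) = @_;
    # \r\n? matches CR LF as one unit, or a lone CR when no LF follows
    $text =~ s/\r\n?/\n/g;
    return $text;
}

my $mixed = "one\r\ntwo\rthree\n";
print normalise_eol($mixed);
```

If the script itself runs on Windows, the file probably needs to be read with binmode so that Perl's own CR/LF translation does not get in the way.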
the unix command line can also help here:
read the manpage on:
If you must know the source OS, then perhaps you could check which line ending exists in the file. This is off the top of my head and probably wrong, but I would use tr for this.
God I hope that makes sense.
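One possible reading of the tr suggestion, using Perl's tr/// operator to count CR and LF characters. This is only a heuristic sketch: guess_eol and its return labels are invented here, and a file with equal numbers of lone CRs and lone LFs would be misreported as Windows.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the line-ending convention from CR and LF counts.
sub guess_eol {
    my ($text) = @_;
    # tr/// with an empty replacement list changes nothing and returns the match count
    my $cr = ($text =~ tr/\r//);
    my $lf = ($text =~ tr/\n//);
    return 'none'    if $cr == 0 && $lf == 0;
    return 'unix'    if $cr == 0;
    return 'mac'     if $lf == 0;
    return 'windows' if $cr == $lf;   # assumes every CR is paired with an LF
    return 'mixed';
}

print guess_eol("first\r\nsecond\r\n"), "\n";
```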
Posted by airborne (airborne), 29 July 2005
Thanx for the reply, however...
Now the Macintosh files work, but I have broken the Unix and PC files.
I think I will have to try to identify the type of file first, somehow, then perform the translations where needed.
Is anyone aware of a "special character" like $\ that tells the parser what the EOL character is?
Posted by John_Moylan (jfp), 29 July 2005
I still don't think it's essential to recognise the uploaded file's source OS; you just need to make sure the EOL is correct for the OS you are processing that file on.
Which OS will your Perl programme reside on?
Also, have you tried CPAN?
I found this, and it may be of some help in making your line endings consistent if you've mis-munged them.
Posted by Custard (Custard), 29 July 2005
on 07/29/05 at 21:03:31, airborne wrote:
Does $/ do what you want?
According to my very swift glance at my Perl crib sheet:
$/ Input record separator, newline by default. May be multi-character.
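A small sketch of how $/ could be used, assuming the file is known to use Mac (CR) endings. The read_records helper and the in-memory sample data are made up for illustration. Note that $/ is a fixed string, not a pattern, so one setting cannot match all three conventions at once.

```perl
use strict;
use warnings;

# Hypothetical helper: read all records from a filehandle using a given separator.
sub read_records {
    my ($fh, $separator) = @_;
    local $/ = $separator;    # only affects reads inside this sub
    my @records = <$fh>;
    chomp @records;           # chomp strips whatever $/ currently holds
    return @records;
}

# In-memory filehandle standing in for a file with CR line endings
my $mac_data = "alpha\rbeta\rgamma\r";
open(my $fh, '<', \$mac_data) or die "Cannot open in-memory file: $!";
print "$_\n" for read_records($fh, "\r");
close $fh;
```

Using local keeps the change to $/ from leaking into the rest of the script.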
Posted by John_Moylan (jfp), 29 July 2005
Hello Custard, hope you're well.
Indeed, slurping the file would solve the EOL problem. airborne: ensure the file is not large, as slurping can consume a fair chunk of memory.
The use of $/ has been discussed within a thread here before. See here:
There is no special variable that I'm aware of that can give you the OS that created a file, though I bow to those here with greater knowledge.
Posted by admin (Graham Ellis), 30 July 2005
Here's the sort of code I use personally. It doesn't matter what the input format is ... the file is always translated. It works by forcing the incoming file to one standard, then translating it through to what you want - much easier than 9 possible direct translation modes.
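The two-stage idea described here might look something like the following. This is a sketch written for illustration, not the code originally posted, and convert_eol and the keys of %EOL are invented names.

```perl
use strict;
use warnings;

# Target endings; the key names are illustrative choices for this sketch.
my %EOL = (unix => "\n", windows => "\r\n", mac => "\r");

# Hypothetical helper: force everything to LF first, then to the requested ending.
sub convert_eol {
    my ($text, $target) = @_;
    my $eol = $EOL{$target};
    die "Unknown target '$target'\n" unless defined $eol;
    $text =~ s/\r\n?/\n/g;    # stage 1: any input convention becomes LF
    $text =~ s/\n/$eol/g;     # stage 2: LF becomes the target ending
    return $text;
}

my $windows_text = convert_eol("one\rtwo\nthree\r\n", 'windows');
print length($windows_text), " bytes after conversion\n";
```

Going through one intermediate form means three input rules plus three output rules rather than one rule for every input and output pairing.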
I don't think there's any special variable or piece of authority that tells you what the incoming OS is, but if you count the number of changes made by the substitutes at top, that should give the game away.
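Counting the changes could be sketched like this, relying on the fact that s///g returns the number of substitutions it made. The helper name and the hash keys are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise to LF and report how many endings of each kind were seen.
sub normalise_and_count {
    my ($text) = @_;
    my $total_lf = ($text =~ tr/\n//);           # every LF, including those inside CR LF
    # s///g returns the substitution count, or a false value when nothing matched
    my $crlf = ($text =~ s/\r\n/\n/g) || 0;
    my $cr   = ($text =~ s/\r/\n/g)   || 0;      # only lone CRs are left by now
    my $lf   = $total_lf - $crlf;                # LFs that were not part of CR LF
    return ($text, { windows => $crlf, mac => $cr, unix => $lf });
}

my ($clean, $counts) = normalise_and_count("a\r\nb\r\nc\r\n");
print "windows=$counts->{windows} mac=$counts->{mac} unix=$counts->{unix}\n";
```

Whichever count dominates suggests where the file came from; more than one non-zero count suggests a file with mixed endings.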
Cautions - do NOT try to translate binary files (e.g. .gif and .jpg) as you'll corrupt them. And the above program won't work with huge files - i.e. ones that you can't load into memory all at the same time.
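For files too big to slurp, one option is to convert block by block. The sketch below is an assumption about how that might be done, with normalise_stream and the block size as invented details. The awkward case is a CR at the very end of a block, since the matching LF may only arrive in the next block.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out in blocks, converting all endings to LF.
sub normalise_stream {
    my ($in, $out, $block_size) = @_;
    $block_size ||= 64 * 1024;
    my $pending_cr = 0;
    while (read($in, my $buf, $block_size)) {
        # Put back a CR that was held over from the previous block
        if ($pending_cr) { $buf = "\r" . $buf; $pending_cr = 0; }
        # Hold back a trailing CR until we know whether an LF follows it
        $pending_cr = 1 if $buf =~ s/\r\z//;
        $buf =~ s/\r\n?/\n/g;
        print {$out} $buf;
    }
    print {$out} "\n" if $pending_cr;   # the file ended on a lone CR
}

# In-memory filehandles stand in for real files; a tiny block size forces the edge case
my $source = "a\r\nb\rc\nd\r";
my $result = '';
open(my $in,  '<', \$source) or die "Cannot open input: $!";
open(my $out, '>', \$result) or die "Cannot open output: $!";
normalise_stream($in, $out, 2);
close $out;
close $in;
print $result;
```

With real files on Windows, both handles would probably need binmode so that Perl does not translate line endings itself.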
I suspect my answer doesn't provide much new - just pulls other answers together. Ah well ... the more the merrier!!
Posted by airborne (airborne), 1 August 2005
Graham,
Your snippet of code worked for me. I simply read the file and converted it. I then split it into an array rather than writing it back to a file.
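Reading and splitting into an array could be done in one step along these lines. This is a sketch, with lines_from_text as a made-up name; the alternation lists \r\n first so a Windows ending is not treated as two separators.

```perl
use strict;
use warnings;

# Hypothetical helper: split a slurped string into lines, whatever the ending style.
sub lines_from_text {
    my ($text) = @_;
    # split drops trailing empty fields, so a final line ending adds no empty element
    return split /\r\n|\r|\n/, $text;
}

my @lines = lines_from_text("first\r\nsecond\rthird\n");
print scalar(@lines), " lines\n";
```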
I think I have found a home for Perl help.
Thanx for your help.