Detecting if file is Macintosh, Windows or Unix

Posted by airborne (airborne), 29 July 2005

I have a problem with text files that are not Windows files. Users create text files and drop them on a fileserver. My script opens these files and loops through the contents. The problem is that if a file was created on a Macintosh or Unix system, my script doesn't see the EOL character, because Windows uses CR/LF where Macintosh uses CR and Unix uses LF.

Is there a way to detect which of the three OS's created the file and be able to split the lines based on this OS?

Hope this makes sense.

Angel

Posted by John_Moylan (jfp), 29 July 2005

You should not need to know which OS the file came from. I would have thought the only thing you needed to do was convert the line endings.

You can do this with the substitution operator

Code:

's/\r\n|\r/\n/g'

\r\n is Windows
\r is Mac (classic Mac OS)
\n is Unix

So look for \r\n OR \r and replace with \n. All line endings are now unix.

This does depend on you slurping the file into one string though, that may be a problem if the file is really large.
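
As a minimal sketch of the substitution above applied to a slurped string (the helper name normalise_eol is made up for illustration and is not from the thread):

```perl
use strict;
use warnings;

# normalise_eol is a hypothetical helper name, used here for illustration only.
sub normalise_eol {
    my ($text) = @_;
    # \r\n is listed first so a Windows ending becomes one newline, not two.
    $text =~ s/\r\n|\r/\n/g;
    return $text;
}

# One line of each kind: Windows, classic Mac, Unix.
my $mixed = "win\r\nmac\runix\n";
print normalise_eol($mixed);
```

The order of the alternatives matters: if \r were tried before \r\n, each Windows line ending would leave a stray \n behind and produce blank lines.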

The Unix command line can also help here.
Read the man pages for:
dos2unix
mac2unix

If you must know the source OS then perhaps you could see which line ending exists in the file. This is off the top of my head and probably wrong, but I would use tr for this.

Code:

tr/\r//
tr/\n//

With an empty replacement list, tr changes nothing, but its return value is the number of characters it matched. Note that tr works on single characters, not on sequences, so it cannot look for \r\n as a unit. Instead, count \r and \n separately: equal nonzero counts suggest Windows, only \r suggests Mac, and only \n suggests Unix.

God I hope that makes sense.
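
A rough sketch of that counting idea follows. The helper name guess_eol and the labels it returns are assumptions made for illustration. Because tr counts single characters only, the \r\n pairs are counted with a regex match and subtracted.

```perl
use strict;
use warnings;

# guess_eol is a hypothetical helper; the returned labels are illustrative.
sub guess_eol {
    my ($text) = @_;
    my $crlf = () = $text =~ /\r\n/g;        # number of CR LF pairs
    my $cr   = ($text =~ tr/\r//) - $crlf;   # CRs not followed by LF
    my $lf   = ($text =~ tr/\n//) - $crlf;   # LFs not preceded by CR
    return 'none'    unless $crlf || $cr || $lf;
    return 'windows' if $crlf && !$cr && !$lf;
    return 'mac'     if $cr   && !$crlf && !$lf;
    return 'unix'    if $lf   && !$crlf && !$cr;
    return 'mixed';
}

print guess_eol("one\r\ntwo\r\n"), "\n";
```

A file with more than one kind of ending is reported as mixed rather than guessed at, which is usually the safer answer before converting anything.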

Posted by airborne (airborne), 29 July 2005

Thanx for the reply however...

Now the Macintosh files work, but I have broken the Unix and PC files.

I think I will have to try to identify the type of file first, somehow, then perform the translations where needed.

Is anyone aware of a special variable, like $\, that tells the parser what the EOL character is?

Angel

Posted by John_Moylan (jfp), 29 July 2005

I still don't think it's essential to recognise the uploaded file's source OS; you just need to make sure the EOL is correct for the OS you are processing that file on.
Which OS will your Perl programme reside on?

Also, have you tried CPAN?

I found this, and it may be of some help in making your line endings consistent if you've mis-munged them.
http://search.cpan.org/~autrijus/PerlIO-eol-0.13/eol.pm

Posted by Custard (Custard), 29 July 2005

on 07/29/05 at 21:03:31, airborne wrote:

Is anyone aware of a special variable, like $\, that tells the parser what the EOL character is?

Angel

Does $/ do what you want?

According to my very swift glance at my Perl crib sheet:

$/ Input record separator, newline by default, may be multi-character.
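
As a small sketch of how $/ might be used, assuming you already know the file uses bare \r endings: the helper name read_records is made up, and an in-memory filehandle stands in for a real file so the example is self-contained. Note that $/ is a plain string, not a pattern, so it can only match one kind of line ending at a time.

```perl
use strict;
use warnings;

# read_records is a hypothetical helper; $separator is the EOL the file is assumed to use.
sub read_records {
    my ($fh, $separator) = @_;
    local $/ = $separator;   # restored automatically when the sub returns
    my @records = <$fh>;     # readline now splits on $separator
    chomp @records;          # chomp strips the current value of $/
    return @records;
}

# In-memory filehandle holding classic Mac style text.
my $data = "one\rtwo\rthree\r";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @lines = read_records($fh, "\r");
close $fh;

print scalar(@lines), " records\n";
```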

HTH

B

Posted by John_Moylan (jfp), 29 July 2005

Hello Custard, hope you're well.

Indeed, slurping the file would solve the EOL problem. airborne: make sure the file is not large, as slurping can consume a fair chunk of memory.

The use of $/ has been discussed within a thread here before. See here:

http://www.wellho.net/cgi-bin/opentalk/YaBB.pl?board=perl;action=display;num=1101119309;start=2

There is no special variable that I'm aware of that can give you the OS that created a file, though I bow to those here with greater knowledge.

Posted by admin (Graham Ellis), 30 July 2005

Here's the sort of code I use personally. It doesn't matter what the input format is ... the file is always translated. It works by forcing the incoming file to one standard, then translating it through to what you want - much easier than 9 possible direct translation modes.

Code:

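#!/usr/bin/perl
use strict;
use warnings;

# Usage: script infile outfile
open (my $fh, '<', $ARGV[0]) or die "Cannot read $ARGV[0]: $!";
binmode $fh;    # raw bytes, so Windows does not translate CR/LF on read
my $buffer;
read ($fh, $buffer, -s $ARGV[0]);
close $fh;

# Unixify ...

$buffer =~ s/\r\n/\n/g;
$buffer =~ s/\r/\n/g;

# Uncomment ONE of the following lines to set up output

# $buffer =~ s/\n/\r\n/g; # for Windows
# $buffer =~ s/\n/\r/g; # for old Mac style

open (my $fho, '>', $ARGV[1]) or die "Cannot write $ARGV[1]: $!";
binmode $fho;   # write the bytes exactly as they are in $buffer
print $fho $buffer;
close $fho;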

I don't think there's any special variable or piece of authority that tells you what the incoming OS is, but if you count the number of changes made by the substitutions at the top, that should give the game away.
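
A sketch of that counting, relying on the fact that s///g returns the number of substitutions it made. The helper name unixify_and_count is made up for illustration.

```perl
use strict;
use warnings;

# unixify_and_count is a hypothetical helper name.
# Returns the converted text plus how many Windows, Mac and Unix endings it saw.
sub unixify_and_count {
    my ($buffer) = @_;
    # s///g returns the substitution count, or an empty string when nothing matched.
    my $windows = ($buffer =~ s/\r\n/\n/g) || 0;
    my $mac     = ($buffer =~ s/\r/\n/g)   || 0;
    # Every ending is now \n, so whatever is left over was Unix to begin with.
    my $unix    = ($buffer =~ tr/\n//) - $windows - $mac;
    return ($buffer, $windows, $mac, $unix);
}

my ($text, $windows, $mac, $unix) = unixify_and_count("a\r\nb\r\nc\r\n");
print "windows=$windows mac=$mac unix=$unix\n";
```

If more than one of the three counts is nonzero, the file has mixed line endings and was probably edited on more than one system.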

Cautions - do NOT try to translate binary files (e.g. .gif and .jpg) as you'll corrupt them. And the above program won't work with huge files - i.e. ones that you can't load into memory all at the same time.

I suspect my answer doesn't provide much new - it just pulls the other answers together. Ah well ... the more the merrier!!

Posted by airborne (airborne), 1 August 2005

Graham,

Your snippet of code worked for me. I simply read the file and converted it. I then split it into an array rather than writing it back to a file.
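
For anyone following along, a sketch of the split step might look like the following. The sample data is made up, and the split pattern accepts all three endings directly, so it would also work on text that has not been converted first.

```perl
use strict;
use warnings;

# Made-up sample text with one line ending of each kind.
my $buffer = "first\r\nsecond\rthird\n";

# \r\n is listed first so a Windows ending is treated as one separator.
my @lines = split /\r\n|\r|\n/, $buffer;

print "$_\n" for @lines;
```

split drops trailing empty fields by default, so the final line ending does not add an empty element to the array.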

I think I have found a home for Perl help.

Thanx for your help.
Angel

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.