Emails from 2002: initial writing of the script. Remember that this 
conversion was from NFS/Share to Net-A-Talk, so basically none of what is 
here needs to be done in your case. However, it does provide background 
information about the file formats and how our appledouble perl script works.

NOTE: we were/are all top-posters, so within an email the top part is
the reply and the lower part is the original. However, since there are
several emails and I listed them oldest on top, newest on bottom, you'll
need to watch the dates.


From: Broc Seib <bseib@icd.cc.purdue.edu>
Date: April 30, 2002 10:30:05 EST
To: 
Subject: Re: Net-A-Talk file format and file conversion status 

Tom,

I'm ready to tweak the file-convert script. Do you know the final
format we need?

Broc

	As people may recall from my last update, the file format that 
Net-A-Talk will use on the servers and our file conversion were still 
works in progress. Well, they are still works in progress, but some 
decisions have been made and there has been progress, so here is the 
current status of our one major loose end on the Mac side.

Short version
	Will use Net-A-Talk's format of the AppleDouble version 2 spec 
(not the Mac OS X SMB client's format).
	Will continue to work over the summer on getting Net-A-Talk to 
read/write the Mac OS X SMB client format.
	File conversion will (obviously) be to Net-A-Talk's AppleDouble 
version 2.
	File conversion will be performed in one large sweep from the 
server side during the break between Spring and Maymester.

Long version
Net-A-Talk
	Though Steve Holmes has made a lot of progress in the last 2 
weeks, we still have one last known major hurdle to overcome to get 
Net-A-Talk reading/writing files in the same format as the Mac OS X 
SMB client. However, if you check your calendar, we are out of time. 
Since we really need Steve doing the sysadmin work on Net-A-Talk 
(configuring, installing, testing, etc.), we have put the Net-A-Talk 
format change on hold. We fully intend to continue pursuing this over the 
summer and see if we can get it working before Fall, as we would still 
like to use the Mac OS X SMB client format for all the reasons listed in 
earlier notes. If you are following the mechanics, you will realize that 
if this work on Net-A-Talk over the summer were to succeed, it would 
require a second file conversion to actually implement. The most likely 
time for that would be the break between the summer and fall semesters.
	So, for the time being we are installing Net-A-Talk so that it will 
write files in Net-A-Talk's format of AppleDouble. We are making two 
changes from the default configuration of Net-A-Talk. First, we are having 
Net-A-Talk save as AppleDouble version 2 instead of version 1. Both 
NFS/Share and the Mac OS X SMB client use AppleDouble version 2, so this 
will make the file conversion work easier.
	Second, we are configuring Net-A-Talk to save file names as their 
actual value rather than character-escaping every character whose value 
is 128 or above (English translation: by default Net-A-Talk saves 
"non-standard" characters, such as a u with a German umlaut above it, as 
a ":" followed by the hex value of that character, i.e. a u with an 
umlaut becomes ":9f"). Neither NFS/Share nor the Mac OS X SMB client does 
this; they store each character as itself (i.e. a u with an umlaut 
is stored as a u with an umlaut, and the UNIX boxes can handle this just 
fine). Again, this makes the file conversion work easier. The only 
character that needs to be escaped is "/", which is a valid character in 
a Mac file name but not in a UNIX one.
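A minimal Python sketch of the two naming conventions just described, assuming names are handled as raw bytes and that Net-A-Talk's escapes are always a colon followed by two hex digits. The function names are hypothetical; the real conversion code was Perl. A literal ":" never needs escaping because classic Mac file names cannot contain one (it is the HFS path separator).

```python
import re


def mac_to_unix_name(mac_name: bytes) -> bytes:
    """Non-default configuration: keep bytes >= 0x80 as-is, escape only '/'."""
    return mac_name.replace(b"/", b":2f")


def decode_colon_hex(unix_name: bytes) -> bytes:
    """Undo Net-A-Talk's default ':xx' escaping, e.g. b':9f' -> b'\\x9f'."""
    return re.sub(rb":([0-9a-fA-F]{2})",
                  lambda m: bytes([int(m.group(1), 16)]),
                  unix_name)
```

With the non-default setting only `mac_to_unix_name` applies; `decode_colon_hex` shows what undoing the default escaping would involve.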
	
File Conversion
	The file conversion is going to be done in one large UNIX-side 
sweep through the home directories (after the files are backed up, of 
course). This way when users show up on May 13 everything "will just 
work"™ :-).
	The file conversion will be handled by a small perl script that 
Broc has written. The actual searching of directories for candidate 
files will be handled by a tool from the server group, which will pass 
the needed file information to Broc's script to perform the actual 
conversion.

	At this point, neither of these is far enough along to actually 
test. Of course, they had better be done by next week or you will find 
my body hanging from the nearest apple tree. As soon as they are ready 
I'll let people know in case you want to use your account as a guinea 
pig (mine will be used before yours, so it should be safe :-).
	If you have any concerns, questions, gripes, or bad jokes just let 
me know.

Tom "Macintosh Doctor" Johnson
tjohnson@icd.cc.purdue.edu


Begin forwarded message:
From: Broc Seib <bseib@icd.cc.purdue.edu>
Date: May 6, 2002 9:39:31 EST
To:
Subject: Re: File conversion details 

If fixed offsets are used:

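'recdefs'  => {
            # entry ID => [ offset, physical length, length in header ]
            '2'     =>  [ 0x0299, undef, undef ],  # resource fork (variable)
            '3'     =>  [ 0x0086, 0xff,  0xff  ],  # real name
            '4'     =>  [ 0x0185, 0xc8,  0x00  ],  # comment
            '8'     =>  [ 0x024d, 0x10,  0x10  ],  # file dates
            '9'     =>  [ 0x025d, 0x20,  0x20  ],  # Finder info
            '15'    =>  [ 0x027d, 0x04,  0x04  ],  # directory ID
            '14'    =>  [ 0x0281, 0x04,  0x04  ],  # AFP file info
            '13'    =>  [ 0x0285, 0x0c,  0x00  ],  # short name
            '11'    =>  [ 0x0291, 0x08,  0x08  ],  # ProDOS file info
        },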

The first number is the offset, the second is the physical length of the
record written to the file (zero-filled by default), and the third is the
length recorded in the header.
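As a cross-check, the table is self-consistent: a 26-byte AppleDouble v2 header (magic, version, 16 filler bytes, entry count) plus nine 12-byte entry descriptors ends at 0x86, where entry 3 starts, and each fixed record's offset plus physical length lands exactly on the next record, ending at 0x299 where the resource fork begins. The Python sketch below assembles a file from that layout. `build_header` and `build_file` are hypothetical helpers, not the appledouble.pl code, and the descriptor order is taken from the "Entry IDs 02, 03, 04, ..." paragraph of the conversion notes.

```python
import struct

# entry ID -> (offset, physical length written, length recorded in header).
# ID 2 (resource fork) has a variable length supplied by the caller.
LAYOUT = {
    2: (0x0299, None, None),
    3: (0x0086, 0xFF, 0xFF),
    4: (0x0185, 0xC8, 0x00),
    8: (0x024D, 0x10, 0x10),
    9: (0x025D, 0x20, 0x20),
    15: (0x027D, 0x04, 0x04),
    14: (0x0281, 0x04, 0x04),
    13: (0x0285, 0x0C, 0x00),
    11: (0x0291, 0x08, 0x08),
}
HEADER_ORDER = [2, 3, 4, 8, 9, 11, 13, 14, 15]  # 02,03,04,08,09,0b,0d,0e,0f
MAGIC, VERSION = 0x00051607, 0x00020000


def build_header(rsrc_len: int) -> bytes:
    """26-byte AppleDouble v2 header followed by 12-byte entry descriptors."""
    hdr = struct.pack(">II16sH", MAGIC, VERSION, bytes(16), len(HEADER_ORDER))
    for eid in HEADER_ORDER:
        offset, _phys, hdr_len = LAYOUT[eid]
        hdr += struct.pack(">III", eid, offset, rsrc_len if eid == 2 else hdr_len)
    return hdr


def build_file(records: dict, rsrc: bytes) -> bytes:
    """Header, zero-filled fixed records (data copied in), then the resource fork."""
    buf = bytearray(build_header(len(rsrc)))
    buf += bytes(LAYOUT[2][0] - len(buf))  # nulls up to where the fork starts
    for eid, data in records.items():
        offset, phys, _ = LAYOUT[eid]
        if phys is None:
            raise ValueError("pass the resource fork via rsrc")
        chunk = data[:phys]  # never overrun the fixed-size slot
        buf[offset:offset + len(chunk)] = chunk
    return bytes(buf) + rsrc
```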

Broc

	Here is a revised version of the document on what we need to do. 
The differences are fairly minor: (1) I added that we need to create an 
empty data fork file if a resource fork file exists but a data fork file 
does not; (2) I added the file permissions that should be used on created 
files (nothing special, just there for completeness); (3) I cleaned up 
some grammar errors and unclear phrasing.

	OK, we think we have everything hammered out on the file conversion 
front (at least until the next testing run, when we find... :-). So here 
is the document covering what everyone needs to do/know. I am putting 
as much detail in this as possible so we have a one-stop shop for how to 
do this.
	The file conversion is being handled in two parts: a file system 
traversing tool from the inst-servers group, and an actual file 
re-writing tool from inst-dev (Broc).

File traversing tool
	This tool will need to go through the directory trees on the home 
directory servers (champion, lookout, icd, others?) looking for 
candidate files. Candidate files have names starting with a % and begin 
with the magic number 0x00051607. Each of these files will need to be 
handed to Broc's perl code, which will re-write it in the desired 
format. The result then needs to be written to a subdirectory of the 
current directory. That subdirectory is called .AppleDouble, and it 
either
	a. exists already (unlikely but possible)
	b. will need to be created (most likely)
The filename of the new file will be the same as the input's except:
	a. the % that is the first character of the filename needs to be 
removed
	b. any %25 needs to be changed to a % (NFS/Share escapes the % 
character in filenames as %25)
	c. any %2f needs to be changed to :2f (the escape for / )
	d. if b. or c. is done, the same must be done to the file in the 
current directory that has the same name as the input file except that 
it does not begin with a %. This file DOES NOT have a magic number.
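A minimal Python sketch of the candidate test described above (leading % plus the 0x00051607 magic number), wrapped in a directory walk. The real traversal was the server group's tool, so `is_candidate` and `find_candidates` are hypothetical. Skipping `.AppleDouble` directories is an added assumption: a converted file for a Mac name that itself begins with % would otherwise match again on a second run.

```python
import os

MAGIC = b"\x00\x05\x16\x07"  # AppleDouble magic number 0x00051607, big-endian


def is_candidate(path: str) -> bool:
    """True if the name starts with '%' and the file's first 4 bytes are MAGIC."""
    if not os.path.basename(path).startswith("%") or not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(4) == MAGIC


def find_candidates(root: str):
    """Yield every candidate file under root, never descending into .AppleDouble."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".AppleDouble"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_candidate(path):
                yield path
```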

If the %{filename} candidate file does not have a corresponding 
{filename} file, one will need to be created in the same directory as the 
original %{filename} file. This happens with some files that are 
applications and have no data fork (ResEdit is an example). This 
data file can be empty, so a simple "touch" can be used to create it.

Created directories should have permissions 700; created files should have 600.
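Creating the `.AppleDouble` directory and a missing empty data fork with those permissions could look like the following Python sketch (hypothetical helper names). The explicit `chmod` calls keep the result independent of the process umask. Ownership is not covered by these notes; if the sweep runs as root, the new entries would presumably also need to be chowned to the home directory's owner.

```python
import os


def ensure_appledouble_dir(dirpath: str) -> str:
    """Return <dirpath>/.AppleDouble, creating it with mode 700 if needed."""
    ad = os.path.join(dirpath, ".AppleDouble")
    if not os.path.isdir(ad):
        os.mkdir(ad)
        os.chmod(ad, 0o700)  # set explicitly so the umask cannot interfere
    return ad


def ensure_data_fork(dirpath: str, name: str) -> bool:
    """Like `touch`: create an empty data-fork file with mode 600 if missing.

    Returns True if a file was created.
    """
    path = os.path.join(dirpath, name)
    if os.path.lexists(path):
        return False
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)
    os.chmod(path, 0o600)
    return True
```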

Examples
input                   becomes
%george%2fdave%25       .AppleDouble/george:2fdave%
george%2fdave%25        george:2fdave%
%ResEdit                .AppleDouble/ResEdit
ResEdit                 ResEdit

File re-writing (converting) tool
	This tool will re-write the Mac metadata/resource fork file using 
AppleDouble. Rather than trying to figure out how to get Net-A-Talk to 
read a twiddled NFS/Share-created file, we will re-write the file from 
scratch and build it the way Net-A-Talk expects it to look.
	Entry IDs 02, 03, 04, 08, 09, 0b, 0d, 0e, 0f will be created, in 
that order, in the header. The file offset of each of these entries is 
hard-coded in the Net-A-Talk code, so we will follow suit and put our 
data at those same offsets (Broc and Steve have copies of the hexdump 
that shows the offsets, and Steve has double-checked these against the 
Net-A-Talk source) (if someone wants to type up the offsets real fast I 
can put them in this mail as well). Since NFS/Share creates Entry IDs 
03, 08, 09, 0a, 02 (in that order in the header), we will plug the data 
for 02, 03, 08 and 09 into the file at the proper offsets as defined by 
Net-A-Talk (see above). 0a will be discarded entirely. The other entry 
IDs will just have nulls at their proper offsets.
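Reading the NFS/Share file back out is a straightforward walk over the AppleDouble entry descriptors. This Python sketch (hypothetical `read_entries`) keeps only IDs 02, 03, 08 and 09 and silently drops 0a; it assumes the AppleDouble version 2 header layout, which the notes say NFS/Share uses.

```python
import struct

MAGIC = 0x00051607
KEEP = {2, 3, 8, 9}  # carried over to Net-A-Talk; 0x0a (and anything else) dropped


def read_entries(blob: bytes) -> dict:
    """Parse an AppleDouble header and return {entry_id: bytes} for KEEP."""
    magic, _version, _filler, count = struct.unpack_from(">II16sH", blob, 0)
    if magic != MAGIC:
        raise ValueError("not an AppleDouble file")
    entries = {}
    for i in range(count):
        eid, offset, length = struct.unpack_from(">III", blob, 26 + 12 * i)
        if offset + length > len(blob):
            raise ValueError(f"entry {eid} runs past end of file")
        if eid in KEEP:
            entries[eid] = blob[offset:offset + length]
    return entries
```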
	03 will need to be converted from a Pascal string to a C string.
	08 will need to be converted from Macintosh time (seconds since the 
1904 epoch) to the representation defined in the AppleDouble spec: 
seconds before/since 12:00 a.m. Jan 1, 2000 GMT.
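The two conversions above are small enough to sketch in Python (hypothetical helper names). The offset between the epochs is 96 years containing 24 leap days (1904 through 1996), i.e. (96 * 365 + 24) * 86400 = 3,029,529,600 seconds. Assumptions: entry 08 holds four big-endian 32-bit times (create, modify, backup, access); the source times are treated as GMT, although classic Mac OS generally stored local time; and values too old for a signed 32-bit result (such as an unset 0) are clamped to 0x80000000, which, as we read the AppleDouble spec, marks an unknown date.

```python
import struct

# Seconds from 1904-01-01 to 2000-01-01: 96 years, 24 of them leap years.
MAC_TO_2000 = (96 * 365 + 24) * 86400
UNKNOWN = -2**31  # 0x80000000 as a signed 32-bit value


def pascal_to_c(pstr: bytes) -> bytes:
    """Entry 03: length-prefixed Pascal string -> NUL-terminated C string."""
    n = pstr[0] if pstr else 0
    return pstr[1:1 + n] + b"\0"


def convert_dates(entry: bytes) -> bytes:
    """Entry 08: four unsigned 1904-based times -> four signed 2000-based times."""
    mac_times = struct.unpack(">4I", entry[:16])
    shifted = [max(t - MAC_TO_2000, UNKNOWN) for t in mac_times]
    return struct.pack(">4i", *shifted)
```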


Tom "Macintosh Doctor" Johnson
palantir@purdue.edu


_______________________________________________________________________________

Emails about file name encoding from Net-A-Talk to Mac OS X

Date: June 5, 2003 13:13:58 EST
To:
Subject: Re: afp file conversions

	Well, Thole believes he fixed the perl script; it worked on the test 
files in my account, at least. I'll leave it to him to say what he 
changed; he said they were pretty minor changes.
On the file names, we believe that the text encoding Net-A-Talk is using 
is MacRoman. You may be able to confirm this by remembering/looking at 
whatever config file you tweaked so that it used this encoding instead of 
%{hex value}, which was the default. I know that Net-A-Talk supports 4 
different ways of writing file names. On the OS X side, it is using UTF-8 
(an encoding of Unicode). I have given Thole a web page I found that 
has a chart of MacRoman to UTF-8 conversions. He has been working on code 
to actually do the conversion (pretty nasty, he says, because a single 
character can take more than one byte in UTF-8), but again I'll let him 
describe the details.
So that is actually looking promising. We would like to combine Thole's 
work with your file system walking script to do some testing. We also 
need to look into using your file system walking script to create a backup 
of the files for disaster recovery.
Two questions. First, how long do you think this will take, based on last 
year's conversion (I know that actually doing a dummy run is a better 
way to tell, since the file system walking is the time-consuming part)? 
Second, could you give Thole/Kevin a copy of the file system walking 
script, mostly for reference?

Tom "Macintosh Doctor" Johnson
palantir@purdue.edu

On Thursday, June 5, 2003, at 12:58 PM, Holmes, Steven J. wrote:

Tom, I'm going to try to get into the office tomorrow. No promises. But, can
you give me an update on the conversion work? Specifically, do we have a
better idea of the name character conversion?

Thanks,
Steve.


Begin forwarded message:
From: Michael Thole <mthole@purdue.edu>
Date: June 6, 2003 17:42:20 EST
Subject: OSX Conversion Scripts

Attached is my script for renaming MacRoman-encoded files to UTF-8...and not
just any UTF-8, the UTF-8 that the Finder understands! The script itself is
pretty self-explanatory, and you'll probably want to modify how it gets input
and what it does for the actual conversion. Let me know if you want anything
changed.

Also attached is the appledouble.pl script. I fixed the resource-fork
corruption (the pack and sprintf SJH put in weren't getting undone before the
file was written), and I believe I fixed a couple of other minor issues. To see
what I've changed, just search for "mthole"; I commented everything I changed.

It seems to work nicely for our purposes (-v2 -r92 -s), but it hasn't been
tested for absolutely every possible case.  I don't believe I had any problems
with zero-length fields, but I may not have had any sufficiently weird test
files.

What do you guys think?

- Mike


From: Michael Thole <mthole@purdue.edu>
Date: June 11, 2003 11:13:07 EST
Subject: Re: roman2utf8.pl

If you try to encode a run-of-the-mill filename containing only 7-bit ASCII
characters, it'd of course be fine... but if you try to encode a filename
that uses upper-ASCII characters and wasn't in MacRoman to begin with, the
script will assume it is MacRoman and encode it that way, possibly with some
funky results. Here is a filename encoded twice to show what I mean:

Example:
Roman:	umlaut\ \254\212\221\225\232\237\330
UTF8:	umlaut\ \302\250a\314\210e\314\210i\314\210o\314\210u\314\210y\314\210
Foobar:	umlaut\ \302\254\302\256aA\314\203a\314\200eA\314\203a\314\200iA\314\203a\314\200oA\314\203a\314\200uA\314\203a\314\200yA\314\203a\314\200
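Mike's conversion can be approximated in Python: decode the bytes as MacRoman, decompose accented characters (NFD) so that, for example, "ä" becomes "a" plus a combining diaeresis, and re-encode as UTF-8. `roman_to_finder_utf8` is a hypothetical name, not roman2utf8.pl. Applying it to the Roman line above reproduces the UTF8 line, and applying it a second time reproduces the start of the Foobar line. Python's NFD follows the current Unicode standard while HFS+ uses Apple's own decomposition table, so the two may disagree for some characters outside this Latin range.

```python
import unicodedata


def roman_to_finder_utf8(name: bytes) -> bytes:
    """MacRoman bytes -> decomposed (NFD) UTF-8 bytes."""
    text = name.decode("mac_roman")
    return unicodedata.normalize("NFD", text).encode("utf-8")
```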

AFAIK, there isn't any way to programmatically tell what encoding a filename
is in, but I asked Tom about this last week. He thought we could just use the
same file set that the resource conversion script uses, because anything put
up by a Mac (presumably in MacRoman) will have its resource-fork counterpart
in ./.AppleDouble/. Make sense to you?

- Mike

On Wednesday, June 11, 2003, at 10:53 AM, sjh@purdue.edu wrote:

Mike, you say in your comments in roman2utf8.pl that if the file name
passed to the program isn't in MacRoman format the name will get
foobar-ed. But it looks to me like the function myConvert will ignore
chars whose ord() are not in the table and just return the char. Won't
that protect the non-MacRoman file names?

Otherwise, can you give me a few lines of perl to test the file name for
suitability?

Thanks,
Steve.
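On Steve's question: if myConvert's table covers the whole upper half of MacRoman (as a complete MacRoman chart does; Python's mac_roman codec, for instance, maps every byte from 0x80 to 0xFF), then every byte of an already-UTF-8 name is in the table too, which is how the Foobar line above gets mangled. A minimal Python sketch of the check Mike and Tom suggest, with the hypothetical name `needs_roman_conversion`: pure-ASCII names never need converting, and a non-ASCII name is converted only when a matching file exists in ./.AppleDouble/. It is a heuristic, not proof that the name is MacRoman.

```python
import os


def needs_roman_conversion(dirpath: str, name: bytes) -> bool:
    """Convert only non-ASCII names that have an .AppleDouble counterpart."""
    if all(b < 0x80 for b in name):
        return False  # pure ASCII is identical in MacRoman and UTF-8
    counterpart = os.path.join(os.fsencode(dirpath), b".AppleDouble", name)
    return os.path.exists(counterpart)
```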