Emails from 2002 - initial writing of script.

Remember that this is from NFS/Share to Net-A-Talk, so basically none of what is here needs to be done in your case. However, it does provide background information about the file formats and how our appledouble perl script works.

NOTE: we were/are all TOP posters, so within an email the top part is the reply and the lower part is the original. However, since there are several emails and I listed them with the older ones on top and the newer ones on the bottom, you'll need to watch the dates on things.

From: Broc Seib
Date: April 30, 2002 10:30:05 EST
To:
Subject: Re: Net-A-Talk file format and file conversion status

Tom,

I'm ready to tweak the file-convert script. Do you know the final format we need?

Broc

As people may recall from my last update, the file format that Net-A-Talk will use on the servers and our file conversion were still a work in progress. Well, they are still a work in progress, but certain decisions and progress have been made, so here is the current status on our one major loose end on the Mac side.

Short version

Will use Net-A-Talk's format of the AppleDouble version 2 spec (not Mac OS X SMB client's format)
Will continue to work on getting Net-A-Talk to read/write Mac OS X SMB client format over summer
File conversion will be (obviously) to Net-A-Talk's AppleDouble version 2
File conversion will be performed in one large sweep from the server side during the break between Spring and Maymester

Long version

Net-A-Talk

Though Steve Holmes has made a lot of progress in the last 2 weeks, we still have one last known major hurdle to overcome to get Net-A-Talk reading/writing files in the same format as the Mac OS X SMB client. However, if you check your calendar, we are out of time. Since we really need Steve doing the sys admin work on Net-A-Talk (configuring, installing, testing, etc.), we have put the change of Net-A-Talk on hold.
We fully intend to continue pursuing this over the summer and see if we can get it working before Fall, as we would still like to use the Mac OS X SMB client format for all the reasons listed in earlier notes. If you are following the mechanics, you will realize that if this work on Net-A-Talk over the summer were to succeed, it would require a second file conversion to actually implement. The most likely candidate would be the break between the summer and fall semesters.

So, for the time being we are installing Net-A-Talk so that it will write files in Net-A-Talk's format of AppleDouble. We are making two changes from the default configuration of Net-A-Talk. First, we are having Net-A-Talk save as AppleDouble version 2 instead of version 1. Both NFS/Share and the Mac OS X SMB client use AppleDouble version 2, so this will make the file conversion work easier.

Second, we are configuring Net-A-Talk to save file names as their actual value rather than character escaping any ASCII character whose value is above 128 (English translation - by default Net-A-Talk saves "non-standard" characters such as a u with a German umlaut above it as a ":" followed by the hex value of that character - i.e. a u with an umlaut becomes ":9f"). Neither NFS/Share nor the Mac OS X SMB client does this - they save characters as that character (i.e. a u with an umlaut is stored as a u with an umlaut - the UNIX boxes can handle this just fine). Again, this makes the file conversion work easier. The only character that needs to be escaped is "/", which is a valid character for a file name on a Mac but not on a UNIX machine.

File Conversion

The file conversion is going to be done in one large UNIX side sweep through the home directories (after the files are backed up, of course). This way when users show up on May 13 everything "will just work"™ :-).

The file conversion will be handled by a small perl script that Broc has written.
The actual searching of directories for candidate files will be handled by a tool from the server group, which will then pass the needed file information to Broc's conversion script, which will perform the actual conversion.

At this point, neither of these is far enough along to actually test. Of course, they had better be done by next week or you will find my body hanging from the nearest apple tree. As soon as they are ready I'll let people know in case you want to use your account as a guinea pig (mine will be used before yours so it should be safe :-).

If you have any concerns, questions, gripes, or bad jokes just let me know.

Tom "Macintosh Doctor" Johnson
tjohnson@icd.cc.purdue.edu

Begin forwarded message:

From: Broc Seib
Date: May 6, 2002 9:39:31 EST
To:
Subject: Re: File conversion details

if fixed offsets are used:

    'recdefs'  => {
        '2'     =>  [ 0x0299, undef, undef ],
        '3'     =>  [ 0x0086, 0xff,  0xff  ],
        '4'     =>  [ 0x0185, 0xc8,  0x00  ],
        '8'     =>  [ 0x024d, 0x10,  0x10  ],
        '9'     =>  [ 0x025d, 0x20,  0x20  ],
        '15'    =>  [ 0x027d, 0x04,  0x04  ],
        '14'    =>  [ 0x0281, 0x04,  0x04  ],
        '13'    =>  [ 0x0285, 0x0c,  0x00  ],
        '11'    =>  [ 0x0291, 0x08,  0x08  ],
    },

The first number is the offset, the second number is the physical length of the record written to the file (defaulted to zeros), and the third is the length indicated in the header.

Broc

Here is a revised version of the document on what we need to do. The differences are fairly minor.

1. I added that we need to create an empty data fork file if a resource fork file exists but a data fork file does not.
2. Added the file permissions that should be used on created files (nothing special, just there for completeness).
3. Cleaned up some grammar errors and unclear phrasing.
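
As a cross-check on the fixed-offset table in Broc's May 6 reply, here is a minimal Python sketch (not part of appledouble.pl). It assumes the standard AppleDouble header layout: a 4-byte magic number, a 4-byte version, 16 filler bytes, a 2-byte entry count, then one 12-byte descriptor (id, offset, length) per entry. It also assumes that the undef lengths for entry 2 mean "the resource fork runs to the end of the file". The names RECDEFS, build_header and physical_layout_is_contiguous are made up for illustration.

```python
import struct

MAGIC = 0x00051607    # AppleDouble magic number
VERSION = 0x00020000  # AppleDouble version 2

# entry id -> (offset, physical length, length declared in the header)
# None mirrors Perl's undef (assumed here to mean "variable length").
RECDEFS = {
    2:  (0x0299, None, None),
    3:  (0x0086, 0xFF, 0xFF),
    4:  (0x0185, 0xC8, 0x00),
    8:  (0x024D, 0x10, 0x10),
    9:  (0x025D, 0x20, 0x20),
    15: (0x027D, 0x04, 0x04),
    14: (0x0281, 0x04, 0x04),
    13: (0x0285, 0x0C, 0x00),
    11: (0x0291, 0x08, 0x08),
}


def build_header(resource_len):
    """Pack the fixed header plus one descriptor per entry, in ascending id order."""
    head = struct.pack(">II16xH", MAGIC, VERSION, len(RECDEFS))
    for entry_id in sorted(RECDEFS):
        offset, _physical, declared = RECDEFS[entry_id]
        if declared is None:          # resource fork: use the real length
            declared = resource_len
        head += struct.pack(">III", entry_id, offset, declared)
    return head


def physical_layout_is_contiguous():
    """Check that each entry starts exactly where the previous one ends."""
    expected = len(build_header(0))   # first entry should follow the header
    by_offset = sorted(RECDEFS.values(), key=lambda rec: rec[0])
    for offset, physical, _declared in by_offset:
        if offset != expected:
            return False
        if physical is None:          # variable-length entry ends the walk
            break
        expected = offset + physical
    return True


print(hex(len(build_header(0))), physical_layout_is_contiguous())
```

With nine entries the header comes to 26 + 9 * 12 = 134 bytes, which is 0x86, the offset given for entry 3. Each later offset is the previous offset plus the previous physical length, so the table describes a gap-free layout with the resource fork last.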
Ok, we think we have everything hammered out on the file conversion front (at least until the next testing run when we find... :-). So here is the document covering what everyone needs to do/know. I am putting as much detail in this as possible so we have a one stop shop for how to do this.

The file conversion is being handled in two parts - a file system traversing tool from the inst-servers group, and an actual file re-writing tool from inst-dev (Broc).

File traversing tool

This tool will need to go through the directory trees on the home directory servers (champion, lookout, icd, others?) looking for candidate files. Files for conversion will start with a % and have a magic number of 0x00051607. These files will need to be handed to Broc's perl code, which will then re-write the file to convert it to the desired format. Said file then needs to be written to a sub directory of the current directory. The sub directory is called .AppleDouble and it either

a. exists already (unlikely but possible)
b. will need to be created (most likely)

The filename of the new file will be the same as the input except

a. the % that is the first character of the filename needs to be removed
b. any %25 needs to be changed to a % (NFS/Share character escapes the % character in filenames)
c. any %2f will need to be changed to :2f (the character escape for / )
d. if b. or c. is done, then the same must be done to a file in the current directory that has the same name as the input file except it does not begin with a %. This file DOES NOT have a magic number.

If the %{filename} candidate file does not have a corresponding {filename} file, one will need to be created in the same directory as the original %{filename} file. This happens with some files that are applications and don't have a data fork (ResEdit is an example). This data file can be empty, so a simple "touch" can be used to create it.
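
The renaming rules a. through d. can be sketched in a few lines of Python. This is only an illustration of the rules as written here, not the traversal tool itself; the function names unescape_name and plan_conversion are hypothetical, and only the lowercase escapes %25 and %2f mentioned in this document are handled.

```python
import re

# NFS/Share escapes named in the notes: "%25" stands for "%", "%2f" for "/".
_ESCAPE = re.compile(r"%(25|2f)")


def unescape_name(name):
    """Rewrite NFS/Share escapes into the Net-A-Talk form in a single pass."""
    return _ESCAPE.sub(lambda m: "%" if m.group(1) == "25" else ":2f", name)


def plan_conversion(candidate):
    """Return (metadata path, data file name) for a '%'-prefixed candidate."""
    if not candidate.startswith("%"):
        raise ValueError("candidate metadata files start with '%'")
    data_name = unescape_name(candidate[1:])      # rules a, b, c (and d)
    return ".AppleDouble/" + data_name, data_name


print(plan_conversion("%george%2fdave%25"))
print(plan_conversion("%ResEdit"))
```

Doing both substitutions in one pass matters: if %25 were replaced first and %2f second, a name containing %252f (an escaped % followed by the literal text 2f) would wrongly end up as :2f.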
File permissions on created directories should be 700, and 600 for files.

Examples

input                   becomes
%george%2fdave%25       .AppleDouble/george:2fdave%
george%2fdave%25        george:2fdave%
%ResEdit                .AppleDouble/ResEdit
ResEdit                 ResEdit

File re-writing (converting) tool

This tool will re-write the Mac metadata/resource fork file using AppleDouble. Rather than trying to figure out how to get Net-A-Talk to read a twiddled NFS/Share created file, we will re-write the file from scratch and build it the way Net-A-Talk expects it to look.

Entry IDs 02, 03, 04, 08, 09, 0b, 0d, 0e, 0f will be created in that order in the header. The offsets into the file at which each of these will be found are hard set in the Net-A-Talk code, so we will follow suit and put our data at those same offsets (Broc and Steve have copies of the hexdump that shows the offsets and Steve has double checked these against the Net-A-Talk source) (if someone wants to type up the offsets real fast I can put them in this mail as well). Since NFS/Share creates Entry IDs 03, 08, 09, 0a, 02 (in that order in the header), we will plug the data for 02, 03, 08, 09 into the file at the proper offsets as defined by Net-A-Talk (see above). 0a will be discarded entirely. The other entry IDs will just have nulls at their proper offsets.

03 will need to be converted from a pascal string to a C string.

08 will need to be converted from Macintosh seconds from epoch (1904) to the one defined in the AppleDouble spec - seconds before/since 12:00am Jan 1, 2000 GMT.

Tom "Macintosh Doctor" Johnson
palantir@purdue.edu

_______________________________________________________________________________

Emails about file name encoding from Net-A-Talk to Mac OS X

Date: June 5, 2003 13:13:58 EST
To:
Subject: Re: afp file conversions

Well, Thole believes he fixed the perl script - it worked on the test files in my account at least. I'll leave it to him to say what he changed - he said they were pretty minor changes.
On the file name, we believe that the text encoding that Net-A-Talk is using is MacRoman. You may be able to confirm this by remembering/looking at whatever config file you tweaked so that it used this encoding instead of %{hex value}, which was the default. I know that Net-A-Talk supports 4 different ways of writing file names. On the OS X side, it is using UTF-8 (part of Unicode I believe). I have given Thole a web page I found that has a chart of MacRoman to UTF-8 conversion. He has been working on code to actually do the conversion - pretty nasty he says due to the two bytes in a unicode character, but again I'll let him describe the details.

So, actually that is looking promising. We would like to combine Thole's work with your file system walking script to do some testing. We also need to look into using your file system walking script to create a backup of the files for disaster recovery.

Two questions. First, how long do you think this will take - based on last year's conversion (I know that actually doing a dummy run is a better way since the file system walking is the time consuming part)? Second, could you give Thole/Kevin a copy of the file system walking script, mostly for reference?

Tom "Macintosh Doctor" Johnson
palantir@purdue.edu

On Thursday, June 5, 2003, at 12:58 PM, Holmes, Steven J. wrote:

Tom,

I'm going to try to get into the office tomorrow. No promises. But, can you give me an update on the conversion work? Specifically, do we have a better idea of the name character conversion?

Thanks,
Steve.

Begin forwarded message:

From: Michael Thole
Date: June 6, 2003 17:42:20 EST
Subject: OSX Conversion Scripts

Attached is my script for renaming MacRoman encoded files to UTF8...and not just any UTF8, the UTF8 that the Finder understands! The script itself is pretty self explanatory, and you'll probably want to modify how it gets input and what it does for the actual conversion. Let me know if you want anything changed.
Also attached is the appledouble.pl script. I fixed the resource-fork corruption (the pack and sprintf SJH put in weren't getting undone before the file was written), and I believe I fixed a couple of other minor issues. To see what I've changed just search for "mthole"; I commented everything I changed.

It seems to work nicely for our purposes (-v2 -r92 -s), but it hasn't been tested for absolutely every possible case. I don't believe I had any problems with zero-length fields, but I may not have had any sufficiently weird test files.

What do you guys think?

- Mike

From: Michael Thole
Date: June 11, 2003 11:13:07 EST
Subject: Re: roman2utf8.pl

If you try to encode a run-of-the-mill ASCII-128 file, it'd of course be fine... but if you try to encode a filename that uses upper-ascii characters and wasn't meant to be in MacRoman, the script will assume it is MacRoman and encode it that way, possibly resulting in some funky results. Here is a file name encoded twice to show what I mean:

Example:
Roman: umlaut\ \254\212\221\225\232\237\330
UTF8: umlaut\ \302\250a\314\210e\314\210i\314\210o\314\210u\314\210y\314\210
Foobar: umlaut\ \302\254\302\256aA\314\203a\314\200eA\314\203a\314\200iA\314\203a\314\200oA\314\203a\314\200uA\314\203a\314\200yA\314\203a\314\200

AFAIK, there isn't any way to programmatically tell what encoding the file is in, but I asked Tom about this last week. He thought we could just use the same file-set that the resource conversion script uses, because anything put up by a Mac (presumably in MacRoman) will have its resource-fork counterpart in ./.AppleDouble/. Make sense to you?

- Mike

On Wednesday, June 11, 2003, at 10:53 AM, sjh@purdue.edu wrote:

Mike, you say in your comments in roman2utf8.pl that if the file name passed to the program isn't in MacRoman format the name will get foobar-ed. But it looks to me like the function myConvert will ignore chars whose ord() are not in the table and just return the char.
Won't that protect the non-MacRoman file names? Otherwise, can you give me a few lines of perl to test the file name for suitability?

Thanks,
Steve.
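
For reference, the double-encoding problem in Mike's June 11 example can be reproduced in a few lines. This is a minimal Python sketch, not roman2utf8.pl itself. It assumes that Python's built-in mac_roman codec matches the MacRoman table the script used, and that "the UTF8 that the Finder understands" means decomposed (NFD-style) Unicode, which is what the sample output suggests. The function name roman_to_finder_utf8 is made up for illustration.

```python
import unicodedata


def roman_to_finder_utf8(raw):
    """Treat raw bytes as MacRoman and return decomposed (NFD) UTF-8 bytes."""
    text = raw.decode("mac_roman")
    return unicodedata.normalize("NFD", text).encode("utf-8")


# The "Roman" file name from the example, written as raw bytes.
roman = b"umlaut \xac\x8a\x91\x95\x9a\x9f\xd8"

once = roman_to_finder_utf8(roman)   # the intended conversion
twice = roman_to_finder_utf8(once)   # converting an already-converted name

print(once)
print(twice)
print(once == twice)
```

The first conversion gives the "UTF8" line of the example and the second gives the "Foobar" line. The bytes of a UTF-8 name also decode as MacRoman without error, so the conversion itself cannot tell the two encodings apart and must not be run twice on the same name. That is the argument for restricting the rename to files that have a counterpart in .AppleDouble.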