Monday 30 April 2012

Moving Mail to a New Laptop

My wife was getting fed up with her eMac running slower and slower, and with the incompatibility between her MS Office X and the MS Office 2010 in use at her company, so for her birthday she got a Windows 7 laptop. Until the weekend just gone, however, she hadn't made much use of it, because moving her mail account from Entourage to Outlook and copying all her files across was beyond her. The address book was a particular problem.

Eventually we found ways to achieve almost all of these migrations. The document files were the easiest part: connect both machines to the local network, turn on Windows sharing on the Mac, and drag the files over (we had to retry a few times because the connection reset while copying large files).

To migrate the mail messages, we decided to use Thunderbird instead of Outlook (this PC doesn't connect to a corporate Exchange server). Drag each mail folder from Entourage to a Finder folder or the desktop; this exports its messages as a .mbox file. Rename each file to remove the extension, then copy the files to the Mail folder in the current Thunderbird profile.
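With more than a handful of folders, the rename-and-copy step is easier to script. What follows is only a minimal sketch: the function name `copy_mboxes` and both directory paths are placeholders of mine, and the real location of the profile's Mail folder varies from one installation to another.

```shell
#!/bin/bash
# Sketch: strip the .mbox extension from exported folders and copy them
# into a destination directory. The function name and both paths are
# hypothetical placeholders, not anything Entourage or Thunderbird defines.
copy_mboxes() {
    local src=$1 dest=$2 f name
    mkdir -p "$dest"
    for f in "$src"/*.mbox; do
        [ -e "$f" ] || continue         # no matches: the glob stays literal, skip it
        name=$(basename "$f" .mbox)     # "Inbox.mbox" -> "Inbox"
        cp "$f" "$dest/$name"
    done
}

# Example call (both paths are placeholders):
#   copy_mboxes ./exported "/path/to/thunderbird/profile/Mail"
```

Quoting every expansion matters here because folder names such as "Sent Items" contain spaces.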

The mail addresses turned out to be a rather tougher proposition. You can export each address card to a .vcf file, but Thunderbird found no useful information in them. It dawned on me later that this might be a line-ending issue: we could have run mac2unix and then unix2dos on the files before importing. But by then we had solved the problem in a different way.
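For anyone who wants to try the line-ending idea, here is a minimal sketch. It does not use mac2unix or unix2dos at all; it swaps in sed, tr and awk so that it runs where those tools are not installed. The function name `to_crlf` and the file names are my own placeholders, and I have not confirmed that the result makes Thunderbird accept the cards.

```shell
#!/bin/bash
# Sketch: normalise any mix of CR (classic Mac), LF (Unix) and CRLF (DOS)
# line endings to CRLF. to_crlf is a hypothetical helper name.
to_crlf() {
    sed $'s/\r$//' |                  # CRLF -> LF
    tr '\r' '\n' |                    # any remaining bare CR -> LF
    awk '{ printf "%s\r\n", $0 }'     # every line now ends in CRLF
}

# Example (file names are placeholders):
#   to_crlf < card.vcf > card-dos.vcf
```

Removing the CR of each CRLF pair before translating bare CRs avoids turning every DOS line ending into two line breaks.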

Entourage allows you to drag address cards to the Mac's native address book application. That address book was synchronized with Plaxo using my wife's account, and Plaxo supports export of the address book in LDIF format, which Thunderbird was able to import without problems. If you can't use this route, you can also export the entire Mac address book in a single operation to .vcf (or possibly even to LDIF). I suggest adjusting the line endings to Windows conventions before attempting to import into Thunderbird.

Then it turned out that the vast majority of e-mail addresses that she needed were not in either address book. Entourage just maintains a cache of recently used addresses and we could find no way to export all of these into any sort of useful file format. To the rescue came Adam Haeder with a simple shell script to extract plausible e-mail addresses from mbox files. I was able to adapt it for use under Cygwin and enhance it a bit to cope with non-US addresses and escaped newlines and spaces. The resulting tab-delimited file was suitable for import to Thunderbird.

 #!/bin/bash

 # Parse one or more mbox files and list the From: e-mail addresses,
 # dropping those from postmasters, mail administrators and the like.
 # Output is tab-delimited: friendly name, then address.

 # Resolve each argument, falling back to /var/spool/mail/<name>
 FILES=()
 for f in "$@"; do
   if [ -r "$f" ]; then
     FILES+=("$f")
   elif [ -r "/var/spool/mail/$f" ]; then
     FILES+=("/var/spool/mail/$f")
   else
     echo "Sorry! Neither $f nor /var/spool/mail/$f exists, or I can't read them" >&2
     exit 1
   fi
 done

 if [ ${#FILES[@]} -eq 0 ]; then
   echo "Usage: $0 mbox-file..." >&2
   exit 1
 fi

 # Senders to reject (matched case-insensitively as an extended regex)
 ROBOTS='postoffice|postman|administrator|bounce|MAILER-DAEMON|postmaster'
 ROBOTS="$ROBOTS|Mail Administrator|Auto-reply|out of office|Mail Delivery System"
 ROBOTS="$ROBOTS|Email Engine|Mail Delivery Subsystem|Mail.Administrator|non.deliverable"

 # Using cat to read the input means you can run this over many mbox files at once
 # dos2unix and mac2unix standardise line endings to LF only
 # First sed script joins lines that have been split with a "=" at line end
 # grep isolates lines starting with "From:" and grep -Evi rejects all non-human addresses
 # Next grep discards any lines that contain no email address (@)
 # The next sed script turns =20 into spaces and discards trailing spaces and "From:"
 # The next converts [mailto:x@y.z] to <x@y.z> form and strips double quotes
 # Any bare SMTP addresses are converted to "x@y.z <x@y.z>"
 # Any SMTP addresses with no friendly name but in "<>" delimiters are converted similarly
 # Finally the "<>" delimiters are removed and replaced with a tab separator
 # Result is sorted uniquely (ignore case & leading blanks) and converted to DOS line endings
 # Note: \s, \S and \t in the sed scripts are GNU sed extensions
 cat "${FILES[@]}" | dos2unix.exe | mac2unix.exe |
 sed '/=$/{N;s/=\n//}' |
 grep "^From:" | grep -Evi "($ROBOTS)" |
 grep '@' |
 sed 's/=20/ /g;s/\s*$//;s/^From:\s*//' |
 sed 's/\[mailto:\(.*\)\]/<\1>/g;s/"//g' |
 sed '/^[^<]*$/s/^\(.*\)$/\1 <\1>/' |
 sed 's/^<\(.*\)>$/\1 <\1>/' |
 sed 's/^\(.*\S\)\s*<\(.*\)>/\1\t\2/' |
 sort -ubf | unix2dos.exe
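
To see what the address-normalising stages do on their own, here is a cut-down sketch that runs just those sed steps over a few made-up From: lines. The function name `normalise_from` and the sample names and addresses are invented for illustration, and it assumes GNU sed, since `\s`, `\S` and `\t` are GNU extensions.

```shell
#!/bin/bash
# Sketch: only the sed stages of the pipeline, applied to invented headers.
# normalise_from is a hypothetical name; GNU sed is assumed.
normalise_from() {
    sed 's/=20/ /g;s/\s*$//;s/^From:\s*//' |     # =20 -> space, trim, drop "From:"
    sed 's/\[mailto:\(.*\)\]/<\1>/g;s/"//g' |    # [mailto:x@y.z] -> <x@y.z>, strip quotes
    sed '/^[^<]*$/s/^\(.*\)$/\1 <\1>/' |         # bare address -> "x@y.z <x@y.z>"
    sed 's/^<\(.*\)>$/\1 <\1>/' |                # "<x@y.z>" alone -> "x@y.z <x@y.z>"
    sed 's/^\(.*\S\)\s*<\(.*\)>/\1\t\2/'         # "name <addr>" -> "name<TAB>addr"
}

printf '%s\n' \
    'From: "Jane Doe" <jane@example.com>' \
    'From: bob@example.org' \
    'From: <carol@example.net>' \
    'From: Dave=20Smith [mailto:dave@example.com]' |
    normalise_from
```

Each output line holds a name and an address separated by a tab; where the header carried no friendly name, the address appears in both columns, so the second sample line comes out as bob@example.org, a tab, then bob@example.org again.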