Leaving Apple’s Nursing Home
This series is about replacing a MacBook Air with an equally beautiful Freedom Software laptop. It is also about setting up a Freedom Software laptop for the kind of user who wants it to Just Work with the least possible involvement and no interest in how it works.
Part 1 of this series was about the rationale and the hardware.
Part 2 of this series was about choosing and configuring the software.
Part 3: Data Recovery
First I would recover all the generic user files. Most of the documents are in portable formats such as text, PDF or Open Document Format (used with NeoOffice on the Mac, LibreOffice or OpenOffice on the new computer).
After that, I wanted to recover some important data from specific Mac apps:
- Photos from iPhoto / Apple Photos
- email from Apple Mail
- references from Zotero
- bookmarks from Safari
(Ordered from most to least important.)
Restore Data from TimeMachine or from SSD?
- from a TimeMachine backup: put together a software solution, copy data off it
- from the Mac’s SSD: extract the SSD from the Mac, buy a special adapter, copy data off it
From a TimeMachine backup
There was a recent TimeMachine backup, stored on our NAS. As I don’t have a working Mac to run TimeMachine on, I searched for Linux software able to read it. My searches led to using sparsebundlefs plus tmfs to mount the backup as a directory tree. It took me quite a few attempts, fighting the permissions system and especially the way that FUSE filesystems deal with permissions, before I could see a list of top level folders, one per snapshot, with one named “Latest” pointing at the latest snapshot. Inside there was apparently a complete snapshot of the Mac’s disk filesystem.
The TimeMachine backup was not encrypted. While the connection from the TimeMachine app to its storage folder on the NAS had required a password, the data inside was not encrypted. (By comparison, some backup systems such as Borg encrypt data on the client side before sending it to the server, so that only the data’s owner can decrypt and read it.)
The sparsebundlefs + tmfs software appeared to give access to all the files. When I started copying them, however, two issues arose. First, the extraction speed was initially terrible: 0.2 MB/s. At that point the extractor was running on one machine, with sparsebundlefs reading the TimeMachine storage on the NAS through SSHFS, and rsync sending its output over to the new machine. I suspected the main problem was random access to the NAS’s moderately slow spinning disk, and that the SSHFS access to it might be a secondary problem.
Rather than measure and diagnose the exact cause, I copied the TM backup folder over to a faster disk (still a spinning disk) on the extractor machine. This copying went much faster, presumably because it was mostly sequential reading. Using that copy directly on the extractor machine, so cutting out SSHFS too, the extraction process was then much faster.
Then the second problem struck. The extracted data was much larger than expected, too large for the disks on the extractor machine and the target machine. It turned out the extractor was not preserving symlinks. It was presenting every symlinked directory as a separate copy of the directory. I did not know which directories (and perhaps some files too) had originally been symlinked on the Mac, and I could no longer boot it to find out.
I could guess some of the symlinks, partly from prior knowledge, partly from using ‘du’ to spot directory trees of identical huge sizes, and partly from online sources where others have catalogued lists of symlinks, especially those created by migrations through successive versions of iPhoto to Photos. I confirmed each guess by using ‘rsync --dry-run’ to verify whether the content of one directory was identical to the other. (‘diff -r’ works too but is slower because it always reads the full file content whereas rsync takes a shortcut if the file size and timestamp match.)
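The size-and-timestamp shortcut can be illustrated in a few lines of Python. This is only a sketch of the same idea, not how rsync is implemented; the function names `tree_size`, `snapshot` and `trees_match_quick` are made up for this example, and empty directories are ignored.

```python
import os


def tree_size(root):
    """Total size in bytes of all regular files under root (roughly what 'du' totals)."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def snapshot(root):
    """Map each file's path relative to root to its (size, mtime in whole seconds)."""
    entries = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            entries[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return entries


def trees_match_quick(a, b):
    """True if both trees hold the same file names with equal sizes and timestamps.

    File content is never read, so this is fast but can in principle be fooled.
    """
    return snapshot(a) == snapshot(b)
```

Two directories with the same `tree_size` are candidates for being one symlinked directory presented twice; `trees_match_quick` on that pair then confirms or rejects the guess without reading any file content.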
I ended up manually adding ‘exclude’ rules to my ‘rsync’ invocation. I excluded (in the home dir):
- Applications/
- Library/
  - except for “Library/Mail” and “Library/Mail Downloads”
- Pictures/iPhoto Library*.migratedphotolibrary/
  - an old pre-migration folder that should have contained symlinks
- Pictures/Photos Library*.photoslibrary/Originals/
  - which should have been a symlink to ‘Masters’
I also excluded a few other files and folders that held nothing interesting and would clutter or confuse the target. Here is the exclude list I used (not mentioning the ‘Library/Mail’ and ‘Library/Mail Downloads’ exceptions).
.android
.bash_sessions
.CFUserTextEncoding
.cups
.DS_Store
.lesshst
.mozilla
.ssh
Applications
Library
Pictures/iPhoto Library Test.migratedphotolibrary
Pictures/Photos Library Test.photoslibrary/Originals
Pictures/Photos Library Test.photoslibrary/resources
Public
Sites
(Your photo library would not have the word ‘Test’ in its name, by default. Mine did, caused by some manual repair by an Apple shop technician years ago.)
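One way to express "exclude Library, except for two folders inside it" is with rsync's include and exclude rules, where the first matching rule wins. The Python sketch below only builds such a command line; it is an assumption about how the rules could be written, not a record of the invocation actually used. The function name `build_rsync_args` and the example paths are hypothetical.

```python
def build_rsync_args(src, dst, excludes, library_keep, dry_run=True):
    """Build an rsync argument list that skips Library/ except selected subfolders.

    'excludes' must NOT contain a plain "Library" entry: that rule would match
    the Library directory itself and rsync would never descend into it.
    """
    args = ["rsync", "-a", "--itemize-changes"]
    if dry_run:
        args.append("--dry-run")
    # rsync applies the first rule that matches, so the exceptions go first.
    # "dir/***" matches the directory itself and everything below it.
    for name in library_keep:
        args.append(f"--include=/Library/{name}/***")
    # Everything else directly inside Library/ is dropped.
    args.append("--exclude=/Library/*")
    for pattern in excludes:
        args.append(f"--exclude={pattern}")
    # Trailing slash on the source: copy its contents, not the folder itself.
    args += [src.rstrip("/") + "/", dst]
    return args


if __name__ == "__main__":
    # Hypothetical mount point and target; print the command for inspection.
    print(build_rsync_args(
        "/mnt/tm/Latest/Users/someone",
        "/home/someone/",
        [".DS_Store", "Applications", "Public", "Sites"],
        ["Mail", "Mail Downloads"],
    ))
```

Passing the resulting list to `subprocess.run` avoids any shell quoting trouble with the space in "Mail Downloads". Keeping `dry_run=True` until the itemized output looks right is a cheap safeguard.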
For additional speed in transferring a large amount of data to the new laptop, I copied a couple of chunks of it over on a USB memory stick, as rsync over the WiFi connection was going at only 5 MB/s (about 40 Mbps) even when near the WiFi access point. It would have been a good idea to buy a USB-to-Ethernet adapter for a task like this, which could have made it go much faster.
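To put the transfer rates in perspective, here is a small back-of-the-envelope calculation in Python. The 100 GB data size is a made-up figure purely for illustration, and 125 MB/s is the theoretical ceiling of gigabit Ethernet (1000 Mbit/s divided by 8); a real USB-to-Ethernet adapter would deliver somewhat less.

```python
def mbytes_to_mbits(mb_per_s):
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mb_per_s * 8


def transfer_hours(size_gb, mb_per_s):
    """Hours needed to move size_gb gigabytes at mb_per_s megabytes per second."""
    seconds = size_gb * 1000 / mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    size_gb = 100  # hypothetical amount of data, not a figure from this migration
    for label, rate in [("WiFi as measured", 5), ("gigabit Ethernet, theoretical", 125)]:
        print(f"{label}: {mbytes_to_mbits(rate)} Mbit/s, "
              f"{transfer_hours(size_gb, rate):.2f} hours for {size_gb} GB")
```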
More details on TimeMachine storage format and manually accessing it: Deep Dive or here, by Glenn ‘devalias’ Grant.
From the Mac’s SSD
Reading data directly from the Mac’s SSD would have saved me time in fiddling with the sparsebundlefs + tmfs software, and in dealing with the data that should have been symlinks but weren’t.
Apple used a non-standard SSD connector on some MacBook Air (and Pro) models. We can buy an adapter for the particular Mac model, to connect the SSD to a standard SATA connector, or to a USB-to-SATA adapter.
I ordered an SSD adapter. When it arrived, I got out my collection of security screwdriver bits (various sizes and odd shapes) and found I didn’t have the required tiny 5-pointed star shape. Dang.
I will order the special screwdriver because, even though I completed the data transfer, I do not want to sell or dispose of the broken Mac with the private data still on it. (It’s not encrypted. Next time it should be. And indeed I have set up the new computer with disk encryption.)
I have also heard that one can get low level access to an internal drive through the Thunderbolt port. I have not investigated whether this is possible in my case.
Recover Photos from iPhoto / Apple Photos
The plain JPEG (etc.) files are found in the ‘Pictures/Photos Library.photoslibrary/Masters’ folder.
TODO: Find out whether there was also metadata stored separately, e.g. photo album names and comments.
Recover Email from Apple Mail
For mail accounts using IMAP: the mail should be on the mail server. Don’t bother trying to recover anything from the local data.
For mail accounts using POP: the mail is stored only locally and we will want to recover it.
We find an “Apple Mail to dovecot mailbox converter” at https://github.com/pguyot/emlx_to_mbox.
I installed Erlang (as required) and ran it… and it did not work. Here is the output from a test run on a single message:
$ escript emlx_to_mbox.escript --single ~/tm-home/Library/Mail/V4/863E1A15-*/INBOX.mbox/233CA490-*/Data/0/0/1/Messages/100638.emlx
emlx_to_mbox.escript:13: Warning: erlang:get_stacktrace/0 is deprecated and will be removed in OTP 24; use use the new try/catch syntax for retrieving the stack backtrace
escript: exception error: no case clause matching
{ok,{http_header,0,<<"Return-Path">>,<<"Return-Path">>,
<<"<LISTNAME-bounces+EMAIL=DOMAIN@mailman.DOMAIN>">>},
<<"Received: from [10.92.1.161] (HELO SERVER)\n by SERVER (CommuniGate Pro SMTP 6.0.11)\n with ESMTP id 399065899 for EMAIL@DOMAIN; Tue, "...>>}
in function emlx_to_mbox_escript__escript__1634__919879__991744__2:get_header_value/2 (emlx_to_mbox.escript, line 286)
in call from emlx_to_mbox_escript__escript__1634__919879__991744__2:process_emlx_file/4 (emlx_to_mbox.escript, line 71)
in call from escript:run/2 (escript.erl, line 758)
in call from escript:start/1 (escript.erl, line 277)
in call from init:start_em/1
in call from init:do_boot/3
I have not programmed in Erlang before. Maybe now would be a good time to start?
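An alternative that avoids Erlang would be a small Python script using the standard library’s `mailbox` module. The sketch below rests on an assumption about the .emlx format that I have not verified against this mailbox: that each file starts with a line holding a byte count, followed by that many bytes of the raw message, followed by an Apple property list holding flags. The function names are made up for this example. Files named `*.partial.emlx` are converted as-is, with no attempt to reattach anything stored elsewhere.

```python
import mailbox
from pathlib import Path


def read_emlx(path):
    """Return the raw message bytes from one .emlx file.

    Assumed layout: first line is a decimal byte count, then the message
    of exactly that length, then a trailing property list that is ignored here.
    """
    raw = Path(path).read_bytes()
    first_line, _sep, rest = raw.partition(b"\n")
    length = int(first_line.strip())
    return rest[:length]


def emlx_tree_to_mbox(src_dir, mbox_path):
    """Append every .emlx message found under src_dir to an mbox file.

    Returns the number of messages written. No locking is done, so nothing
    else should be writing to mbox_path at the same time.
    """
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        for emlx in sorted(Path(src_dir).rglob("*.emlx")):
            box.add(read_emlx(emlx))
            count += 1
    finally:
        box.close()  # flushes pending writes
    return count
```

Calling `emlx_tree_to_mbox` on one `INBOX.mbox` folder from the backup, with a new file name as the target, would give a single mbox file that most Linux mail programs and servers can import.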
Recover References from Zotero
Copying the Zotero folder to the Linux laptop Just Worked. Hooray!
Recover Bookmarks from Safari
TODO.