After the filesystem crash back in October I had 11 gigabytes' worth of files without meaningful names just sitting in a lost+found directory. I was forced to come up with some creative ways to salvage data.
$ find lost+found/ -type f | wc -l
79370
$ du -sh lost+found/
11G lost+found/
I figured one of the first things people would want back was their digital photographs, so I decided to cook up a little program to sort through the entire lost+found directory, find JPEGs with a “Camera Model” EXIF tag, and copy each one into a directory named after the value of that tag.
My first attempt used the shell, but that proved unacceptably slow, so I fell back on my good friend Python. I installed the Python Imaging Library (python-imaging in Ubuntu), which I used to identify the JPEGs (very fast, by the way), and which also has (kludgy, experimental) support for reading EXIF tags.
Here’s the code:
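#!/usr/bin/python
# Python 2 script; uses the old top-level PIL module layout.

import sys
import os
import Image
from ExifTags import TAGS
from shutil import copyfile

INDIR = 'lost+found'
OUTDIR = 'recovered-photos'

def recover_photo(infile, outdir):
    try:
        # Image.open() raises IOError for anything PIL cannot identify.
        im = Image.open(infile)
        if im.format == 'JPEG':
            # Map numeric EXIF tag ids to their names.
            # _getexif() returns None when there is no EXIF data; that,
            # like a missing 'Model' tag below, raises an exception that
            # ends up in the catch-all handler at the bottom.
            tags = {}
            exif = im._getexif()
            for tag, val in exif.items():
                lookup = TAGS.get(tag, tag)
                tags[lookup] = val

            # Name the target directory after the camera model.
            dirname = tags['Model'].strip()
            dirname = dirname.replace(' ', '_')
            outdirname = os.path.join(outdir, dirname)
            if not os.path.isdir(outdirname):
                os.makedirs(outdirname)

            # lost+found names carry no extension, so add one.
            outfile = os.path.basename(infile)
            if not outfile.endswith(('.jpg', '.JPG')):
                outfile = outfile + '.jpg'
            outfile = os.path.join(outdirname, outfile)

            copyfile(infile, outfile)
            print infile, '->', outfile
    except IOError, v:
        # Print only the message when the error is an (errno, message) pair.
        try:
            (errno, errmsg) = v
        except:
            errmsg = v
        print infile, errmsg
    except:
        print infile, 'unknown error'

# Walk the whole tree and try every file.
for root, dirs, files in os.walk(INDIR):
    for name in files:
        infile = os.path.join(root, name)
        recover_photo(infile, OUTDIR)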
After running for 28m18s:
$ ls -l recovered-photos/
total 492
drwxr-xr-x 2 matthew matthew 4096 2008-01-26 13:58 C2100UZ
drwxr-xr-x 2 matthew matthew 4096 2008-01-26 14:18 C3100Z,C3020Z
drwxr-xr-x 2 matthew matthew 12288 2008-01-26 14:18 C860L,D360L
drwxr-xr-x 2 matthew matthew 36864 2008-01-26 14:17 C960Z,D460Z
...
$ find recovered-photos/ -type f | wc -l
9293
$ du -sh recovered-photos/
7.3G recovered-photos/
9293 digital photographs, all nicely sorted.
I’ll clean the code up, flesh it out with some handy command-line args and post a link to the updated version here as soon as I have a few spare cycles.
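A note for anyone repeating this today: the script above is Python 2 and relies on the old `import Image` layout and the private `_getexif()` method. What follows is only a rough sketch of the same lookup-and-copy step for Python 3, not the updated version promised above. It assumes a recent Pillow release that provides the public `Image.getexif()` method, and the helper names `camera_model`, `destination` and `recover` are made up for illustration.

```python
import os
import shutil

MODEL_TAG = 0x0110  # standard EXIF/TIFF tag number for "Model"


def camera_model(path):
    """Return the EXIF camera model of a JPEG, or None if unavailable."""
    # Assumes Pillow is installed. Imported here so that destination()
    # below stays usable without it.
    from PIL import Image
    try:
        with Image.open(path) as im:
            if im.format != 'JPEG':
                return None
            model = im.getexif().get(MODEL_TAG)
    except OSError:
        # Unreadable file, or not an image Pillow can identify.
        return None
    if not model:
        return None
    return str(model).strip() or None


def destination(outdir, model, infile):
    """Build the target path the same way the original script does."""
    dirname = model.strip().replace(' ', '_')
    name = os.path.basename(infile)
    if not name.endswith(('.jpg', '.JPG')):
        name += '.jpg'
    return os.path.join(outdir, dirname, name)


def recover(infile, outdir):
    """Copy infile into outdir/<camera model>/; return the new path or None."""
    model = camera_model(infile)
    if model is None:
        return None
    outfile = destination(outdir, model, infile)
    os.makedirs(os.path.dirname(outfile), exist_ok=True)
    shutil.copyfile(infile, outfile)
    return outfile
```

Calling `recover(path, 'recovered-photos')` for each path yielded by `os.walk('lost+found')` would mirror the loop in the original script. Unlike the original, this sketch silently skips files without a usable model tag instead of printing an error for them.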