root
- journal for the UNIX professional (now defunct)Raw source files: getfree.c textpieces.c
When system backups are administered well, users tend to develop a feeling of security that their data are safe and that any loss is generally limited to a day's work. This sometimes leads to a slightly careless attitude about accidentally deleting files. When, in this situation, a backup fails, the result is often catastrophe.
This was the situation I was confronted with recently while working on a
software development project.
Due to momentary confusion with a symbolic link,
one of my team managed to delete most of the source tree for the project (he
commented that a particular rm
seemed to take a suspiciously long time).
We contacted
system support and were soon informed that a successful backup had not been
made for over two months! Our UNIX disk partition was
simply a file to the underlying mainframe operating system. Backups were made
of this file as a whole, but during the prior two months the file was
inaccessible to the backup software, a fact that somehow got overlooked by
the administrative staff.
I realized that our only hope of recovery was to scan the raw deleted data on the filesystem. Considering the magnitude of the loss, it was clearly worth a considerable amount of effort to attempt recovery, so I proceeded to develop the utility software described here. There were a number of factors in our favor, allowing us to make some helpful assumptions:
SCCS
,
we had two chances of recovery for each file.
If we couldn't find the s-file, we could settle for
the g-file (the file
that get
creates).
A Berkeley Fast File System consists of a disk partition divided into a number
of cylinder groups.
Each cylinder group contains a copy of the
superblock (a structure containing the parameters of the filesystem), a
cylinder group header (a structure containing information specific to each
cylinder group), inodes
(information about files, sans hierarchical and naming information),
a free block
map, and data blocks.
The dumpfs
utility provides a nice summary of the
filesystem information and is a good first step when attempting to recover
deleted data.
The free block map is maintained as a bit map with each bit representing the availability of one fragment. A block consists of a number of fragments, and a fragment consists of a number of disk sectors (normally 512 bytes each). The allocation of whole blocks and fragments for a file is fairly complex, but suffice it to say that once a file is deleted, this relationship is lost and the deleted data can only be dealt with at the fragment level.
Of course, this description is just the tip of the iceberg. The definitive reference on the Berkeley file system is chapter 7 (while you're at it, read the whole book!) of The Design and Implementation of the 4.3BSD UNIX Operating System by Samuel J. Leffler, Marshall Kirk McKusick, Michael J. Karels, and John S. Quarterman [Addison-Wesley, 1989] or the older paper (contained in the BSD document set) A Fast File System for UNIX by Marshall Kirk McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry [1983].
The first utility I wrote to aid recovery is getfree
(shown on pages 27-28).
It reads the data from unallocated fragments and writes it to standard
output.
The syntax is:
getfree
<block filesystem> <cylinder group #> ...
The filesystem is a file name such as /dev/ra0g
,
and it is followed by a list of the cylinder group numbers from
which you wish to read data.
There are two reasons why the program takes a list of
cylinder group numbers rather than automatically scanning every group.
If you look at the output of dumpfs
, you will notice
that each cylinder group has a time stamp.
This indicates the time of last
update of the cylinder group header.
Since modifying the free map by deleting
files causes a header update, I was assured that we need not bother looking in
cylinder groups with time stamps older than the time of the mishap.
The other
reason is that it was necessary to work on only a few groups at a time due to
limited space on the "work" filesystem.
Let's take a closer look at getfree
.
Use of the macros included with
<ufs/fs.h>
have made the tasks of coding and understanding far easier.
Since what we're doing is very similar to the internal algorithms used in the
kernel, most of the complex calculations have already been done for us in the
macros.
Unfortunately, it's often not clear whether a structure member or macro deals with bytes, device blocks, fragments, allocation blocks, or cylinders. Even more confusing is that just about everything is called a block. We must, therefore, always carefully keep in mind which units we're working in. Often a quick look at the kernel source code can answer a lot of questions (or generate more, depending on how long you look at it).
The first step in getfree
is to read the superblock and verify its
magic number.
The superblock is located at a
fixed location in the filesystem.
This is fortunate since it must be located before
anything else is known about the filesystem.
Now in possession of the
filesystem parameters, we loop for each cylinder group requested.
The macro cgtod
calculates the location
of the cylinder group header and
free list, expressed in fragments.
Using fs_fshift
to convert to bytes,
we seek to and read the header and free list,
verifying the magic number to be
sure we've got the right stuff.
We then scan the free list for free fragments,
reading the data in each one.
Note that the location of the data is
calculated relative to the beginning of the cylinder group.
The cylinder group
header and inodes are at a variable offset
from the beginning of the cylinder
group, with data before and after,
so for simplicity the free list covers the
entire cylinder group.
The cgdmin
macro, which would appear to be what
we want to use, actually only gives us the
location of data after the inodes.
getfree
transfers data to stdout
without
any attempt at interpretation or
conversion, but with one important filter.
Any fragment consisting
entirely of the same repeated byte is not written.
Since deleted and unused
areas of files are often zero-filled, and disk controllers
generally initialize
data areas with a repeated byte when formatting (supposedly the
"hardest" bit pattern to read, and therefore most likely to cause a
format verification error in a marginal disk block),
this filtering tends to eliminate a lot of trash,
saving space on the recovery filesystem.
So now I had a big file with our source code in it
but also with lots of
non-text data.
The next utility, textpieces
(shown on pages 28-30),
scans a file created by
getfree
for fragments containing data that looks
like text and writes
each apparently contiguous piece to a separate file,
ignoring fragments that
appear to be non-text.
The syntax is:
textpieces
<output file name prefix>
The standard input is the output from getfree
.
The argument is the
prefix to be used to name the output files,
which will have names of the form
prefix.nnn
, where nnn
is a number
starting with 001
.
A report is written to standard output.
Two heuristics are used to classify the data:
The "good" fragments are written out to a series of numbered
files.
It is assumed that NULs at the end of a fragment represent
the remaining space in the
fragment past the end of a file.
As long as there are contiguous
fragments with no bit 7s set and no NULs, data are written
to the same output
file.
The string /*~~~*/
is written between each fragment for easy
identification of the boundaries.
When a fragment with trailing NULs is encountered, the fragment is written without the trailing NULs and the output file is switched to the next file. Files ending this way are generally the ones that contain a complete recovery of a deleted file.
When a fragment to be discarded is encountered, any current output file is reported as "partial," and the output is switched to the next file. It is possible, however unlikely, that this was a file that just happened to end on a fragment boundary. It is more likely that a piece of this file must be found elsewhere.
For our circumstances, the criteria for selecting text and non-text data worked very well. As a precaution against missing something, I reported what was found in each discarded fragment and looked over the report manually for questionable entries. As it turned out, all of the discarded fragments were clearly non-text data; a great majority were composed of either mostly NULs or had bit 7 set on approximately half of the bytes.
The next and final step in the process was also the most tedious and
time consuming.
Each of the hundreds of files created by textpieces
had to be
examined, identified,
split (if it contained parts from different files),
combined with other incomplete parts (if it did not
contain an entire file), and
stripped of the /*~~~*/
strings between connecting parts.
In our situation, we were attempting to recover
about 3 megabytes of data
contained in files ranging in size around 5-20K each.
Approximately 70% of
the files were each wholly contained in one recovery file.
The remaining
files had to be spliced together from pieces in
two or more recovery files.
Our intimate familiarity with the files
allowed us to match up parts fairly
easily since a small piece was all
we needed in order to identify a file.
If we had been relatively unfamiliar with the files,
this task would have been far
more difficult (in the worst case requiring n! attempted matches).
Often we ended up with two or three versions of the same file
and had to
decide which was the most current,
although with SCCS
files this wasn't a
problem.
In the end we recovered all of our source code
(even the SCCS
s-files),
which seemed to us virtually miraculous.
All we ended up losing were a couple of
Makefile
s.
For us, the attempt at recovery was well worth the effort.
At several points I had contemplated writing a more interactive utility which would allow the user to split up, tag, and combine pieces easily, possibly while reading directly from the damaged filesystem rather than requiring a copy on the recovery filesystem, but I concluded that nothing short of a major software project was a match for these rudimentary utilities, combined with a good text editor and other UNIX utilities.
textpieces
could easily be modified to
select different types of data,
such as allowing some percentage of
the fragments to have bit 7 set (common
when an extended character set is used),
or look for patterns present in a
certain type of binary-encoded and/or fixed-length-record format.
If a fixed record
length was relatively prime to the fragment size,
textpieces
could also help to splice, or
at least classify, pieces.
Although no one expects this sort of situation to happen to them, it likely will to many at some time. The experience taught us a lot about UNIX file system structure, disk layout policies, and usage patterns. Other than taking more personal responsibility for backups, there is little way to prepare for such a catastrophe. Masochists are, of course, encouraged to intentionally delete their own files for practice.
Tim Eliseo is a consultant who specializes in UNIX system software design and implementation; he is a certified MS-DOS hater. He can be reached at:
4470 Plantation Drive
Fair Oaks, CA 95628-5639
(916) 966-0509
[old email - no longer valid]
#ifndef lint static char sccsID[] = "@(#)getfree.c 1.3 90/10/25 22:05:42"; #endif /* Read data from BSD filesystem free blocks Copyright (c) 1988, 1990 Timothy R. Eliseo. All rights reserved. Permission is granted to use but not to sell this software provided that the above message is not removed and that any modification is noted. */ #include <sys/types.h> #include <sys/param.h> #include <stdio.h> #include <ufs/fs.h> /* <sys/fs.h> on some systems */ #include <sys/file.h> #include <malloc.h> #ifndef offsetof /* ANSI C would define this in <stddef.h> */ #define offsetof(s_name, m_name) (size_t)&(((s_name *)0)->m_name) #endif main(argc, argv) char *argv[]; { int fsfd; /* file descriptor of the filesystem */ struct fs fs; /* filesystem header (superblock) */ struct cg cg; /* cylinder group header */ u_char *freelist, /* pointer to cg free list */ *dp; static u_char data[MAXBSIZE]; /* a block of data */ if (argc < 2) { fprintf(stderr, "Usage: %s <block fs> <cg #> ...\n", argv[0]); exit(1); } /* open the filesystem */ if ((fsfd = open(argv[1], O_RDONLY)) < 0) { perror(argv[1]); exit(2); } /* read the superblock */ lseek(fsfd, SBLOCK * DEV_BSIZE, L_SET); read(fsfd, &fs, sizeof fs); if (fs.fs_magic != FS_MAGIC) { fprintf(stderr, "Invalid magic number %lx in superblock\n", fs.fs_magic); exit(3); } /* allocate a buffer for the maximum size of a cylinder group free list */ freelist = (u_char *)malloc(fs.fs_bsize - offsetof(struct cg, cg_free[0])); argc--; /* past the program name */ argv++; /* loop for each cylinder group # on the command line */ while (--argc) { int cgn; long cgs, frn; cgn = atoi(*++argv); /* cylinder group number */ cgs = cgbase(&fs, cgn); /* the base of the cylinder group in frags */ /* read the cg header */ lseek(fsfd, (long)cgtod(&fs, cgn) << fs.fs_fshift, L_SET); read(fsfd, &cg, offsetof(struct cg, cg_free[0])); if (cg.cg_magic != CG_MAGIC) { fprintf(stderr, "Invalid magic number %lx in cylinder group %d\n", cg.cg_magic, cgn); exit(3); } /* read the free list, located right after the fixed size header info */ read(fsfd, freelist, (cg.cg_ndblk + 7) / 8); /* loop for each free block */ for (frn = 0; frn < cg.cg_ndblk; frn++) { if (freelist[frn / NBBY] >> frn % NBBY & 1) { /* block free? */ /* read the block's data */ lseek(fsfd, (long)(cgs + frn) << fs.fs_fshift, L_SET); read(fsfd, data, fs.fs_fsize); /* write the data only if it's not filled with a repeated byte */ for (dp = data + 1; dp < data + fs.fs_fsize; dp++) if (data[0] != *dp) break; if (dp != data + fs.fs_fsize) write(1, data, fs.fs_fsize); } /* if free */ } /* for each block */ } /* for each cg */ return 0; }
#ifndef lint static char sccsID[] = "@(#)textpieces.c 1.4 90/10/25 22:08:17"; #endif /* Break getfree output into continuous-text pieces and discard non-text data Copyright (c) 1988, 1990 Timothy R. Eliseo. All rights reserved. Permission is granted to use but not to sell this software provided that the above message is not removed and that any modification is noted. */ #include <stdio.h> #include <strings.h> /* The size of a fragment is defined here to avoid cluttering this program with filesystem dependencies. It is shown by dumpfs as "fsize" */ #define FG_SIZE 1024 int isopen = 0; FILE *outf; int fragno = 0, firstfrag; char outfn[200], /* used to construct an output filename */ *endpos, /* points past the prefix */ newfn[200]; /* used to rename the file to a "partial" */ main(argc, argv) char *argv[]; { unsigned char buf[FG_SIZE], *bp; int nnuls, curnuls, ngrps, nbit7s, curfn = 0; if (argc != 2) { fprintf(stderr, "Usage: %s <prefix>\n", argv[0]); exit(1); } strcpy(outfn, argv[1]); strcpy(newfn, argv[1]); endpos = outfn + strlen(outfn); while (fread(buf, FG_SIZE, 1, stdin)) { nnuls = ngrps = nbit7s = curnuls = 0; /* check each byte in the fragment */ for (bp = buf; bp < buf + FG_SIZE; bp++) { if (*bp & 0x80) nbit7s++; if (!*bp) { /* scan for the first non-NUL */ while (curnuls++, ++bp < buf + FG_SIZE) { if (*bp) { nnuls += curnuls; curnuls = 0; ngrps++; break; } } bp--; /* since the outer loop will increment bp */ } } if (nnuls || nbit7s) { /* any embedded NULs or high bits? */ close_partial(); printf( "Fragment %d rejected: %d NULs in %d groups (%d at end), %d high bits\n", fragno, nnuls, ngrps, curnuls, nbit7s); } else if (isopen || curnuls != FG_SIZE) { /* if the data is good and it's not just another NUL-filled fragment at the end of a block */ if (isopen) { fputs("/*~~~*/", outf); /* write a separator string */ } else { /* start a new file */ firstfrag = fragno; sprintf(endpos, ".%03d", ++curfn); if (!(outf = fopen(outfn, "w"))) { perror(outfn); exit(2); } isopen = 1; } /* write all but any trailing NULs */ if (fwrite(buf, FG_SIZE - curnuls, 1, outf) != 1) { perror(outfn); exit(2); } if (curnuls) { /* if there were trailing NULs, close the current file and report it */ fclose(outf); isopen = 0; printf("File %s: fragments %d-%d (%d NULs)\n", outfn, firstfrag, fragno, curnuls); } } /* if good data */ fragno++; } /* while read not eof */ close_partial(); return 0; } /* close a file of "partial" data, rename it to end in "p", and report */ close_partial() { if (isopen) { fclose(outf); isopen = 0; strcpy(newfn + (endpos - outfn), endpos); strcat(newfn, "p"); rename(outfn, newfn); printf("File %s: fragments %d-%d (partial)\n", newfn, firstfrag, fragno - 1); } }