

What happened was that those files were fragmented, and once they were deleted, the cluster chain was removed. So when the programs "recovered" them, what they did was to look at the starting location (which is still present) and the size of the file (which is also still present) and simply copy that many clusters in a row from the start.

This works fine if a file is stored in a single, contiguous block (i.e., defragmented), but if it was fragmented, then its blocks are spread out around the disk and the program has absolutely no way to know where they are or which ones to use. That is why most of the corrupted recovered files will have at least one cluster's worth of correct data, but then contain whatever happened to be in the subsequent clusters that used to belong to other files.
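To make that concrete, here is a minimal sketch (in Python, against a hypothetical raw disk image; the cluster size, data-area offset, and function name are illustrative assumptions, not any real tool's code) of what that "copy that many clusters in a row" recovery amounts to:

```python
import math

CLUSTER_SIZE = 4096  # assumed; a real tool reads this from the filesystem's boot sector


def naive_recover(image_path: str, start_cluster: int, file_size: int,
                  data_offset: int = 0) -> bytes:
    """Recover a deleted file by trusting only its directory entry.

    The directory entry still records the starting cluster and the file size,
    but the cluster chain is gone, so we just read consecutive clusters from
    the start. If the file was fragmented, everything after the first
    fragment is whatever other data happens to follow it on disk.
    """
    clusters_needed = math.ceil(file_size / CLUSTER_SIZE)
    with open(image_path, "rb") as disk:
        disk.seek(data_offset + start_cluster * CLUSTER_SIZE)
        data = disk.read(clusters_needed * CLUSTER_SIZE)
    return data[:file_size]  # trim the slack space in the last cluster
```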

If the files are plain-text, then you could search the drive for unused clusters (which is a nightmare on a giant, nearly empty disk) and manually stitch a file back together (I did this a few times many years ago). But with binary files, this is effectively impossible. In fact, even with plain-text files it is difficult at best if the file had been edited and saved numerous times, because it then becomes hard to identify which clusters contain blocks of the last version of the file.

PhotoRec (and its ilk)

As you noticed, PhotoRec seems to recover more (at the cost of lost filenames). The explanation above is how some data-recovery programs work. That method is generally more reliable because it works from the records of real files that recently existed on the disk; however (not surprisingly, perhaps), it can miss some files. That is why other programs like PhotoRec use a different approach. Instead of looking at a deleted file's information (filename, size, timestamp, starting cluster) in the directory entry and then copying the clusters from the disk, they search the whole disk for lost files. Most file types have a signature (usually at the start of the file, in the header) containing a sequence of bytes that identifies the file as a certain type. Because of this, programs that open a file can determine whether it is the correct type, and other programs (like recovery tools) can verify the type of a file.
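As a rough illustration of that signature-based approach (a sketch only, not PhotoRec's actual code: the signature table covers just a few common formats, and a real carver also has to work out where each file ends and cope with fragmentation), a scan of a raw disk image might look like this:

```python
import mmap

# A handful of well-known file signatures ("magic bytes"); real tools know hundreds.
SIGNATURES = {
    b"\xFF\xD8\xFF": "jpg",          # JPEG
    b"\x89PNG\r\n\x1a\n": "png",     # PNG
    b"PK\x03\x04": "zip",            # ZIP (also .docx/.odt and similar containers)
    b"%PDF-": "pdf",                 # PDF
}


def find_candidates(image_path: str):
    """Yield (offset, type) for every known signature found in the raw image.

    Results are grouped by signature rather than sorted by offset; sort them
    if you need the candidates in on-disk order.
    """
    with open(image_path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as image:
        for signature, file_type in SIGNATURES.items():
            pos = image.find(signature)
            while pos != -1:
                yield pos, file_type
                pos = image.find(signature, pos + 1)
```

This is also why PhotoRec can find files whose directory entries are long gone, and why it cannot give you the original filenames: the directory metadata is not consulted at all.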
