Forum: Recovering BAM files after unknown deletion in the storage
0
3 days ago by
Gabriel Wajnberg60 wrote:

Hello all,

We had a problem with our storage and, for an unknown reason, we lost data. I used PhotoRec to recover the lost data; however, as I understand it, PhotoRec does not recover complete files. For example, I could recover some BAM files and convert them to SAM, but each one is just a fragment, a couple of MB of data per file.

I noticed that there are also huge binary files, several GB each (about the size I would expect for the real BAM files), and if I just read them I can sometimes see read IDs in the middle of the binary data. However, when I try to read them with samtools view, it fails. It was complaining about an EOF problem (however, I found a good script to insert an EOF end and now samtools doesn't complain about this) and also about "fail to read the header". Is there a way to insert a header into these possible BAM files? Or is there another option to recover these files?

PS: we don't have a backup =(

modified 2 days ago by jkbonfield70 • written 3 days ago by Gabriel Wajnberg60
2

Even if you managed to recover readable files, would you trust their content?

written 3 days ago by Jean-Karim Heriche19k

I don't trust the currently readable files at all...

written 3 days ago by Gabriel Wajnberg60
2

Go to a professional data recovery specialist.

written 3 days ago by Kevin Blighe43k
1

I would add my voice to Kevin Blighe's; don't lose any more time and just go to a professional data recovery service.

written 3 days ago by H.Hasani730
1

I still wouldn't trust the recovered data. Most likely either part of the data was overwritten or the storage (disk?) failed and bits have been corrupted. The only situation in which I would trust a recovered file is when only the link to the inode (on Linux) has been removed (i.e. the file was accidentally deleted and nothing had time to overwrite it yet).

written 3 days ago by Jean-Karim Heriche19k

You mean that the mere act of trying to recover the data may have damaged it further? This happened to me a few years ago. I realised the serious nature of the failure and did not try to do anything myself.

written 3 days ago by Kevin Blighe43k
1

however, I found a good script to insert an EOF end and now samtools doesn't complain about this

That's not a good fix. samtools is trying to warn you about a legitimate error, and you're just bypassing its error-checking mechanism rather than addressing the underlying corruption. Don't use that script.
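
For context on why appending that block repairs nothing: the SAM/BAM specification defines the EOF marker as a fixed 28-byte empty BGZF block, so its presence only tells you that the writer finished cleanly. Here is a minimal Python sketch of the check; the function name `has_bgzf_eof` is made up for illustration.

```python
# The 28-byte empty BGZF block that the SAM/BAM specification uses as the EOF marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker block."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)                      # seek to the end to learn the file size
        size = fh.tell()
        if size < len(BGZF_EOF):
            return False
        fh.seek(size - len(BGZF_EOF))      # read only the final 28 bytes
        return fh.read() == BGZF_EOF
```

A file that fails this check was most likely truncated; gluing the marker onto it makes the check pass without bringing back any of the missing blocks.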

written 3 days ago by RamRS21k

I just want to check the content of these binary files! Or should I just categorize them as lost?

written 3 days ago by Gabriel Wajnberg60

Unless you understand the cause/source of the corruption, without a backup there's not much you can do.

written 3 days ago by Jean-Karim Heriche19k
1

At some point, it could be cheaper just to sequence the samples again (if they are still available).

written 3 days ago by JC7.9k

Thank you for all the tips!

written 3 days ago by Gabriel Wajnberg60
3
2 days ago by
jkbonfield70 wrote:

Firstly, don't mount those drives read-write any more. Mount them read-only from now on, or you'll reduce the chance of any data recovery, whether by yourself or by professionals.

BAM has specific signatures that data recovery tools are unlikely to spot, but you could perhaps extract them yourself from raw disk images (assuming it's not some complex RAID stripe). If you get someone in to recover your data, make sure you explain to them the nature of the BAM format (a series of small concatenated gzip files), as it may help. Spotting a whole series of gzip signatures in the raw disk images is the BAM equivalent of what your photo-recovery tool is attempting to do with images. So it's possible, but very complex and bespoke.
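
As a rough illustration of that signature spotting, here is a minimal Python sketch, not a recovery tool. It relies on the BGZF layout in the SAM/BAM specification: each block is a gzip member whose extra field carries a "BC" subfield holding the block size minus one. It assumes the image (or a chunk of it) is available as a bytes-like buffer and that a file's blocks sit back to back on disk, which fragmentation or RAID striping will break. The function names are made up for illustration.

```python
import struct

GZIP_BGZF_MAGIC = b"\x1f\x8b\x08\x04"   # gzip ID1, ID2, CM=deflate, FLG=FEXTRA


def parse_bgzf_block_size(buf, pos):
    """Return the total size of the BGZF block at `pos`, or None if it is not one."""
    if buf[pos:pos + 4] != GZIP_BGZF_MAGIC or pos + 12 > len(buf):
        return None
    (xlen,) = struct.unpack_from("<H", buf, pos + 10)
    extra_end = pos + 12 + xlen
    if extra_end > len(buf):
        return None
    p = pos + 12
    while p + 4 <= extra_end:            # walk the gzip extra subfields
        si1, si2, slen = struct.unpack_from("<BBH", buf, p)
        if si1 == 66 and si2 == 67 and slen == 2 and p + 6 <= extra_end:
            (bsize,) = struct.unpack_from("<H", buf, p + 4)
            return bsize + 1             # BSIZE stores total block size minus 1
        p += 4 + slen
    return None


def find_bgzf_runs(buf, min_blocks=2):
    """Yield (offset, n_blocks, n_bytes) for chains of back-to-back BGZF blocks."""
    pos = 0
    while True:
        pos = buf.find(GZIP_BGZF_MAGIC, pos)
        if pos < 0:
            return
        count, cur = 0, pos
        while True:                      # follow the chain block by block
            size = parse_bgzf_block_size(buf, cur)
            if size is None or cur + size > len(buf):
                break
            count += 1
            cur += size
        if count >= min_blocks:
            yield pos, count, cur - pos
            pos = cur                    # resume the search after this run
        else:
            pos += 1                     # isolated or false hit: keep scanning
```

For a real image you would probably pass an `mmap.mmap` object opened read-only instead of loading everything into memory. A long run is only a candidate BAM fragment; each block still needs its CRC checked before the content can be trusted.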

Also use a modern samtools/htslib, as these will check for CRC errors in BAM (older ones didn't, and neither do some of the other BAM readers out there). If it's a recent tool and it's not complaining about CRC errors, then the data is probably correct. However, frankly, if you've only recovered a few MB from each file that is expected to be several GB, then you've basically got nothing of value left. You need to weigh up the value of the data vs the cost of professional recovery services.
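
To show what that CRC check amounts to, here is a minimal Python sketch that walks a file block by block, inflates each payload and compares it with the CRC32 and length stored in the block trailer. It is an illustration only, not a replacement for samtools/htslib. It assumes the common 18-byte block header (XLEN of 6 with a single "BC" subfield); the specification permits other extra subfields, which this sketch would report as bad. The function name is made up for illustration.

```python
import struct
import zlib


def check_bgzf_blocks(path):
    """Return (good_blocks, first_bad_offset); the offset is None if every block passes."""
    good = 0
    with open(path, "rb") as fh:
        while True:
            offset = fh.tell()
            header = fh.read(18)
            if not header:
                return good, None                    # reached the end cleanly
            # Assumption: fixed 18-byte header, XLEN == 6, one "BC" subfield.
            if (len(header) < 18 or header[:4] != b"\x1f\x8b\x08\x04"
                    or header[10:16] != b"\x06\x00BC\x02\x00"):
                return good, offset
            (bsize,) = struct.unpack_from("<H", header, 16)
            remaining = bsize + 1 - 18               # payload plus 8-byte trailer
            if remaining < 8:
                return good, offset
            rest = fh.read(remaining)
            if len(rest) != remaining:
                return good, offset                  # truncated block
            crc_expected, isize = struct.unpack("<II", rest[-8:])
            try:
                data = zlib.decompress(rest[:-8], wbits=-15)   # raw deflate
            except zlib.error:
                return good, offset
            if zlib.crc32(data) != crc_expected or len(data) != isize:
                return good, offset
            good += 1
```

The returned offset marks where the trustworthy data stops, which gives a quick way to compare how much of each recovered file is intact against its expected size.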

written 2 days ago by jkbonfield70

+1 - very aptly said..

To my surprise, PhotoRec does understand BAM/SAM, although the specifications it works from are very old: https://www.cgsecurity.org/wiki/File_Formats_Recovered_By_PhotoRec

written 2 days ago by Santosh Anand4.8k