Question: picard MarkDuplicates reports a premature EOF problem
22 months ago by
victoria_aleks30 wrote:

Dear all, I am running picard MarkDuplicates but getting the error "Exception in thread "main" htsjdk.samtools.FileTruncatedException: Premature end of file:...", and it does not produce any output files. I tried sorting by coordinate with picard's SortSam, which did not help. Then I added an EOF marker to the .bam file with a home-made script (it worked before on other data), but MarkDuplicates still fails with the same error message. I believe my .bam files are OK, because I can run my downstream analysis with no problems and the results seem reasonable. There is just this one step I cannot get past, the duplicate marking... Is there any way to make picard ignore the fact that the .bam files are truncated? :)

(The command I ran: picard MarkDuplicates I=in.bam O=out.bam M=metrics.txt ASSUME_SORTED=true REMOVE_DUPLICATES=false VALIDATION_STRINGENCY=LENIENT. I also tried VALIDATION_STRINGENCY=SILENT.)

I would be very glad of any help! :)


Your BAM is really incomplete/corrupted. You will lose data even if you manage to bypass picard's sanity check. You need to re-download or re-process it.

written 22 months ago by lh3 (31k)

Could it be that it was produced by an older software version, and that is why it has no EOF? Or is EOF not the problem after all?

modified 22 months ago • written 22 months ago by victoria_aleks (30)

The EOF marker may or may not be present, but in either case the end of the file was reached where it was not expected (meaning that the BAM file ended prematurely, owing to some corruption).

modified 22 months ago • written 22 months ago by Santosh Anand (4.6k)

Thank you! Is there a way to find out what is wrong, then? All the files are of the size I would expect, and when I calculate read depth from them it looks good and normal. And I have 150 of these files... so I really, really don't want to trash them and start everything from the beginning :)

written 22 months ago by victoria_aleks (30)

Run samtools view -H your.bam > /dev/null on all your BAMs. If you see "EOF marker is absent" for all of them, they must have been produced by ancient tools, which would really surprise me nowadays: the EOF marker was added over 5 years ago. If some of them yield the warning but others do not, the ones that do are corrupted files you should fix.
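As a rough complement to the samtools check, the same test can be sketched without samtools by looking at the last 28 bytes of each file, since a complete BAM ends with a fixed empty BGZF block. The function names has_bgzf_eof and report are made up for this illustration, and the script assumes the .bam files sit in the current directory. Note that a present marker only shows the tail is intact; it does not prove the rest of the file is undamaged (for example, if the marker was appended by hand).

```python
from pathlib import Path

# The 28-byte empty BGZF block that terminates a complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 "
    "02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker block."""
    path = Path(path)
    if path.stat().st_size < len(BGZF_EOF):
        return False
    with path.open("rb") as handle:
        handle.seek(-len(BGZF_EOF), 2)  # 2 = offset relative to end of file
        return handle.read() == BGZF_EOF


def report(directory="."):
    """Map each *.bam file name in `directory` to True/False (marker present)."""
    bams = sorted(Path(directory).glob("*.bam"))
    return {bam.name: has_bgzf_eof(bam) for bam in bams}


if __name__ == "__main__":
    # Assumption: the BAM files live in the current working directory.
    for name, ok in report().items():
        status = "EOF marker present" if ok else "EOF marker absent"
        print(f"{name}\t{status}")
```

If every file is reported as lacking the marker, the "old tools" explanation is plausible; a mix of present and absent points to individual damaged files.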

written 22 months ago by lh3 (31k)

Thank you for your advice! I downloaded the original .bam files from a cancer database; they are rather old and could in theory have been processed by an old software version. I ran 'bedtools intersect' on these files to produce new "intersected" .bam files. Does that mean my new .bams will have an EOF marker anyway, because I am using a new version of the software? Or may it depend on the original .bams?

written 22 months ago by victoria_aleks (30)

There is a difference between the EOF marker not being present and a premature EOF. In the latter case, the end of the file was reached where it was not expected, which means that the BAM may be corrupted.

You could try converting BAM -> SAM and then SAM back to BAM, to see whether that resolves the problem.

written 22 months ago by Santosh Anand (4.6k)

Well, now I have tried converting my BAM into SAM and then into BAM again, and that way MarkDuplicates worked. I guess I have to repeat this elaborate procedure for each of the files... sad :)
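For many files, the BAM -> SAM -> BAM round trip could be scripted rather than repeated by hand. The following is only a sketch: it assumes samtools is on the PATH, the helper names roundtrip_commands and roundtrip are invented here, and the output directory name "fixed" is arbitrary. If a BAM really is truncated, the records past the damage are still lost; the round trip just rewrites whatever could be read, so the exit codes are worth checking.

```python
import subprocess
import sys
from pathlib import Path


def roundtrip_commands(bam, out_dir):
    """Build the two samtools commands for BAM -> SAM -> BAM."""
    bam = Path(bam)
    out = Path(out_dir) / bam.name
    # BAM -> SAM text on stdout, keeping the header (-h).
    decode = ["samtools", "view", "-h", str(bam)]
    # SAM on stdin ("-") -> BAM (-b), written to the output file (-o).
    encode = ["samtools", "view", "-b", "-o", str(out), "-"]
    return decode, encode


def roundtrip(bam, out_dir):
    """Pipe decode into encode; return (decode_exit_code, encode_exit_code)."""
    decode, encode = roundtrip_commands(bam, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    producer = subprocess.Popen(decode, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(encode, stdin=producer.stdout)
    producer.stdout.close()  # only the consumer should hold the read end
    encode_rc = consumer.wait()
    decode_rc = producer.wait()
    return decode_rc, encode_rc


if __name__ == "__main__":
    # Usage (hypothetical script name): python roundtrip.py a.bam b.bam ...
    for bam_path in sys.argv[1:]:
        codes = roundtrip(bam_path, "fixed")
        print(f"{bam_path}\texit codes (decode, encode): {codes}")
```

A non-zero decode exit code suggests samtools itself complained about the input, so that file deserves a closer look before it goes into MarkDuplicates.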

written 22 months ago by victoria_aleks (30)