Question: picard MarkDuplicates find EOF problem
2.5 years ago by
victoria_aleks30 wrote:

Dear all, I am running picard MarkDuplicates but getting the error "Exception in thread "main" htsjdk.samtools.FileTruncatedException: Premature end of file:...", and it does not produce any output files. I tried picard's SortSam with coordinate sorting, which did not help. Then I added an EOF marker to the .bam file with a home-made script (it worked before on other data), but MarkDuplicates still fails with the same error message. I know that my .bam files are OK, because I can run my downstream analysis with no problems and the results seem reasonable. The only step I cannot get past is this duplicate marking... Is there any way to make picard ignore the fact that the .bam files are truncated? :)

(The command I ran: picard MarkDuplicates I=in.bam O=out.bam M=metrics.txt ASSUME_SORTED=true REMOVE_DUPLICATES=false VALIDATION_STRINGENCY=LENIENT. I also tried VALIDATION_STRINGENCY=SILENT.)

I would be very grateful for any help! :)

markduplicates eof • 1.2k views
ADD COMMENTlink written 2.5 years ago by victoria_aleks30

Your BAM is really incomplete/corrupted. You will lose data even if you manage to bypass picard's sanity check. You need to re-download or re-process it.

ADD REPLYlink written 2.5 years ago by lh331k

Could it be that it was produced by an older software version, and that is why it has no EOF? Or is EOF not the problem after all?

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by victoria_aleks30

The EOF marker may or may not be present, but in either case the end of the file was not expected where it occurred (meaning that the BAM file ended prematurely, owing to some corruption).

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Santosh Anand5.0k

Thank you! Is there a way to find out what is wrong, then? All the files are of the size I would expect, and when I calculate read depth from them it looks good and normal. And I have 150 of these files... so I really, really don't want to trash them and start everything from the beginning :)

ADD REPLYlink written 2.5 years ago by victoria_aleks30

Run samtools view -H your.bam > /dev/null on all your BAMs. If you see "EOF marker is absent" for all of them, they must have been produced by ancient tools, which would really surprise me nowadays, since the EOF marker was added over 5 years ago. If some of them yield the warning but others do not, those are corrupted files that you should fix.
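
For checking many files at once, the same idea can be sketched in Python without calling samtools. This is a minimal sketch, not part of the original advice: it assumes the 28-byte empty BGZF block that the SAM/BAM specification defines as the end-of-file marker, and it adds a second, extra check that the whole file decompresses as a gzip stream. The function names `has_bgzf_eof` and `decompresses_cleanly` are made up for illustration.

```python
import gzip
import sys
import zlib
from pathlib import Path

# The empty BGZF block used as the BAM end-of-file marker (28 bytes).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker block."""
    if Path(path).stat().st_size < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), 2)  # 2 = seek relative to end of file
        return fh.read() == BGZF_EOF


def decompresses_cleanly(path, chunk_size=1 << 20):
    """Return True if every gzip member in the file can be read to the end.

    BGZF is a series of gzip members, so a truncated or damaged block
    makes the gzip reader raise instead of reaching the end of the file.
    """
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_bams.py *.bam
    for bam in sys.argv[1:]:
        print(bam,
              "EOF marker present" if has_bgzf_eof(bam) else "EOF marker absent",
              "stream OK" if decompresses_cleanly(bam) else "stream damaged",
              sep="\t")
```

A file that lacks the marker but decompresses cleanly would fit the "old tools" explanation, while a file that fails to decompress is damaged regardless of what its last bytes look like. Neither check validates the BAM records themselves.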

ADD REPLYlink written 2.5 years ago by lh331k

Thank you for your advice! I downloaded the original .bam files from a cancer database; they are fairly old and could, in theory, have been processed by an old software version. I ran 'bedtools intersect' on these files to produce new "intersected" .bam files. Does that mean that my new .bams will have EOF anyway, because I am using the new software version? Or may it depend on the original .bams?

ADD REPLYlink written 2.5 years ago by victoria_aleks30

There is a difference between EOF not present and premature EOF: in the latter case, the EOF was probably present, but it was not expected. It means that the BAM may be corrupted.

You may try converting BAM -> SAM and then SAM back to BAM, to see if it resolves the problem.
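
A minimal sketch of that round trip, assuming samtools is on the PATH and using only `samtools view` with its `-h` (include header), `-b` (write BAM) and `-o` (output file) options. The helper names and the `.tmp.sam` / `.roundtrip.bam` file names are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import sys


def roundtrip_commands(bam_in, bam_out, samtools="samtools"):
    """Build the two samtools commands: BAM -> SAM, then SAM -> BAM."""
    sam_tmp = f"{bam_out}.tmp.sam"  # hypothetical name for the intermediate SAM
    return [
        [samtools, "view", "-h", "-o", sam_tmp, str(bam_in)],
        [samtools, "view", "-b", "-o", str(bam_out), sam_tmp],
    ]


def roundtrip(bam_in, bam_out):
    """Run the round trip; raises CalledProcessError if samtools fails."""
    commands = roundtrip_commands(bam_in, bam_out)
    sam_tmp = commands[0][4]
    try:
        for cmd in commands:
            # check=True stops at the first failing step instead of
            # silently producing a BAM from a partial SAM file.
            subprocess.run(cmd, check=True)
    finally:
        if os.path.exists(sam_tmp):
            os.remove(sam_tmp)


if __name__ == "__main__":
    # Usage (hypothetical script name): python roundtrip.py *.bam
    for bam in sys.argv[1:]:
        roundtrip(bam, f"{bam}.roundtrip.bam")
```

The rewritten file contains only the records samtools was able to read from the input, so this produces a well-formed BAM but does not restore anything that was lost to truncation or corruption.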

ADD REPLYlink written 2.5 years ago by Santosh Anand5.0k

Well, I have now tried converting my BAM into SAM and then into BAM again, and that way MarkDuplicates worked. I guess I have to repeat this elaborate procedure for each of the files... sad :)

ADD REPLYlink written 2.5 years ago by victoria_aleks30