Question: EOF Marker absent .bam.pbi files
6 days ago, nalexandre wrote:

I received a series of subreads.bam.pbi files from my sequencing facility and I unzipped each file. When I try to samtools merge all of the files, I get an error message saying that the EOF marker is absent.

I get the following additional message:

[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes

Files are named like this: m54050R1_180207_203009.subreads.bam.pbi

Filetype is: data

Any direction is helpful.
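
For context on the EOF message: samtools expects BAM input to be BGZF-compressed and to end with a fixed 28-byte empty BGZF block. A file that has been run through gunzip is plain data and no longer carries that block, which is one plausible reason for the warning here (an assumption, not something confirmed in this thread). A minimal Python sketch of that check, where the function name has_bgzf_eof is made up for illustration:

```python
import os

# The 28-byte empty BGZF block that marks the end of a BGZF file (e.g. a BAM).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

A gunzipped .pbi file would most likely return False here. An intact .bam from the sequencer should return True.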

Tags: sequencing, assembly
modified 5 days ago by RamRS • written 6 days ago by nalexandre

I used the gunzip command on all pbi files as the file type was gzip compressed data.

written 6 days ago by nalexandre

The PacBio BAM index file (extension bam.pbi) contains a table of semantic information about each read and its alignment (if applicable).

So I don't think samtools merge is going to work with those files, since they are a PacBio-specific extension.

I assume you have corresponding *.bam files? What are you trying to do?

written 6 days ago by genomax

Yes I do! I am just trying to polish my current genome assembly using arrow. I figured I needed to merge the bam files in order to do this sort of polishing. I was planning to index, align, and sort after merging before running arrow.

written 5 days ago by nalexandre

Even when I run the command with subreads.bam, subreads.bam.pbi, or subreads.bam.pbi.gz, I get the same errors. How would you suggest preparing raw data for polishing a completed genome?

written 5 days ago by nalexandre
5 days ago, h.mon wrote:

You didn't explain carefully what your data consists of, so it is difficult to help. From what I gather, you have an assembly (of unknown origin) and want to polish it with the raw PacBio sequencing reads.

Are you trying to merge the .bam.pbi files, or the .bam files? The .bam.pbi files are PacBio BAM index files, so samtools will not work on them. What you want to merge are the .bam files; afterwards, create a new index for the merged bam with the pbindex program. The BAM recipes wiki has useful information on handling PacBio bam files.
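
As a rough sketch of that merge-then-index idea, the Python below collects only the .bam files (leaving the .bam.pbi indexes out) and builds the two command lines. The helper names, the `*.subreads.bam` pattern, and the output name merged.subreads.bam are assumptions for illustration. The commands are only constructed here, and `run` executes them only if samtools and pbindex are installed.

```python
import subprocess
from pathlib import Path


def subread_bams(directory):
    """List *.subreads.bam files; the pattern does not match .bam.pbi files."""
    return sorted(str(p) for p in Path(directory).glob("*.subreads.bam"))


def merge_commands(bams, merged="merged.subreads.bam"):
    """Build (but do not run) the merge and re-index command lines."""
    if not bams:
        raise ValueError("no .bam files to merge")
    return [
        # samtools merge takes the output name first, then the inputs.
        ["samtools", "merge", merged, *bams],
        # pbindex writes a new PacBio index next to the merged bam.
        ["pbindex", merged],
    ]


def run(commands):
    """Execute each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run(merge_commands(subread_bams(".")))` would merge every subreads bam in the current directory and then index the result.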

I don't have experience with Arrow and other PacBio tools, but it seems Arrow needs a bam aligned to the assembly you want to polish, not the original unaligned bams you were given by the sequencing center. The docs I've read used BLASR for the alignment step; I don't know if aligning with minimap2 would work with Arrow.

"I was planning to index, align, and sort after merging before running arrow."

In case you haven't merged the unaligned bams yet, you can first map each subread bam separately, sort each of them, then merge the sorted bams, which will result in a sorted merged bam. After merging, you can index the merged bam and use it with Arrow.
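
The order of steps above (map each bam, sort each, merge, index) could be laid out roughly as in this Python sketch. The aligner invocation is deliberately left as a caller-supplied template, because the thread does not settle which aligner or options Arrow accepts. The function name, the output file names, and the `{input}`/`{output}` placeholders are all assumptions made for illustration.

```python
from pathlib import Path


def polish_prep_commands(bams, align_template, merged="aligned.merged.bam"):
    """Build per-file align and sort commands, then one merge and one index.

    `align_template` is a list of strings supplied by the caller; any
    "{input}" and "{output}" in it are filled in for each subread bam.
    """
    commands = []
    sorted_bams = []
    for bam in bams:
        stem = Path(bam).stem  # "m1.subreads.bam" -> "m1.subreads"
        aligned = f"{stem}.aligned.bam"
        sorted_bam = f"{stem}.aligned.sorted.bam"
        # Step 1: map this subread bam to the assembly (aligner is a placeholder).
        commands.append([part.format(input=bam, output=aligned) for part in align_template])
        # Step 2: coordinate-sort the aligned bam.
        commands.append(["samtools", "sort", "-o", sorted_bam, aligned])
        sorted_bams.append(sorted_bam)
    # Step 3: merging already-sorted bams yields a sorted merged bam.
    commands.append(["samtools", "merge", merged, *sorted_bams])
    # Step 4: index the merged bam.
    commands.append(["samtools", "index", merged])
    return commands
```

A call might look like `polish_prep_commands(bams, ["my_aligner", "{input}", "assembly.fasta", "{output}"])`, where my_aligner and assembly.fasta are hypothetical stand-ins for whatever aligner command and reference file apply. Each returned list can then be passed to `subprocess.run(cmd, check=True)`.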

written 5 days ago by h.mon