Question: Error sorting paired-end .bam using samtools
2.3 years ago by
holbrookjoanna20 wrote:

Hi

I created some .bam files aligning reads to the human genome using ernebs5 http://erne.sourceforge.net/manual.php

I had both paired-end and singleton reads, which I aligned separately. I have had no problem manipulating the singleton .bam files in samtools. However, my paired-end read files won't sort. Here is the flagstat for one:

117254947 + 616693 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
103481375 + 0 mapped (88.25% : 0.00%)
117254947 + 616693 paired in sequencing
58925941 + 9879 read1
58329006 + 606814 read2
264604 + 0 properly paired (0.23% : 0.00%)
97335658 + 0 with itself and mate mapped
6145717 + 0 singletons (5.24% : 0.00%)
2896848 + 0 with mate mapped to a different chr
1538594 + 0 with mate mapped to a different chr (mapQ>=5)

I can convert it to a .sam file, and to my (inexperienced) eye it looks fine and similar to the singleton alignments. However, when I try to sort I get an error saying that the chromosome labels are found in the binary header but not the text header. I don't understand this, or why it did not affect the singleton alignments (aligned against the same reference).

[bam_sort_core] merging from 85 files...

[E::trans_tbl_add_sq] @SQ SN (chr1) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr10) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr11) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr11_gl000202_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr12) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr13) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr14) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr15) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr16) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17_ctg5_hap1) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17_gl000203_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17_gl000204_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17_gl000205_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17_gl000206_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr18) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr18_gl000207_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr19) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr19_gl000208_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr19_gl000209_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr1_gl000191_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr1_gl000192_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr2) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr20) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr21) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr21_gl000210_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr22) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr3) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr4) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr4_ctg9_hap1) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr4_gl000193_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr4_gl000194_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr5) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_apd_hap1) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_cox_hap2) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_dbb_hap3) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_mann_hap4) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_mcf_hap5) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_qbl_hap6) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_ssto_hap7) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr7) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr7_gl000195_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr8) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr8_gl000196_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr8_gl000197_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr9) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr9_gl000198_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr9_gl000199_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr9_gl000200_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr9_gl000201_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrM) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000211) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000212) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000213) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000214) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000215) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000216) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000217) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000218) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000219) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000220) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000221) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000222) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000223) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000224) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000225) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000226) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000227) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000228) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000229) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000230) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000231) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000232) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000233) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000234) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000235) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000236) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000237) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000238) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000239) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000240) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000241) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000242) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000243) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000244) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000245) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000246) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000247) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000248) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000249) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrX) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrY) found in binary header but not text header.

I get this with or without the -n flag, i.e. samtools sort -n Sample5c_paired.bam > sorted.bam

Am I missing something very obvious about sorting paired-end .bam files in samtools? Or is there something wrong with my alignment file? Any help would be very gratefully appreciated.

Jo

bam samtools erne paired-end • 1.0k views
ADD COMMENTlink modified 2.3 years ago by John12k • written 2.3 years ago by holbrookjoanna20

Thanks genomax2 and MacSpider

Sorry for not being clear. I am using samtools sort to sort my .bam files. It's weird that I don't have the same issue with the singleton reads even though they have the same headers. Anyway, I will try the new samtools version (and now I think I can just delete the @SQ SN headers and keep going if necessary).

Thanks!!

ADD REPLYlink written 2.3 years ago by holbrookjoanna20

Hi again

I am using samtools 1.3.1, which seems to be the latest. Jo

ADD REPLYlink written 2.3 years ago by holbrookjoanna20

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

ADD REPLYlink written 2.3 years ago by genomax65k
2.3 years ago by
John12k
Germany
John12k wrote:

The BAM format has two headers: one in binary and one in text. The binary header only contains chromosome names and lengths. Sometimes these two headers get out of sync.

To debug, you can try pybam:

Download pybam from https://github.com/JohnLonginotto/pybam/blob/master/pybam.py. In the terminal, go to whatever directory it downloaded to, start python (2.x), and in the python terminal type:

import pybam
bam_data = pybam.bgunzip('/path/to/your.bam') # change the path :)
print bam_data.header_text                    # The text header
print bam_data.chromosome_names               # From the binary header
print bam_data.chromosome_lengths             # From the binary header

If they don't match, you'll need to reheader the BAM file, which is a very error-prone process depending on what the above result gives you (i.e. if you can just alter the text portion of the header, or if you need to write a new binary bit).
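If you would rather not download anything, the same comparison can be sketched with the Python 3 standard library alone. This is only a rough sketch and not how pybam does it: it assumes an ordinary BGZF-compressed BAM (which the gzip module reads as concatenated gzip members) and the header layout from the SAM/BAM specification. The function names read_bam_headers and missing_from_text are made up for this example.

```python
import gzip
import struct


def read_bam_headers(path):
    """Return (text_header, [(ref_name, ref_length), ...]) from a BAM file."""
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file: bad magic bytes")
        # Text header: a length followed by SAM-style '@' lines.
        (l_text,) = struct.unpack("<I", fh.read(4))
        text = fh.read(l_text).rstrip(b"\x00").decode("ascii", "replace")
        # Binary header: reference count, then name and length per reference.
        (n_ref,) = struct.unpack("<I", fh.read(4))
        refs = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<I", fh.read(4))
            name = fh.read(l_name).rstrip(b"\x00").decode("ascii")
            (l_ref,) = struct.unpack("<I", fh.read(4))
            refs.append((name, l_ref))
    return text, refs


def missing_from_text(path):
    """List reference names present in the binary header but absent from @SQ lines."""
    text, refs = read_bam_headers(path)
    in_text = set()
    for line in text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    in_text.add(field[3:])
    return [name for name, _ in refs if name not in in_text]
```

Calling missing_from_text('/path/to/your.bam') should return an empty list when the two headers agree on reference names; any names it returns are the ones the sort error is complaining about.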

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by John12k
2.3 years ago by
genomax65k
United States
genomax65k wrote:

Are you using the latest samtools? If not that would be the first thing to try. This error has been referenced in this thread and appears to have been fixed.

If you sorted your files with unix sort then follow @Macspider's suggestion below.
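To confirm which samtools version is actually being run (for instance on a cluster where more than one version may be installed), something like the following Python 3 sketch could help. It assumes that the first line of `samtools --version` looks like "samtools 1.3.1"; the helper names parse_samtools_version and samtools_version are invented for this example.

```python
import re
import subprocess


def parse_samtools_version(output):
    """Turn the first line of `samtools --version` into a tuple such as (1, 3, 1)."""
    match = re.match(r"samtools (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("unrecognised samtools version output")
    # A missing patch number (e.g. "1.3") is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def samtools_version(executable="samtools"):
    """Run the given samtools executable and return its parsed version."""
    result = subprocess.run(
        [executable, "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_samtools_version(result.stdout)
```

Passing a full path as `executable` lets you check each installed copy separately, and the returned tuples compare naturally, so (1, 3, 0) sorts before (1, 3, 1).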

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by genomax65k

Hi I am using samtools 1.3.1

ADD REPLYlink written 2.3 years ago by holbrookjoanna20
2.3 years ago by
Macspider2.8k
Vienna - BOKU
Macspider2.8k wrote:

found in binary header but not text header

You are trying to sort a BAM file, which is the binary (machine-readable) form of SAM. To do so, you need to take care of the header. Take a moment to compare the output of samtools view file.bam and samtools view -h file.bam: with -h, the header appears.

For practical reasons, the tool to use for sorting in these cases is samtools sort, which lets you order either by position or by name. bamtools also has a sort sub-command, if you prefer it to samtools.

Sorting with the normal built-in Unix sort won't work on the BAM file! The header is composed of lines starting with @, so sorting them together with the alignments generates a corrupted SAM file.
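As a rough illustration of the two sort orders, here is a small Python 3 sketch that builds a samtools sort command line. It assumes the samtools 1.3-style options (-n for name order, -o for the output file, -T for the temporary-file prefix, -@ for extra threads); the function name samtools_sort_argv and the file names are placeholders.

```python
import subprocess


def samtools_sort_argv(in_bam, out_bam, by_name=False, tmp_prefix=None, threads=None):
    """Build the argument list for `samtools sort` (1.3-style options assumed)."""
    argv = ["samtools", "sort"]
    if by_name:
        argv.append("-n")              # sort by read name instead of position
    if threads:
        argv += ["-@", str(threads)]   # additional sorting threads
    if tmp_prefix:
        argv += ["-T", tmp_prefix]     # where temporary chunk files are written
    argv += ["-o", out_bam, in_bam]
    return argv


# Example (not run here): name-sort with temporary files on a scratch disk.
# subprocess.run(samtools_sort_argv("in.bam", "sorted.bam", by_name=True,
#                                   tmp_prefix="/scratch/tmp_sort"), check=True)
```

Because samtools reads and rewrites the header itself, the @ lines never take part in the sort, which is the problem with running a plain text sort over SAM output.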

ADD COMMENTlink written 2.3 years ago by Macspider2.8k

I am using samtools sort not unix sort

ADD REPLYlink written 2.3 years ago by holbrookjoanna20
2.3 years ago by
holbrookjoanna20 wrote:

Hi

I am still stuck on this.

It does not happen when I sort a singleton reads file, also created by ernebs5 and with identical @SQ headers. It is not due to memory constraints, as I have diverted the temp files to a big enough repository and checked the usage on our HPC facility.

I'd still be grateful for any more ideas.

Jo

ADD COMMENTlink written 2.3 years ago by holbrookjoanna20

I suggest that you post this question to the Samtools mailing list to make them aware of the problem: https://lists.sourceforge.net/lists/listinfo/samtools-help

I will tag John Marshall who is the official maintainer of Samtools.

ADD REPLYlink written 2.3 years ago by genomax65k

Thank-you genomax2, I have done as you suggested.

ADD REPLYlink written 2.3 years ago by holbrookjoanna20

Hi

Just to close this loop

The problem was that Erne does not output @SQ headers. It looks as though it does when I use samtools view, because samtools automatically generates basic ones. The missing @SQ headers cause samtools sort to fail, but only when the alignments are big enough to need merging (hence the singletons worked but the paired reads did not). Samtools 1.3.1 does not have this problem, but I was running two versions of samtools on my HPC, and samtools 1.3 was failing on these files.

Thanks very much for the help and the first suggestion to update samtools was right!
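For anyone who hits the same thing and cannot update, one possible workaround is to write the missing @SQ lines into the text header from the reference names and lengths. The Python 3 sketch below only shows the idea: add_sq_lines is a made-up name, the reference list has to come from somewhere else (the binary header or the reference index, say), and the lengths in the usage note are purely illustrative.

```python
def add_sq_lines(text_header, references):
    """Return header text with one @SQ line per (name, length) pair added.

    Refuses to touch a header that already has @SQ lines, since merging
    partial lists safely needs a human to look at them.
    """
    lines = text_header.splitlines()
    if any(line.startswith("@SQ") for line in lines):
        raise ValueError("header already has @SQ lines; merge by hand")
    sq_lines = ["@SQ\tSN:{}\tLN:{}".format(name, length) for name, length in references]
    # If there is an @HD line it must remain the first line of the header.
    head = lines[:1] if lines and lines[0].startswith("@HD") else []
    rest = lines[len(head):]
    return "\n".join(head + sq_lines + rest) + "\n"
```

For example, add_sq_lines("@HD\tVN:1.0\n", [("chr1", 1000), ("chrM", 200)]) puts the two @SQ lines directly after @HD, in the order given. The resulting text could then be saved to a file and given to samtools reheader, though I'd validate the output before trusting it.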

ADD REPLYlink written 2.3 years ago by holbrookjoanna20

If Erne is not outputting SQ headers, that's frankly a very worrying sign. While you have managed to work around the issue by adding in SQ headers, it makes me extremely suspicious of Erne as an aligner to begin with. I hadn't heard of it until now, and if it doesn't write out spec-compliant BAM data, that's a serious issue. I'd consider using a more mainstream aligner if possible...

At the very least, run your BAM file through Picard's ValidateSamFile to make sure there aren't more surprises waiting for you :)

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by John12k