Concordant pairs bam file larger than accepted_hits bam file?
1
0
Entering edit mode
7.0 years ago
bmpbowen ▴ 40

I referenced this post How To Extract Reads-Pairs Aligned Concordantly Exactly 1 Time? on how to take tophat output from paired end RNAseq data (accepted_hits.bam) and use samtools to make a new bam file that included only the concordant mapped read pairs (let's call it accepted_hits_concordant.bam). However, a simple line count of both files suggests that the accepted_hits_concordant.bam file is much larger than accepted_hits.bam. I would have expected accepted_hits_concordant.bam to have fewer lines than accepted_hits.bam because it should be keeping fewer read pairs. Maybe I entered the wrong commands. Here's what I did:

samtools view -f 0x2 accepted_hits.bam >accepted_hits_conc.bam

wc -l accepted_hits.bam #13722228

wc -l accepted_hits_conc.bam #52945812

If I entered my commands correctly, what then could account for all the additional lines in accepted_hits_conc.bam?

 

Thanks!

Tophat samtools paired concordant alignment • 2.7k views
ADD COMMENT
4
Entering edit mode
7.0 years ago

wc -l should not be used on binary files. Instead use

samtools view -c accepted_hits.bam

or

samtools view accepted_hits.bam | wc -l

You can find more filtering options here:

http://broadinstitute.github.io/picard/explain-flags.html

The command

samtools view -f 0x2 accepted_hits.bam > accepted_hits_conc.bam

creates a SAM file instead of bam file, without headers. Hence, the `accepted_hits.bam` is a binary file and `accepted_hits_conc.bam` is a SAM file.

The correct command would be:

samtools view -b -f 0x2 accepted_hits.bam > accepted_hits_conc.bam [or]
samtools view -b -f 0x2 accepted_hits.bam -o accepted_hits_conc.bam

 

ADD COMMENT
0
Entering edit mode

Ok, when I do it that way, I get line count for accepted_hits.bam but the following error for accepted_hits_conc.bam:

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "accepted_hits_conc.bam".
0

So I think the above samtools command (samtools view -f 0x2 accepted_hits.bam >accepted_hits_conc.bam) for extracting concordant aligned pairs into their own bam file didn't work...

I can view counts of accepted hits and concordant hits if I do samtools view -c accepted_hits.bam and samtools view -c -f 0x2 accepted_hits.bam,  but I don't want to just get a count of how many read pairs are concordant, I want to make a new bam file consisting only of concordant read pairs...

ADD REPLY
0
Entering edit mode

Updated the answer.

ADD REPLY
0
Entering edit mode

Great that did it! Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 2639 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6