I referenced this post on how to take tophat output from paired end RNAseq data (accepted_hits.bam
) and use samtools to make a new bam file that included only the concordant mapped read pairs (let's call it accepted_hits_concordant.bam
). However, a simple line count of both files suggests that the accepted_hits_concordant.bam
file is much larger than accepted_hits.bam
. I would have expected accepted_hits_concordant.bam
to have fewer lines than accepted_hits.bam
because it should be keeping fewer read pairs. Maybe I entered the wrong commands. Here's what I did:
samtools view -f 0x2 accepted_hits.bam >accepted_hits_conc.bam
wc -l accepted_hits.bam #13722228
wc -l accepted_hits_conc.bam #52945812
If I entered my commands correctly, what then could account for all the additional lines in accepted_hits_conc.bam
?
Thanks!
Ok, when I do it that way, I get line count for
accepted_hits.bam
but the following error foraccepted_hits_conc.bam
:So I think the above samtools command (
samtools view -f 0x2 accepted_hits.bam >accepted_hits_conc.bam
) for extracting concordant aligned pairs into their own bam file didn't work...I can view counts of accepted hits and concordant hits if I do
samtools view -c accepted_hits.bam
andsamtools view -c -f 0x2 accepted_hits.bam
, but I don't want to just get a count of how many read pairs are concordant, I want to make a new bam file consisting only of concordant read pairs...Updated the answer.
Great that did it! Thanks!