I referenced this post on how to take tophat output from paired end RNAseq data (accepted_hits.bam) and use samtools to make a new bam file that included only the concordant mapped read pairs (let's call it accepted_hits_concordant.bam). However, a simple line count of both files suggests that the accepted_hits_concordant.bam file is much larger than accepted_hits.bam. I would have expected accepted_hits_concordant.bam to have fewer lines than accepted_hits.bam because it should be keeping fewer read pairs. Maybe I entered the wrong commands. Here's what I did:
samtools view -f 0x2 accepted_hits.bam >accepted_hits_conc.bam
wc -l accepted_hits.bam #13722228
wc -l accepted_hits_conc.bam #52945812
If I entered my commands correctly, what then could account for all the additional lines in accepted_hits_conc.bam?
Thanks!
Ok, when I do it that way, I get line count for
accepted_hits.bambut the following error foraccepted_hits_conc.bam:So I think the above samtools command (
samtools view -f 0x2 accepted_hits.bam >accepted_hits_conc.bam) for extracting concordant aligned pairs into their own bam file didn't work...I can view counts of accepted hits and concordant hits if I do
samtools view -c accepted_hits.bamandsamtools view -c -f 0x2 accepted_hits.bam, but I don't want to just get a count of how many read pairs are concordant, I want to make a new bam file consisting only of concordant read pairs...Updated the answer.
Great that did it! Thanks!