I referenced this post How To Extract Reads-Pairs Aligned Concordantly Exactly 1 Time? on how to take tophat output from paired end RNAseq data (accepted_hits.bam) and use samtools to make a new bam file that included only the concordant mapped read pairs (let's call it accepted_hits_concordant.bam). However, a simple line count of both files suggests that the accepted_hits_concordant.bam file is much larger than accepted_hits.bam. I would have expected accepted_hits_concordant.bam to have fewer lines than accepted_hits.bam because it should be keeping fewer read pairs. Maybe I entered the wrong commands. Here's what I did:
samtools view -f 0x2 accepted_hits.bam >accepted_hits_conc.bam
wc -l accepted_hits.bam #13722228
wc -l accepted_hits_conc.bam #52945812
If I entered my commands correctly, what then could account for all the additional lines in accepted_hits_conc.bam?