I'm working with SN RNA-seq data that I processed using Cellranger v. 7.1.0 (which uses STAR for alignment with default settings).
Using the samtools flagstat command to look at a BAM file output:
284890368 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 0 + 0 supplementary 141030430 + 0 duplicates 277071629 + 0 mapped (97.26% : N/A) 0 + 0 paired in sequencing 0 + 0 read1 0 + 0 read2 0 + 0 properly paired (N/A : N/A) 0 + 0 with itself and mate mapped 0 + 0 singletons (N/A : N/A) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)
Which looks off... shouldn't there be reads annotated as read1 or read1, properly paired or not, etc.?
Looking at the unique samtools flags in this particular file, I find:
flag occurrences 0 62497243 16 59602309 1024 73268831 1040 67761599
... which correspond to 'unpaired+forward', 'reverse', 'duplicate', and 'reverse+duplicate', respectively.
My goal is to filter for uniquely mapping, properly paired reads, but it's not clear to me how this can be done given these outputs.
Any advice or insight would be much appreciated.
So I'm guessing you'll need a flag to pick for samtools view
How does the flagstat output affect the flag you pick for
The flagstat output doesn't necessarily impact those flags, I just wanted to know what I was working with in terms of properly paired reads, and was initially confused by the finding that there were none. And, according to my search of the BAM file, none of the flags indicated any of the reads were properly paired, hence my confusion about how to extract these data.
However, per Genomax's comment, this does not apply to 10x SC/SN RNA-seq data, so my question was misguided from the start.
Your question was not misguided, we all miss "obvious" differences at times owing to the fact that our expectations are set based on the majority of our experience. BTW, I've moved GenoMax's comment to an answer. Please accept it to mark the post as solved.