Hello! I am analyzing paired-end RNA-seq reads generated with a strand-specific dUTP Illumina library construction kit (Read 2 corresponds to the original transcript orientation). The reads come from a single organism that has a reference genome, and I am hoping to perform differential expression analysis with HTSeq. I have aligned my strand-specific reads to the genome with Bowtie2 (using the --fr --nofw flags), and now I need to pull the appropriate alignments out of the SAM output file.
I've read Istvan Albert's great post on using SAMtools to split strand-specific data, and looked at how SAM flags are visualized, but I am still lost. My original thought was to keep alignments with SAM flag 3 (read paired, read mapped in proper pair), but doesn't that ignore strand-specificity? What are the appropriate flags for pulling out properly paired reads that also correspond to the strand they originated from? And is this critical for the downstream HTSeq counts used for differential expression?
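To make the question concrete, here is a minimal sketch of how I understand the flag bits to decompose. The bit values come from the SAM specification; the function name `transcript_strand` is just a hypothetical helper for illustration, and the strand mapping assumes my library layout (Read 2 in the transcript orientation), so it may well be wrong for other kits.

```python
# SAM flag bits as defined in the SAM specification
PAIRED = 0x1        # read is paired
PROPER_PAIR = 0x2   # read mapped in proper pair
REVERSE = 0x10      # read aligned to the reverse strand
READ1 = 0x40        # first read in the pair
READ2 = 0x80        # second read in the pair


def transcript_strand(flag):
    """Hypothetical helper: infer the transcript strand from a SAM flag.

    Assumes a dUTP library where Read 2 has the transcript orientation.
    Returns '+' or '-', or None if the read is not in a proper pair.
    """
    required = PAIRED | PROPER_PAIR  # this is the "flag 3" filter
    if flag & required != required:
        return None
    is_reverse = bool(flag & REVERSE)
    if flag & READ2:
        # Read 2 matches the transcript, so its strand is the transcript strand
        return '-' if is_reverse else '+'
    if flag & READ1:
        # Read 1 is the reverse complement of the transcript
        return '+' if is_reverse else '-'
    return None


# The four usual proper-pair flags and the strand each would imply
for flag in (83, 163, 99, 147):
    print(flag, transcript_strand(flag))
```

If this reasoning holds, flags 83 and 163 would belong to transcripts on the forward strand and flags 99 and 147 to transcripts on the reverse strand, whereas filtering on flag 3 alone would keep all four without separating them.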