I have a strand-specific paired-end library on which I'd like to perform a standard DGE analysis. However, I'm unclear about how to count reads. Usually, when I have a paired-end (unstranded) library, I first clip adapters and trim for quality. During this preprocessing, if one mate of a pair passes the filtering and the other doesn't, I retain the surviving mate as a single-end read. Then I use TopHat to first map the PE reads (where both mates are retained after filtering), and pass the junctions obtained from that mapping step to a second TopHat run on the single-end reads, so as not to lose those otherwise "good" reads.
For the stranded library, my plan is to follow the same procedure: keep the PE reads where both mates pass filtering, and store the reads where only one mate passed as SE reads in two separate files (depending on whether mate 1 or mate 2 was retained), since the mate of origin determines the strand in a strand-specific library.
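To make the splitting step concrete, here is a minimal sketch of what I mean. The read representation (a `(name, sequence)` tuple), the function name `split_by_filter`, and the length-based filter are all placeholders I made up for illustration; in practice the trimming tool's own pass/fail decision would take their place.

```python
def split_by_filter(pairs, passes):
    """Partition read pairs after quality/adapter filtering.

    pairs  : iterable of (mate1, mate2) reads
    passes : predicate returning True if a single read survives filtering
    Returns a dict with the surviving pairs and the orphans, keeping
    orphans from mate 1 and mate 2 apart because the mate of origin
    determines the strand in a strand-specific library.
    """
    out = {"paired": [], "orphan_1": [], "orphan_2": []}
    for mate1, mate2 in pairs:
        ok1, ok2 = passes(mate1), passes(mate2)
        if ok1 and ok2:
            out["paired"].append((mate1, mate2))
        elif ok1:
            out["orphan_1"].append(mate1)
        elif ok2:
            out["orphan_2"].append(mate2)
        # if neither mate passes, the whole fragment is dropped
    return out


# Hypothetical filter: keep reads that are still at least 20 nt after trimming.
def long_enough(read):
    return len(read[1]) >= 20
```

The three output lists would correspond to the three inputs for mapping: one PE run and two SE runs whose strand interpretation differs.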
Even if that approach is sound, I am still unsure how to count the reads after mapping.
So my questions are: How do people normally go about mapping a PE library? Do they keep only "properly mapped pairs"? If so, do they count each pair as one read (since both mates come from a single fragment)? When SE reads are also present in the same BAM file, how do you then count the reads? And if there are both properly paired strand-specific reads and strand-specific SE reads, how does one count the reads per gene?
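To illustrate what I have in mind for counting, here is a rough sketch, not something I know to be the accepted method. It uses the standard SAM FLAG bits, counts each fragment (query name) at most once per gene so that a proper pair contributes 1, and treats an SE read as one fragment too. The input tuples `(qname, flag, gene, se_mate)`, the pre-assigned gene, and the `mate1_is_sense` switch are assumptions of mine; which mate carries the transcript strand depends on the library protocol.

```python
from collections import Counter

# Standard SAM FLAG bits
PAIRED, PROPER_PAIR, UNMAPPED, REVERSE = 0x1, 0x2, 0x4, 0x10
MATE1, MATE2, SECONDARY = 0x40, 0x80, 0x100


def fragment_strand(flag, se_mate=1, mate1_is_sense=True):
    """Strand of the original fragment implied by one alignment.

    se_mate        : mate of origin for a read mapped as single-end; the
                     FLAG of an SE alignment cannot tell, so it has to be
                     tracked per input file.
    mate1_is_sense : protocol assumption; use False when mate 2 carries
                     the transcript strand.
    """
    if flag & PAIRED:
        mate = 2 if flag & MATE2 else 1
    else:
        mate = se_mate
    read_strand = "-" if flag & REVERSE else "+"
    sense_mate = 1 if mate1_is_sense else 2
    if mate == sense_mate:
        return read_strand
    return "+" if read_strand == "-" else "-"


def count_fragments(alignments, gene_strand, stranded=True, mate1_is_sense=True):
    """Count fragments per gene from (qname, flag, gene, se_mate) tuples."""
    seen = set()
    counts = Counter()
    for qname, flag, gene, se_mate in alignments:
        if flag & (UNMAPPED | SECONDARY):
            continue
        if (flag & PAIRED) and not (flag & PROPER_PAIR):
            continue  # assumption: improperly paired mates are discarded
        if stranded:
            strand = fragment_strand(flag, se_mate, mate1_is_sense)
            if strand != gene_strand[gene]:
                continue
        if (qname, gene) not in seen:  # both mates share a qname: count once
            seen.add((qname, gene))
            counts[gene] += 1
    return counts
```

With `stranded=False` the same function would cover the unstranded case, where the strand check is simply skipped. Whether mixing pairs and orphans like this is statistically sensible is exactly what I am asking.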
I know it's a lot of questions, but the essence is how to count reads in an unstranded or a stranded library when you have both PE and SE reads within the same BAM file. Or is this bad practice, and should one simply throw away these otherwise good SE reads?
Thank you very much.