Hi,
Apologies if it's an obvious question, I couldn't find a definitive answer
My aim is to quantify RNASeq coverage before/after independently determined termination sites
I have aligned my (stranded and paired end) RNSeq data to the reference with Bowtie2. I then split the resulting bam files by strand:
samtools view -bh -F 16 $file".bam" | samtools sort > Split_stranded/$file"_F.bam"
samtools view -bh -f 16 $file".bam" | samtools sort > Split_stranded/$file"_R.bam"
And generated a strand-specific depth table:
~/alice/NICK/Software/samtools/samtools depth -aH $files
My issue is that the apparent 3' termini from the RNASeq don't seem to match the independently determined termini. The transcripts seem to be split, with the 5' end covered by reads on the expected strand (e.g. + strand transcript covered by F reads) and the 3' end covered by reads on the other strand (e.g. + strand transcript covered by R reads).
I have read on several forums that for paired libraries, the two reads in a pair align to opposite strands, so am I then splitting the reads for a single transcript when I split the bam files by read?
How should I get proper coverage for the correct strand? Does read 1 align to the correct strand and read 2 to the opposite?
Should I then separate 4 files based on strand and read, and then merge like this:
- F read 1 + R read 2
- R read 1 and F read 2
Thanks!