Hello,
I have some paired end viral RNA-Seq data. For some analysis requested by my collaborator, I am trying to do the analysis by merging R1 and R2.
So this is my analysis so far -
- I merge read1.fq and read2.fq using bbmerge.
- Then, use bwa mem to align reads to the reference viral genome. ( The genome is only 3000 bps long )
Check the flags on my aligned bam file:
a. 16: read reverse strand
b. 2048: supplementary alignment
c.2064: supplementary alignment and read reverse stand
d. 4: read unmapped
e. 0: Not sure what this means. Although this discussion answers it to some extent (In Sam Format, Clarify The Meaning Of The "0" Flag.)
Now, to get the start site of each read, I tried 2 methods -
a. bedtools bamtobed -i bamfile.bam > bedfile.bed
example: Transcript1 75 196 M02091:32:000000000-C28N4:1:2112:6520:7859 0 + Transcript1 75 196 M02091:32:000000000-C28N4:1:2112:13772:7584 0 + Transcript1 75 249 M02091:32:000000000-C28N4:1:1102:17223:27428 0 - Transcript1 75 196 M02091:32:000000000-C28N4:1:1103:24468:5276 0 -
b. bam2R function from deepSNV package in R.
In both the above cases, I see that reads are aligned to positive and negative strand and the reads mapping to positive strand are the ones with bitwise flag of 0. How can this be possible since I merged the reads?
Thanks a lot for your help!
Do you expect your reads to merge? That can only happen when the length of sequencing is > (0.5 x insert length). How long are these reads? Why are you not mapping your reads directly to the reference? You could use
bbmap.sh
.Merging the reads does not make them oriented on the top strand. You could be merging reads where R1 is from bottom strand. Sequence is always represented 5'-->3'.
My read length is around 150 bps, so it is > 0.5*insert length.
I did map the reads as paired end directly to the reference. But my reference is circular. So, I was suggested to merge the reads to see how the alignments look.
Also,
Here is an example of stats when I use bbmerge -