I have some paired end viral RNA-Seq data. For some analysis requested by my collaborator, I am trying to do the analysis by merging R1 and R2.
So this is my analysis so far -
- I merge read1.fq and read2.fq using bbmerge.
- Then, use bwa mem to align reads to the reference viral genome. ( The genome is only 3000 bps long )
Check the flags on my aligned bam file:
a. 16: read reverse strand
b. 2048: supplementary alignment
c.2064: supplementary alignment and read reverse stand
d. 4: read unmapped
e. 0: Not sure what this means. Although this discussion answers it to some extent (In Sam Format, Clarify The Meaning Of The "0" Flag.)
Now, to get the start site of each read, I tried 2 methods -
a. bedtools bamtobed -i bamfile.bam > bedfile.bed
example: Transcript1 75 196 M02091:32:000000000-C28N4:1:2112:6520:7859 0 + Transcript1 75 196 M02091:32:000000000-C28N4:1:2112:13772:7584 0 + Transcript1 75 249 M02091:32:000000000-C28N4:1:1102:17223:27428 0 - Transcript1 75 196 M02091:32:000000000-C28N4:1:1103:24468:5276 0 -
b. bam2R function from deepSNV package in R.
In both the above cases, I see that reads are aligned to positive and negative strand and the reads mapping to positive strand are the ones with bitwise flag of 0. How can this be possible since I merged the reads?
Thanks a lot for your help!