Question

Get read start site using bedtools from bam file after merging paired end data

0

Entering edit mode

5.1 years ago

snishtala03 ▴ 70

Hello,

I have some paired end viral RNA-Seq data. For some analysis requested by my collaborator, I am trying to do the analysis by merging R1 and R2.

So this is my analysis so far -

I merge read1.fq and read2.fq using bbmerge.
Then, use bwa mem to align reads to the reference viral genome. ( The genome is only 3000 bps long )
Check the flags on my aligned bam file:

a. 16: read reverse strand

b. 2048: supplementary alignment

c.2064: supplementary alignment and read reverse stand

d. 4: read unmapped

e. 0: Not sure what this means. Although this discussion answers it to some extent (In Sam Format, Clarify The Meaning Of The "0" Flag.)

Now, to get the start site of each read, I tried 2 methods -

a. bedtools bamtobed -i bamfile.bam > bedfile.bed

example: 
Transcript1      75      196     M02091:32:000000000-C28N4:1:2112:6520:7859      0       +
Transcript1      75      196     M02091:32:000000000-C28N4:1:2112:13772:7584     0       +
Transcript1      75      249     M02091:32:000000000-C28N4:1:1102:17223:27428    0       -
Transcript1      75      196     M02091:32:000000000-C28N4:1:1103:24468:5276     0       -

b. bam2R function from deepSNV package in R.

In both the above cases, I see that reads are aligned to positive and negative strand and the reads mapping to positive strand are the ones with bitwise flag of 0. How can this be possible since I merged the reads?

Thanks a lot for your help!

RNA-Seq bbmerge bedtools samtools alignment • 1.3k views

ADD COMMENT • link 5.1 years ago by snishtala03 ▴ 70

0

Entering edit mode

I merge read1.fq and read2.fq using bbmerge

Do you expect your reads to merge? That can only happen when the length of sequencing is > (0.5 x insert length). How long are these reads? Why are you not mapping your reads directly to the reference? You could use bbmap.sh.

How can this be possible since I merged the reads?

Merging the reads does not make them oriented on the top strand. You could be merging reads where R1 is from bottom strand. Sequence is always represented 5'-->3'.

ADD REPLY • link 5.1 years ago by GenoMax 141k

0

Entering edit mode

My read length is around 150 bps, so it is > 0.5*insert length.

I did map the reads as paired end directly to the reference. But my reference is circular. So, I was suggested to merge the reads to see how the alignments look.

ADD REPLY • link 5.1 years ago by snishtala03 ▴ 70

0

Entering edit mode

Also,

Here is an example of stats when I use bbmerge -

Pairs:                  1071898
Joined:                 1003105     93.582%
Ambiguous:              63915       5.963%
No Solution:            4878        0.455%
Too Short:              0           0.000%

Avg Insert:             165.9
Standard Deviation:     40.2
Mode:                   148

Insert range:           35 - 292
90th percentile:        223
75th percentile:        188
50th percentile:        158
25th percentile:        138
10th percentile:        123

ADD REPLY • link 5.1 years ago by snishtala03 ▴ 70