Question: Get read start site using bedtools from bam file after merging paired end data
0
gravatar for snishtala03
5 months ago by
snishtala0320
snishtala0320 wrote:

Hello,

I have some paired end viral RNA-Seq data. For some analysis requested by my collaborator, I am trying to do the analysis by merging R1 and R2.

So this is my analysis so far -

  1. I merge read1.fq and read2.fq using bbmerge.
  2. Then, use bwa mem to align reads to the reference viral genome. ( The genome is only 3000 bps long )
  3. Check the flags on my aligned bam file:

    a. 16: read reverse strand

    b. 2048: supplementary alignment

    c.2064: supplementary alignment and read reverse stand

    d. 4: read unmapped

    e. 0: Not sure what this means. Although this discussion answers it to some extent (In Sam Format, Clarify The Meaning Of The "0" Flag.)

  4. Now, to get the start site of each read, I tried 2 methods -

    a. bedtools bamtobed -i bamfile.bam > bedfile.bed

    example: 
    Transcript1      75      196     M02091:32:000000000-C28N4:1:2112:6520:7859      0       +
    Transcript1      75      196     M02091:32:000000000-C28N4:1:2112:13772:7584     0       +
    Transcript1      75      249     M02091:32:000000000-C28N4:1:1102:17223:27428    0       -
    Transcript1      75      196     M02091:32:000000000-C28N4:1:1103:24468:5276     0       -
    

    b. bam2R function from deepSNV package in R.

In both the above cases, I see that reads are aligned to positive and negative strand and the reads mapping to positive strand are the ones with bitwise flag of 0. How can this be possible since I merged the reads?

Thanks a lot for your help!

ADD COMMENTlink modified 5 months ago • written 5 months ago by snishtala0320

I merge read1.fq and read2.fq using bbmerge

Do you expect your reads to merge? That can only happen when the length of sequencing is > (0.5 x insert length). How long are these reads? Why are you not mapping your reads directly to the reference? You could use bbmap.sh.

How can this be possible since I merged the reads?

Merging the reads does not make them oriented on the top strand. You could be merging reads where R1 is from bottom strand. Sequence is always represented 5'-->3'.

ADD REPLYlink modified 5 months ago • written 5 months ago by genomax70k

My read length is around 150 bps, so it is > 0.5*insert length.

I did map the reads as paired end directly to the reference. But my reference is circular. So, I was suggested to merge the reads to see how the alignments look.

ADD REPLYlink written 5 months ago by snishtala0320

Also,

Here is an example of stats when I use bbmerge -

Pairs:                  1071898
Joined:                 1003105     93.582%
Ambiguous:              63915       5.963%
No Solution:            4878        0.455%
Too Short:              0           0.000%

Avg Insert:             165.9
Standard Deviation:     40.2
Mode:                   148

Insert range:           35 - 292
90th percentile:        223
75th percentile:        188
50th percentile:        158
25th percentile:        138
10th percentile:        123
ADD REPLYlink modified 5 months ago • written 5 months ago by snishtala0320
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1615 users visited in the last hour