Question

bedtools bamtobed of mixed paired-end and non-paired datasets

0

Entering edit mode

5.7 years ago

tassa.saldi • 0

Hello,

I'm wondering if anyone out there can help me. I am trying to calculate mean coverage over exons and introns in my RNAseq dataset using bedtools coverage -mean. I have very large bam files that are actually combined files from two sequencing runs- one single end and one paired-end. I am having a heck of a time making bed files from these bam files. I don't want to loose the second read in a pair (as I understand I would if I just made a regular bed from bamtobed), since the second read is legitimate coverage in my regions. However, if I try to use the -bedpe option the program freaks out because not all the reads in my file are paired (maybe less than half). For this analysis I basically just need a split (on the N cigar) bedfile with intervals for all reads that contribute to coverage in the dataset with the correct strand in the 6th column (which gets complicated when R2 is actually backwards).

Thanks in advance for any suggestions!

RNA-Seq • 1.8k views

ADD COMMENT • link updated 5.7 years ago by drkennetz ▴ 560 • written 5.7 years ago by tassa.saldi • 0

0

Entering edit mode

Hello tassa.saldi,

why have you used bamtobed? AFAIK bedtools coverage except bam as an input file.

fin swimmer

ADD REPLY • link 5.7 years ago by finswimmer 16k

score 0 · Answer 1 · 2018-08-03

Hi tassa,

samtools view -F 0x40 -h merged.bam > R1_from_merged.sam
samtools view -F 0x80 -h merged.bam > R2.sam_from_merged
samtools view -S -b R1.sam > R1_from_merged.bam
samtools view -S -b R2.sam > R2_from_merged.bam

I think you could do the -b flag in the first step to output bam files, but I like the human readable first so I can check the content without having to less the file. Either way should work fine.

Then, you can run bamToBed like you have desired to get final outputs:

bedtools bamtobed -i R1_from_merged.bam > read1_from_merged.bed
bedtools bamtobed -i R2_from_merged.bam > read2_from_merged.bed
bedtools bamtobed -i SE_reads.bam > SE_reads.bed

A note about intervals is that the coordinates for your - strand reads will still be reported from low to high:

(ex: chr 1 1 10,000 name coverage -)

I don't think this would be a problem though.