Question: bedtools bamtobed of mixed paired-end and non-paired datasets
gravatar for tassa.saldi
2.6 years ago by
tassa.saldi0 wrote:


I'm wondering if anyone out there can help me. I am trying to calculate mean coverage over exons and introns in my RNAseq dataset using bedtools coverage -mean. I have very large bam files that are actually combined files from two sequencing runs- one single end and one paired-end. I am having a heck of a time making bed files from these bam files. I don't want to loose the second read in a pair (as I understand I would if I just made a regular bed from bamtobed), since the second read is legitimate coverage in my regions. However, if I try to use the -bedpe option the program freaks out because not all the reads in my file are paired (maybe less than half). For this analysis I basically just need a split (on the N cigar) bedfile with intervals for all reads that contribute to coverage in the dataset with the correct strand in the 6th column (which gets complicated when R2 is actually backwards).

Thanks in advance for any suggestions!

rna-seq • 954 views
ADD COMMENTlink modified 2.6 years ago by drkennetz510 • written 2.6 years ago by tassa.saldi0

Hello tassa.saldi,

why have you used bamtobed? AFAIK bedtools coverage except bam as an input file.

fin swimmer

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by finswimmer14k
gravatar for drkennetz
2.6 years ago by
drkennetz510 wrote:

Hi tassa,

samtools view -F 0x40 -h merged.bam > R1_from_merged.sam
samtools view -F 0x80 -h merged.bam > R2.sam_from_merged
samtools view -S -b R1.sam > R1_from_merged.bam
samtools view -S -b R2.sam > R2_from_merged.bam

I think you could do the -b flag in the first step to output bam files, but I like the human readable first so I can check the content without having to less the file. Either way should work fine.

Then, you can run bamToBed like you have desired to get final outputs:

bedtools bamtobed -i R1_from_merged.bam > read1_from_merged.bed
bedtools bamtobed -i R2_from_merged.bam > read2_from_merged.bed
bedtools bamtobed -i SE_reads.bam > SE_reads.bed

A note about intervals is that the coordinates for your - strand reads will still be reported from low to high:

(ex: chr 1 1 10,000 name coverage -)

I don't think this would be a problem though.

ADD COMMENTlink written 2.6 years ago by drkennetz510
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1017 users visited in the last hour