Question: bedtools bamtobed of mixed paired-end and non-paired datasets
0
gravatar for tassa.saldi
8 months ago by
tassa.saldi0 wrote:

Hello,

I'm wondering if anyone out there can help me. I am trying to calculate mean coverage over exons and introns in my RNAseq dataset using bedtools coverage -mean. I have very large bam files that are actually combined files from two sequencing runs- one single end and one paired-end. I am having a heck of a time making bed files from these bam files. I don't want to loose the second read in a pair (as I understand I would if I just made a regular bed from bamtobed), since the second read is legitimate coverage in my regions. However, if I try to use the -bedpe option the program freaks out because not all the reads in my file are paired (maybe less than half). For this analysis I basically just need a split (on the N cigar) bedfile with intervals for all reads that contribute to coverage in the dataset with the correct strand in the 6th column (which gets complicated when R2 is actually backwards).

Thanks in advance for any suggestions!

rna-seq • 324 views
ADD COMMENTlink modified 7 months ago by drkennetz360 • written 8 months ago by tassa.saldi0

Hello tassa.saldi,

why have you used bamtobed? AFAIK bedtools coverage except bam as an input file.

fin swimmer

ADD REPLYlink modified 7 months ago • written 7 months ago by finswimmer11k
0
gravatar for drkennetz
7 months ago by
drkennetz360
drkennetz360 wrote:

Hi tassa,

samtools view -F 0x40 -h merged.bam > R1_from_merged.sam
samtools view -F 0x80 -h merged.bam > R2.sam_from_merged
samtools view -S -b R1.sam > R1_from_merged.bam
samtools view -S -b R2.sam > R2_from_merged.bam

I think you could do the -b flag in the first step to output bam files, but I like the human readable first so I can check the content without having to less the file. Either way should work fine.

Then, you can run bamToBed like you have desired to get final outputs:

bedtools bamtobed -i R1_from_merged.bam > read1_from_merged.bed
bedtools bamtobed -i R2_from_merged.bam > read2_from_merged.bed
bedtools bamtobed -i SE_reads.bam > SE_reads.bed

A note about intervals is that the coordinates for your - strand reads will still be reported from low to high:

(ex: chr 1 1 10,000 name coverage -)

I don't think this would be a problem though.

ADD COMMENTlink written 7 months ago by drkennetz360
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2408 users visited in the last hour