Question: How to split a BED file according to a promoter window (-30 to +300 bp)
Lila M (UK) wrote:

Hi everybody,

I have a BED file of interest that I want to compare with a bigWig file to get the coverage. To do that, I am using deepTools:

 multiBigwigSummary BED-file -b Norm -out promoter_coverage.zip --BED file.BED --outRawCounts coverage

I want to calculate the coverage over the promoter region (-30 bp to +300 bp around the TSS) and I was thinking about the best way to do that. My first idea was to address this by editing the BED file as follows:

chr   start  end         chr   start   end (= start +300)
chr1  14362  29370  ---> chr1  14362  14662

And for the gene body (if I want a window from 300 bp to the end):

chr   start  end         chr   start (+300)  end
chr1  14362  29370  ---> chr1  14662         29370

Does this approach make sense, or does anybody know a better one?

Thank you!

written 2.7 years ago by Lila M

Please take a look at this thread, specifically Alex Reynolds' comment. It looks like you're attempting to calculate promoter pausing indexes, or something similar.

You would need to generate a coverage profile for promoters and then for gene bodies and then perform the appropriate calculations. Generating the coverage profiles can be done using the bedops program using Alex's approach.

written 2.7 years ago by Sinji

I would appreciate it if you could expand on your answer a bit and also respond to my question. Maybe both approaches are correct, but bedops is totally new to me and I would like to know whether my approach is correct.

Thank you

written 2.7 years ago by Lila M

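In general, yes, your approach works. For genes on the + strand, you would subtract 30 from the start coordinate to get the new column 2, and add 300 to the start coordinate to get the new column 3. For genes on the - strand you would do the opposite, working from the end coordinate: subtract 300 to get column 2 and add 30 to get column 3.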

Something like:

awk -v OFS='\t' '{if ($6 == "+") print $1, $2-30, $2+300, $4, $5, $6; else print $1, $3-300, $3+30, $4, $5, $6}' INFILE > OUTFILE

For the gene body, you would simply add 300 bp to the start coordinate in the second column and keep the third column the way it is. For genes on the - strand you would do the opposite: keep the second column and subtract 300 bp from the end coordinate in the third column.
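A minimal sketch of the gene-body step described above, assuming a tab-separated BED6 file with the strand in column 6. The file names and the two example genes are placeholders made up for illustration, and skipping genes of 300 bp or shorter is an added assumption to avoid writing empty or inverted intervals.

```shell
# Hypothetical two-gene input in BED6 format (tab-separated).
printf 'chr1\t14362\t29370\tgeneA\t0\t+\nchr1\t34554\t36081\tgeneB\t0\t-\n' > genes.bed

# Gene body: drop the first 300 bp downstream of the TSS.
# + strand: the TSS is the start (column 2), so shift the start by +300.
# - strand: the TSS is the end (column 3), so shift the end by -300.
# Genes of 300 bp or less are skipped (assumption, see above).
awk -F'\t' -v OFS='\t' '($3 - $2) > 300 {
    if ($6 == "+") print $1, $2 + 300, $3, $4, $5, $6
    else           print $1, $2, $3 - 300, $4, $5, $6
}' genes.bed > gene_body.bed
```

The resulting gene_body.bed can then be passed to multiBigwigSummary via --BED in the same way as the promoter file.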

So yes. Your approach works. Just make sure that you're subtracting and adding bp in the appropriate direction.
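To check the direction on a small example, here is a sketch of a strand-aware promoter window (30 bp upstream to 300 bp downstream of the TSS), assuming tab-separated BED6 input. The gene names and coordinates are invented for illustration, and clamping negative starts to 0 is an added safeguard for genes close to the chromosome start, not something discussed above.

```shell
# Hypothetical BED6 input; geneB and geneC sit near the chromosome start.
printf 'chr1\t14362\t29370\tgeneA\t0\t+\nchr1\t10\t250\tgeneB\t0\t-\nchr1\t20\t5000\tgeneC\t0\t+\n' > genes.bed

# Promoter window relative to the TSS, following the direction of transcription.
awk -F'\t' -v OFS='\t' '{
    if ($6 == "+") { s = $2 - 30;  e = $2 + 300 }   # TSS = start (column 2)
    else           { s = $3 - 300; e = $3 + 30  }   # TSS = end (column 3)
    if (s < 0) s = 0                                 # BED starts cannot be negative
    print $1, s, e, $4, $5, $6
}' genes.bed > promoters.bed
```

Every window that was not clamped should be exactly 330 bp wide, which is a quick way to confirm the arithmetic before computing coverage.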

modified 2.7 years ago • written 2.7 years ago by Sinji

Thank you very much :) Anyway, I'm going to have a look at bedops!

written 2.7 years ago by Lila M