Question: Extending and clipping bed files
2
gravatar for ApoorvaB
2.5 years ago by
ApoorvaB200
United States
ApoorvaB200 wrote:

Hi Everyone,

I have the data from a ChIP-Seq experiment. I used EDD to call the domains. I am interested in knowing out what genes are present just a few bp outside of the domains. I extended each of the peaks in the bed file I got from edd by 250 bp on both ends using bedtools slop. But what I need now is two rows for each feature of length 500bp centered around the ends. For instance,

Input

chr1    1000  2000
chr1     2500 3500

Output

chr1   750    1250
chr1   1750   2250
chr1    2250  2750
chr1    3250  3750

Using bedtools slop to extend and subtracting the old bed file from new file with extended coordinates will give me the 250bp region exactly outside the two ends of the peak. But I want the ends of the peak to be in the centre. How can I do that ?

Thanks

chip-seq bedtools • 1.0k views
ADD COMMENTlink modified 2.5 years ago by Brice Sarver2.6k • written 2.5 years ago by ApoorvaB200
3
gravatar for Brice Sarver
2.5 years ago by
Brice Sarver2.6k
United States
Brice Sarver2.6k wrote:

A bed file is just a table, so you can do this quickly in R or the language of your choice.

shiftPosition <- function(record, size){
first <- c(record[1], as.numeric(record[2]) - size, as.numeric(record[2]) + size)
second <- c(record[1], as.numeric(record[3]) - size, as.numeric(record[3]) + size)
result <- c(first, second)
return(result)
}

bed <- read.table("your_bed_file.bed", header=FALSE, stringsAsFactors=FALSE)

final <- data.frame(matrix(apply(bed, 1, shiftPosition, 250), ncol=3, byrow=TRUE))

write.table(final, file="your_new_bed.bed", quote=FALSE, row.names=FALSE, col.names=FALSE)

This converts

    V1      V2      V3
1 chr1 3680371 3681212
2 chr1 4344252 4350391
3 chr1 4351710 4353125
4 chr1 4491516 4493706
5 chr1 4773006 4777948

to

1  chr1 3680121 3680621
2  chr1 3680962 3681462
3  chr1 4344002 4344502
4  chr1 4350141 4350641
5  chr1 4351460 4351960
6  chr1 4352875 4353375
7  chr1 4491266 4491766
8  chr1 4493456 4493956
9  chr1 4772756 4773256
10 chr1 4777698 4778198
ADD COMMENTlink written 2.5 years ago by Brice Sarver2.6k
2
gravatar for Pierre Lindenbaum
2.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

I'm not sure I understand , but if your run the following awk;

 awk -F '\t' '{printf("%s\t%s\t%s\n%s\t%s\t%s\n",$1,$2,$2,$1,$3,$3);}' input.bed

and pipe the output into bedtools slop, you should get the desired output.

ADD COMMENTlink written 2.5 years ago by Pierre Lindenbaum119k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 649 users visited in the last hour