Creating Custom Bed File For Refgene (Beginning 500 Bp Upstream And Extending 2000 Bp Into The Gene)
11.1 years ago

I want to pick out the parts of the rn4 genome beginning 500 bp before a gene and extending 2000 bp into the gene.

Is this as simple as choosing the option

Create one BED record per: Whole Gene

and then changing the output by subtracting 500 from the start position and changing the end position to start position + 2000?

E.g. changing the following BED line

chr1 100000 100500 NM_019248_up_500_chr1_134302140_r 0 -

into

chr1 99500 102000 NM_019248_up_500_chr1_134302140_r 0 -

?

Edit: if the gene is shorter than 2000 bp I should not change the end position.

ucsc • 3.2k views

Hi, I took the liberty of changing your example into one that is easier to read. This way, you will get more answers.

Seems like your example doesn't match your words. If it extends 2 kb into the stream, shouldn't the 2nd number be 97500? And if you're using the strand information, then you'd want to use 200000 - 2000 to 2000000 + 500.

Sorry Brent, it was my fault: I edited the question and made an error. It should be fixed now.

11.1 years ago

Just editing Giovanni's answer, because:

  • there are only 6 columns, so $7 is not required
  • tabs should be inserted between the fields
  • the strand needs to be taken into account
  • the gene should be returned as is if its length is < 2000

Modified script:

awk '{if ($3-$2>2000){if($6 =="+") print $1"\t"$2-500"\t"$2+2000"\t"$4"\t"$5"\t"$6; else print $1"\t"$3-2000"\t"$3+500"\t"$4"\t"$5"\t"$6} else print $0}' file.bed > out.bed

Note that for genes on the minus strand the downstream end has the smaller coordinate, so I swapped the two values. This keeps start < end, as the BED format requires, so the output can be used with tools like bedtools.

Cheers
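
A strand-aware variant, offered only as a sketch. It assumes a 6-column, tab-separated BED file in which the gene begins at column 2 on the + strand and at column 3 on the - strand, and it treats any strand other than "+" as minus. The function name `promoter_window` and the file names are made up for illustration. Unlike the one-liner above, it still adds the 500 bp upstream for short genes and only caps the 2000 bp extension at the gene's own end (one reading of the question's edit), and it clamps negative starts to 0. It does not clip windows that run past the end of a chromosome.

```shell
# Hypothetical helper: reads a 6-column BED file ($1) and writes the
# adjusted windows to stdout.
promoter_window() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    if ($6 == "+") {
      start = $2 - 500                            # 500 bp upstream of the gene start
      end   = ($3 - $2 > 2000) ? $2 + 2000 : $3   # cap at the gene end for short genes
    } else {
      start = ($3 - $2 > 2000) ? $3 - 2000 : $2   # on "-", the gene begins at column 3
      end   = $3 + 500                            # upstream lies at higher coordinates
    }
    if (start < 0) start = 0                      # BED coordinates cannot be negative
    print $1, start, end, $4, $5, $6
  }' "$1"
}

# Example usage (file names are placeholders):
# promoter_window file.bed > out.bed
```

Keeping the smaller coordinate in column 2 for both strands means the output stays valid BED, and column 6 still records which side of the window is upstream.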

All answers appreciated. Thanks.

11.1 years ago

You can do it quite easily with awk.

awk '{print $1 $2-500 $2+2000 $4 $5 $6 $7}' file.bed > outputfile.txt
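
To see why the field separators matter, here is a small check on a single made-up record (the gene name `geneA` and its coordinates are invented). Without commas, awk concatenates the expressions into one string; with commas and `OFS` set to a tab, the output stays tab-separated. This only illustrates the formatting and ignores strand and gene length.

```shell
# One made-up, tab-separated BED record (illustrative values only).
record=$(printf 'chr1\t100000\t105000\tgeneA\t0\t+')

# No commas between the expressions: awk concatenates them.
joined=$(printf '%s\n' "$record" | awk '{print $1 $2-500 $2+2000 $4 $5 $6 $7}')

# Commas plus OFS: the values are joined with tabs.
tabbed=$(printf '%s\n' "$record" | awk 'BEGIN { OFS = "\t" } {print $1, $2-500, $2+2000, $4, $5, $6}')

printf '%s\n%s\n' "$joined" "$tabbed"
```

The first line printed is a single run-together field, which BED-aware tools will not be able to parse; the second is a six-column record.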