Extract strand information from ENSEMBL
5.3 years ago
Gene_MMP8 ▴ 240

I have a set of variants in the following format:

BRAF_7_140453150_A/T  
BRAF_7_140453145_A/T  
BRAF_7_140453145_A/C  
BRAF_7_140453136_A/T  
BRAF_7_140481417_C/A

I want to extract the flanking bases for each of these mutations. How do I get the strand information from the above data? I need to create a BED file with the format: chromosome, start, end, strand.

5.3 years ago

Hello,

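I see no reason why you need the strand information: a variant's position on the reference genome is the same on both strands. If I'm correct, this list of variants contains all the information needed to create a valid BED file. Assuming that the format is <gene>_<chromosome>_<pos>_<REF>/<ALT>, one can use awk. BED coordinates are 0-based and half-open, so the start is pos-1 and the end is pos: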

$ awk -v FS="_" -v OFS="\t" '{print $2, $3-1, $3, $0}' variants.txt > output.bed
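To make the coordinate shift visible, here is a small sketch that runs the same awk command on the five example variants from the question (trailing spaces removed). It assumes the input file is called variants.txt, as in the command above, and that gene names contain no underscore, since "_" is the field separator.

```shell
# Write the five example variants from the question to variants.txt
cat > variants.txt <<'EOF'
BRAF_7_140453150_A/T
BRAF_7_140453145_A/T
BRAF_7_140453145_A/C
BRAF_7_140453136_A/T
BRAF_7_140481417_C/A
EOF

# Split on "_": $2 = chromosome, $3 = 1-based position.
# BED is 0-based and half-open, so the single base at position p is [p-1, p).
awk -v FS="_" -v OFS="\t" '{print $2, $3-1, $3, $0}' variants.txt > output.bed

cat output.bed
```

The first line of output.bed should be `7`, `140453149`, `140453150`, `BRAF_7_140453150_A/T`, separated by tabs. The whole variant ID ends up in column 4, which BED uses as the feature name.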

If you want to expand the region by, e.g., 50 bp to the left and right, do this:

$ awk -v FS="_" -v OFS="\t" '{print $2, $3-51, $3+50, $0}' variants.txt > output.bed
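If a tool really insists on a strand column, keep in mind that in BED the strand is column 6, so a name (column 4) and a score (column 5) must come before it. The sketch below writes BED6 with the flanking window. It assumes that REF/ALT are reported on the forward strand of the reference (hence "+"), uses 0 as a placeholder score, introduces a hypothetical FLANK variable for the window size, and clamps the start at 0 so variants near a chromosome start do not produce negative coordinates.

```shell
# Two of the example variants from the question, so the sketch runs on its own
cat > variants.txt <<'EOF'
BRAF_7_140453150_A/T
BRAF_7_140481417_C/A
EOF

# Hypothetical flank size in bp; 50 matches the command above
FLANK=50

awk -v FS="_" -v OFS="\t" -v flank="$FLANK" '{
    start = $3 - 1 - flank   # 0-based start of the left flank
    if (start < 0) start = 0 # BED starts cannot be negative
    end = $3 + flank         # exclusive end of the right flank
    # BED6: chrom, start, end, name, score, strand.
    # Score is a placeholder 0; "+" assumes REF/ALT are given on the
    # forward strand of the reference genome.
    print $2, start, end, $0, 0, "+"
}' variants.txt > flanks.bed

cat flanks.bed
```

Each interval is 101 bp long: 50 bp on each side plus the variant base itself.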

fin swimmer


Thanks a lot for your reply. How do I add "chr" to each chromosome number using awk? All my hg19 chromosome names are of the form "chr1", etc.


Just add it to the print statement:

$ awk -v FS="_" -v OFS="\t" '{print "chr"$2, $3-1, $3, $0}' variants.txt > output.bed
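One caveat with a plain "chr" prefix: Ensembl names the mitochondrial chromosome "MT", while UCSC hg19 calls it "chrM", so "chrMT" would not match the hg19 FASTA. A slightly more defensive sketch maps MT to M and only adds the prefix when it is missing. The second input line is an illustrative mitochondrial entry, not one of the original variants.

```shell
cat > variants.txt <<'EOF'
BRAF_7_140453150_A/T
MT-TL1_MT_3243_A/G
EOF

awk -v FS="_" -v OFS="\t" '{
    chrom = $2
    if (chrom == "MT") chrom = "M"            # Ensembl "MT" is "chrM" in UCSC hg19
    if (chrom !~ /^chr/) chrom = "chr" chrom  # add the prefix only if not already there
    print chrom, $3-1, $3, $0
}' variants.txt > output.bed

cat output.bed
```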

Got it. Thanks a lot
