Question: Extract strand information from ENSEMBL
banerjeeshayantan wrote, 24 months ago:

I have a set of variants in the following format

BRAF_7_140453150_A/T  
BRAF_7_140453145_A/T  
BRAF_7_140453145_A/C  
BRAF_7_140453136_A/T  
BRAF_7_140481417_C/A

I want to extract the flanking bases for each of these mutations. How do I get the strand information from the above data? I need to create a BED file with the format: chromosome, start, end, strand.

snp ensembl gene • 353 views
finswimmer (Germany) wrote, 24 months ago:

Hello,

I see no reason why you need the strand information. If I'm correct, this list of variants contains all the information needed to create a valid BED file. Assuming that the format is <gene>_<chromosome>_<pos>_<REF>/<ALT>, one can use awk (BED start coordinates are 0-based, hence $3-1):

$ awk -v FS="_" -v OFS="\t" '{print $2, $3-1, $3, $0}' variants.txt > output.bed

If you want to extend the region by, e.g., 50 bp to the left and right, do this:

$ awk -v FS="_" -v OFS="\t" '{print $2, $3-51, $3+50, $0}' variants.txt > output.bed
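As a rough sketch of the same idea with the flank size as a parameter: the variable name `flank`, the output file `flank.bed`, and the clamp at zero are assumptions added for illustration, not part of the original answer. The input is two of the variants from the question, written without trailing whitespace.

```shell
# Sample input taken from the question.
cat > variants.txt <<'EOF'
BRAF_7_140453150_A/T
BRAF_7_140453136_A/T
EOF

# flank is a hypothetical parameter: number of bases to add on each side.
awk -v FS="_" -v OFS="\t" -v flank=50 '{
    start = $3 - 1 - flank        # BED start is 0-based
    if (start < 0) start = 0      # do not run past the start of the chromosome
    print $2, start, $3 + flank, $0
}' variants.txt > flank.bed
```

Each interval then covers 2 * flank + 1 bases, with the variant position in the middle.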

fin swimmer


Thanks a lot for your reply. How do I add "chr" to each chromosome number using awk? All my hg19 chromosome names are of the form "chr1", etc.

written 24 months ago by banerjeeshayantan

Just add it to the print statement:

print "chr"$2, $3-1, $3, $0
modified 24 months ago • written 24 months ago by finswimmer
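Put together, the full command might look like the sketch below. The two extra columns are an assumption added here for tools that expect a six-column BED file: a placeholder score of 0 and a "+" strand, which presumes the alleles in the list are reported on the forward strand of the reference. The output file name `output_chr.bed` is also hypothetical.

```shell
# Sample input taken from the question.
cat > variants.txt <<'EOF'
BRAF_7_140453150_A/T
BRAF_7_140481417_C/A
EOF

# Columns: chrom (with "chr" prefix), 0-based start, end, name, score, strand.
# Score 0 and strand "+" are placeholders (assumption: forward-strand alleles).
awk -v FS="_" -v OFS="\t" '{print "chr"$2, $3-1, $3, $0, 0, "+"}' variants.txt > output_chr.bed
```

If only the four columns from the answer are needed, drop the last two fields from the print statement.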

Got it. Thanks a lot

written 24 months ago by banerjeeshayantan