How to script for adding and subtracting values to look upstream?
9.0 years ago

I have an output file from a tblastn search in output format 6 (tabular, tab-delimited), with the columns: subject gi, evalue, subject start, subject end, e.g.

595625618    0.0       472083   473231
341932553    3e-128    53534    54640
152022606    4e-95     2695055  2693919
388532432    0.0       840617   841774
574094067    0.0       10789    11946

I would like to generate a new file with the start and end columns modified by a fixed value. I just need a simple script to do this.

However, the strand the gene is on (+ or -) determines whether the value is subtracted from or added to the start column, i.e. I wish to "look upstream". Hence if start<stop, subtract 2000 bp from the start to give the new start. If start>stop, add 2000 bp to the start. The new stop value always takes the value of the old start.
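To make the rule concrete, here is a minimal Python sketch of it, not a definitive implementation. The function name `upstream_window` and the `flank` parameter are made up for illustration; the two calls use rows from the example above.

```python
def upstream_window(start, end, flank=2000):
    """Return (new_start, new_end) for the region upstream of a hit.

    Hypothetical helper: the name and the `flank` parameter are
    illustrative, not part of any BLAST tooling.
    """
    if start < end:
        # Plus strand: upstream lies at smaller coordinates.
        new_start = start - flank
    else:
        # Minus strand (start > end): upstream lies at larger coordinates.
        new_start = start + flank
    # The new end is always the old start.
    return new_start, start


print(upstream_window(472083, 473231))    # plus-strand row  -> (470083, 472083)
print(upstream_window(2695055, 2693919))  # minus-strand row -> (2697055, 2695055)
```

Note that on the plus strand the new start can drop below 1 when a hit lies within 2000 bp of the beginning of the subject sequence; whether to clamp it is left open here.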

The evalue needn't be preserved in the new file, but the gi should.

Any pointers in scripting would be appreciated.

blast perl python gene sequence • 1.6k views
9.0 years ago
Ram 43k

Let's say column 1 is strand, column 2 is start and column 3 is end. awk will do it for you.

Check for strand:

$1 == "+"
$1 == "-"

Your conditions (as I interpreted them):

$2 < $3 { print $2 - 2000, $2 }
$2 > $3 { print $2 + 2000, $2 }

You can combine the above conditions as you see fit, and modify the operations that I may have gotten wrong.
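For comparison, the same idea can be sketched in Python against the layout given in the question (gi, evalue, start, end), where there is no separate strand column and the strand is inferred from whether start < end. This is only a sketch under those assumptions; the function name `convert` and the in-memory sample are illustrative.

```python
import io

FLANK = 2000  # bp to look upstream, as stated in the question


def convert(lines, flank=FLANK):
    """Yield 'gi<TAB>new_start<TAB>new_end' for each tabular BLAST line.

    Assumes four whitespace-separated columns: gi, evalue, start, end.
    The evalue is dropped; the gi is kept.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        gi, start, end = fields[0], int(fields[2]), int(fields[3])
        # start < end: plus strand, subtract; otherwise minus strand, add.
        new_start = start - flank if start < end else start + flank
        # The new end is always the old start.
        yield f"{gi}\t{new_start}\t{start}"


# Two rows from the question, held in memory instead of a real file.
sample = io.StringIO(
    "595625618\t0.0\t472083\t473231\n"
    "152022606\t4e-95\t2695055\t2693919\n"
)
for row in convert(sample):
    print(row)
```

For a real file, an open file handle can be passed to `convert` in place of `sample`, and the rows written to a second handle.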


Thanks Ram, yes awk seems to be equipped for the job, I will give that a go.
