I would like to extract sequences at specific coordinates. I ran a blastn search, and from its output I need to extract the region from 100 bp upstream to 100 bp downstream of the part of each subject sequence that the query matched (that is, around subject start and subject end). I have gone through similar questions and their solutions and tried them. I tried the following approach from brentp and got this error:
python seq_ext_coordinate.py
Traceback (most recent call last):
File "seq_ext_coordinate.py", line 5, in <module>
EST_030516_lnr.fa = sys.argv[1]
IndexError: list index out of range
Does anybody have an idea what I did wrong?
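For what it's worth, an `IndexError` on `sys.argv[1]` means the script was started without any command-line argument, as in the `python seq_ext_coordinate.py` call above. Separately, `EST_030516_lnr.fa` is not usable as a variable name, because Python reads the dot as attribute access. A minimal sketch of a guard, where the function name `get_fasta_path` and the file name `subjects.fa` are made up for illustration:

```python
import sys


def get_fasta_path(argv):
    """Return the first command-line argument, or exit with a usage message."""
    if len(argv) < 2:
        # No argument was given, so argv[1] would raise IndexError.
        sys.exit(f"usage: python {argv[0]} <sequences.fa>")
    return argv[1]


# In the script itself one would call:
#     fasta_path = get_fasta_path(sys.argv)
# and run it as:
#     python seq_ext_coordinate.py subjects.fa
```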
But I would like to extract 100 bp upstream and downstream of the BLAST hit. For example, in the following example, for the first case I need the subject sequence "gi|34729142|" from 290+100 to 271-100, that is 171-390, while in the second case I need 371-100 to 391+100, that is 271-491.
In short, I need a program or script that looks at the subject start and subject end in the BLAST output (sstart and send). If sstart is smaller than send, subtract 100 from sstart and add 100 to send, as in the second entry of the example above; otherwise do it the other way round: subtract 100 from send and add 100 to sstart, as in the first entry.
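The rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `flank_coords` and the clamping at position 1 are my own assumptions, and the end coordinate is not checked against the subject length.

```python
def flank_coords(sstart, send, flank=100):
    """Return (start, end, strand) of the hit extended by `flank` bp per side.

    Coordinates are 1-based and inclusive, as in BLAST tabular output.
    """
    # On the minus strand BLAST reports sstart > send, so order them first.
    low, high = min(sstart, send), max(sstart, send)
    strand = "+" if sstart <= send else "-"
    # Clamp at 1 so a hit near the start of the subject does not go below 1.
    start = max(1, low - flank)
    end = high + flank  # may exceed the subject length; not checked here
    return start, end, strand


print(flank_coords(290, 271))  # first case: (171, 390, '-')
print(flank_coords(371, 391))  # second case: (271, 491, '+')
```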
So first I need coordinates that can be supplied to bedtools to get the sequence.
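One way to get such coordinates is to convert each BLAST hit into a BED line. The sketch below assumes the default tabular output (`-outfmt 6`), where sseqid, sstart and send are columns 2, 9 and 10. The function name `blast_line_to_bed` and all values in the sample line other than the subject ID and coordinates are hypothetical. Note that BED starts are 0-based, so the 1-based start is shifted down by one.

```python
def blast_line_to_bed(line, flank=100):
    """Turn one BLAST tabular (outfmt 6) line into a BED line with flanks."""
    fields = line.rstrip("\n").split("\t")
    # Assumed default outfmt 6 column order: qseqid is column 1, sseqid is
    # column 2, sstart is column 9 and send is column 10 (1-based columns).
    qseqid, sseqid = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    low, high = min(sstart, send), max(sstart, send)
    strand = "+" if sstart <= send else "-"
    # BED starts are 0-based and ends are exclusive, so the 1-based inclusive
    # interval [low - flank, high + flank] becomes [low - flank - 1, high + flank).
    bed_start = max(0, low - flank - 1)
    bed_end = high + flank  # not clamped to the subject length
    # Name and score fill columns 4 and 5 so that strand lands in column 6.
    return "\t".join([sseqid, str(bed_start), str(bed_end), qseqid, "0", strand])


# Hypothetical hit line; only the subject ID and coordinates come from the question.
hit = "query1\tgi|34729142|\t100.00\t20\t0\t0\t1\t20\t290\t271\t1e-05\t40.1"
print(blast_line_to_bed(hit))
```

The resulting file can then be passed to `bedtools getfasta` with `-fi` (the FASTA containing the subject sequences) and `-bed`; adding `-s` should return minus-strand hits reverse-complemented. The sequence names in the BED file have to match the FASTA headers.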
I updated the answer.
I tried following the manual but I get the following error:
The BED file is as follows (example):
The EST file is as follows (example):
EST_030516_lnr.fa is not the genome FASTA. It's the file with chromosome names and lengths.