Hello, I have a question about extracting upstream and downstream regions. I ran blastn with small 30 nt query sequences against 500 nt long database sequences. After running blastn, how can I extract the upstream and downstream regions for my 30 nt query sequences?
If you have the coordinate of the alignment, e.g. loc, then you can extract the sequence from the 500 nt reference sequence starting from loc-30 with length 30nt+2*30nt (the 30 nt hit plus 30 nt on each side), provided that you have no insertion or deletion in the alignment. If you do have an indel, just add the corresponding length.
To actually do the extraction, you can use the substr function available in most programming languages, e.g. R, Perl, awk.
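As a rough sketch of the same idea in Python (slicing plays the role of substr here): the function names, the default flank of 30 and the toy subject sequence below are made up for illustration. It assumes the hit coordinates are 1-based and inclusive, as in BLAST tabular output, that start > end means a minus-strand hit, and that the alignment has no indels.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def extract_flanks(subject, sstart, send, flank=30):
    """Return (upstream, hit, downstream) around a hit on `subject`.

    sstart/send are 1-based inclusive subject coordinates (assumption:
    BLAST tabular style). If sstart > send the hit is treated as being
    on the minus strand and all three pieces are reverse-complemented.
    Flanks are shortened if the hit is close to either end.
    """
    lo, hi = min(sstart, send), max(sstart, send)
    left = subject[max(0, lo - 1 - flank):lo - 1]   # up to `flank` nt before the hit
    hit = subject[lo - 1:hi]                        # the aligned region itself
    right = subject[hi:hi + flank]                  # up to `flank` nt after the hit
    if sstart > send:
        # On the minus strand, "upstream" is to the right on the plus strand.
        return (reverse_complement(right),
                reverse_complement(hit),
                reverse_complement(left))
    return left, hit, right


# Toy 100 nt subject, purely hypothetical data.
subject = "A" * 30 + "C" * 30 + "G" * 30 + "T" * 10
upstream, hit, downstream = extract_flanks(subject, 31, 60)
print(upstream, hit, downstream, sep="\n")
```

Joining the three pieces gives the same window as taking loc-30 with length 30nt+2*30nt; keeping them separate just makes it easier to write the upstream and downstream parts to different files.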
No, my problem is like this:
I have file1, which looks like this:
>GL3482.1 GAACTTGAGATCCGGGGA GCAGTGGATCTCCACCAG CGGCCAGAACTGGTGCAC CTCCAGGCCAGCCTCGTC CTGCGTGTC
>GL3550.1 GGCATTTTTGTGTAATTTTTGGCTGGATGAGGT GACATTTTCATTACTACCATTTTGGAGTACA
>GL3472.1 TTTTCCTGTTCACTGCTGCTTTTCTATAGACAGCA GCAGCAAGCAGTAAGAGAAAGTA
file2 is this:
seq id     start    end
GL3482.1   323100   323743
GL3550.1   41911    40888
GL3472.1   274408   272617
And I want a result like this: