Best way to find annotated counterparts of unknown transcripts after BLAST
3 months ago
Mathias

Hi all!

I am building a local pipeline to identify unknown transcripts. One step of this pipeline checks whether each unknown sequence has a similar, already-annotated counterpart. For this, I BLAST the transcripts locally, which gives me the accession, the coordinates and the strand of each hit in the subject genome. With this information, I want to extract any annotations found at the hit's location in that genome. I tried efetch with the following call, which returns this output:

 efetch -db nuccore -id "CP040608.1" -seq_start 17402 -seq_stop 16692 -strand 1 -format ft

>Feature gb|CP040608.1|
<1      647     gene
locus_tag       FBF02_00060
<1      647     CDS
product desulfoferrodoxin FeS4 iron-binding domain-containing protein
transl_table    11
protein_id      gb|QJE54075.1||gnl|PRJNA258022|FBF02_00060
inference       COORDINATES: similar to AA sequence:RefSeq:YP_002343484.1
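A small sketch of how the BLAST coordinates could be normalized before calling efetch, assuming default tabular output (-outfmt 6) and that the subject ID column already holds a bare accession such as CP040608.1. BLAST reports hits on the subject's minus strand with sstart > send, which is likely what 17402 to 16692 means here; normalizing to start <= stop and expressing the orientation as -strand 2 gives an unambiguous call. The names HitRegion, region_from_blast6 and efetch_args are hypothetical.

```python
from typing import NamedTuple


class HitRegion(NamedTuple):
    accession: str
    start: int   # 1-based, always <= stop
    stop: int
    strand: int  # 1 = plus, 2 = minus (same codes as efetch -strand)


def region_from_blast6(line: str) -> HitRegion:
    """Normalize one line of default -outfmt 6 output.

    Columns: qseqid sseqid pident length mismatch gapopen
             qstart qend sstart send evalue bitscore
    """
    cols = line.rstrip("\n").split("\t")
    sstart, send = int(cols[8]), int(cols[9])
    # BLAST writes subject minus-strand hits with sstart > send.
    strand = 1 if sstart <= send else 2
    return HitRegion(cols[1], min(sstart, send), max(sstart, send), strand)


def efetch_args(region: HitRegion) -> list[str]:
    """Build an efetch argument list with seq_start <= seq_stop."""
    return ["efetch", "-db", "nuccore", "-id", region.accession,
            "-seq_start", str(region.start), "-seq_stop", str(region.stop),
            "-strand", str(region.strand), "-format", "ft"]


if __name__ == "__main__":
    # Made-up hit line; only the subject coordinates come from the question.
    demo = "q1\tCP040608.1\t98.0\t711\t14\t0\t1\t711\t17402\t16692\t0.0\t1200"
    region = region_from_blast6(demo)
    print(region)  # HitRegion(accession='CP040608.1', start=16692, stop=17402, strand=2)
    print(" ".join(efetch_args(region)))
```

If the pipeline drives EDirect from Python, the argument list can be passed directly to subprocess.run.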


I would expect this feature to be reported only for the plus strand, but setting -strand 2 returns exactly the same result...
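A possible explanation, worth checking against the E-utilities documentation: in NCBI's five-column feature table format, a minus-strand feature is written with its start greater than its stop, so "<1 647" describes a plus-strand gene and CDS (the "<" marks a feature that extends past the start of the requested window). As far as I can tell, -strand changes the orientation of the returned sequence but does not filter which overlapping features are listed, so the same features come back either way. One workaround is to filter by strand locally. The sketch below (helper names are hypothetical) parses text like the output above and keeps only features on a chosen strand; extra coordinate lines of multi-interval features are simply skipped.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    start: int         # smaller coordinate
    stop: int          # larger coordinate
    strand: int        # 1 = plus, 2 = minus
    key: str           # e.g. "gene", "CDS"
    qualifiers: dict   # qualifier name -> list of values


def _coord(token: str) -> Optional[int]:
    """Parse '<1', '>647' or '647'; return None if the token is not a coordinate."""
    digits = token.lstrip("<>")
    return int(digits) if digits.isdigit() else None


def parse_feature_table(text: str) -> list:
    features = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(">Feature"):
            continue
        parts = line.split()
        a = _coord(parts[0])
        b = _coord(parts[1]) if len(parts) > 1 else None
        if a is not None and b is not None:
            if len(parts) >= 3:  # new feature line: start, stop, key
                strand = 1 if a <= b else 2  # minus strand is written start > stop
                features.append(Feature(min(a, b), max(a, b), strand, parts[2], {}))
            # Two bare coordinates = extra interval of a joined feature; skipped here.
        elif features:  # qualifier line belonging to the last feature
            name_value = line.split(None, 1)
            value = name_value[1] if len(name_value) > 1 else ""
            features[-1].qualifiers.setdefault(name_value[0], []).append(value)
    return features


def overlapping(features, start, stop, strand=None):
    """Features overlapping [start, stop], optionally restricted to one strand."""
    return [f for f in features
            if f.start <= stop and f.stop >= start
            and (strand is None or f.strand == strand)]


# Trimmed copy of the efetch output shown in the question.
EXAMPLE = (">Feature gb|CP040608.1|\n"
           "<1\t647\tgene\n"
           "\t\t\tlocus_tag\tFBF02_00060\n"
           "<1\t647\tCDS\n"
           "\t\t\tproduct\tdesulfoferrodoxin FeS4 iron-binding domain-containing protein\n")

if __name__ == "__main__":
    feats = parse_feature_table(EXAMPLE)
    # The ft coordinates appear to be relative to the fetched window (711 bp here).
    window = (1, 17402 - 16692 + 1)
    print(len(overlapping(feats, *window, strand=1)))  # 2
    print(len(overlapping(feats, *window, strand=2)))  # 0
```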

Does anyone know why this happens? And is there an alternative to efetch? I expect to process ~10,000 hits, and efetch is quite slow and restrictive (rate-limited) for this many queries.
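For the ~10,000 hits, one option (a sketch, not a tested pipeline) is to stop querying per hit: group hits by accession, fetch each genome's annotation once, and answer every hit against it locally. Here fetch_features is a hypothetical callable you would supply that returns (start, stop, strand, label) tuples in absolute genome coordinates, for example by calling Biopython's Entrez.efetch(db="nuccore", id=acc, rettype="ft", retmode="text") without seq_start/seq_stop and parsing the result. If many hits fall on the same genomes, the number of remote requests drops from one per hit to one per distinct accession.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# (start, stop, strand, label): start <= stop, absolute genome coordinates,
# strand 1 = plus, 2 = minus.
Feature = Tuple[int, int, int, str]
# (accession, start, stop, strand) for one normalized BLAST hit.
Hit = Tuple[str, int, int, int]


def annotate_hits(hits: Iterable[Hit],
                  fetch_features: Callable[[str], List[Feature]]
                  ) -> Dict[Hit, List[str]]:
    """Return labels of same-strand overlapping features for every hit.

    fetch_features is called once per distinct accession, not once per hit.
    """
    by_accession: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        by_accession[hit[0]].append(hit)

    results: Dict[Hit, List[str]] = {}
    for accession, acc_hits in by_accession.items():
        features = sorted(fetch_features(accession))  # sorted by start coordinate
        starts = [f[0] for f in features]
        for hit in acc_hits:
            _, start, stop, strand = hit
            # Only features that start at or before the hit's stop can overlap it.
            candidates = features[:bisect_right(starts, stop)]
            results[hit] = [label for _f_start, f_stop, f_strand, label in candidates
                            if f_stop >= start and f_strand == strand]
    return results
```

NCBI allows 3 E-utilities requests per second without an API key and 10 with one, so setting Entrez.api_key also helps if many distinct accessions remain.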

Tags: blast efetch pipeline annotation