Best way to find annotated counterparts of unknown transcripts after BLAST
3 months ago
Mathias • 0

Hi all!

I am building a local pipeline to identify unknown transcripts. One step checks whether each unknown sequence has a similar, already-annotated counterpart. For this, I BLAST the transcripts locally, which gives me the accession, the coordinates, and the strand of each hit in the subject genome. From these, I expected to extract any annotations found at the hit location in that genome. I tried the following efetch call, which returns the output below:

 efetch -db nuccore -id "CP040608.1" -seq_start 17402 -seq_stop 16692 -strand 1 -format ft

>Feature gb|CP040608.1|
<1      647     gene
                        locus_tag       FBF02_00060
<1      647     CDS
                        product desulfoferrodoxin FeS4 iron-binding domain-containing protein
                        transl_table    11
                        protein_id      gb|QJE54075.1||gnl|PRJNA258022|FBF02_00060
                        inference       COORDINATES: similar to AA sequence:RefSeq:YP_002343484.1
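The output above is NCBI's five-column feature table: a location line (start, stop, feature key), then indented qualifier/value lines, with "<" and ">" marking partial ends. The coordinates (1..647) appear to be relative to the requested 711 bp region rather than to the full CP040608.1 record. If efetch is kept, a minimal parser sketch for this output might look like the following. The function and class names are hypothetical, and it splits on any whitespace so it also copes with the space-padded copy pasted here.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    key: str                                        # e.g. "gene", "CDS"
    intervals: list = field(default_factory=list)   # (start, stop) pairs
    partial_start: bool = False                     # "<" on the first coordinate
    partial_stop: bool = False                      # ">" on the last coordinate
    qualifiers: dict = field(default_factory=dict)  # name -> list of values


def _coord(token):
    """Split a coordinate such as '<1' into (1, '<')."""
    marker = token[0] if token[0] in "<>" else ""
    return int(token.lstrip("<>")), marker


def parse_feature_table(text):
    """Parse `efetch -format ft` output into {record_id: [Feature, ...]}."""
    records = {}
    record_id, current = None, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">Feature"):
            record_id = line.split(None, 1)[1].strip()
            records[record_id] = []
            current = None
        elif not line[0].isspace():
            # Location line: "start stop key", or a bare "start stop"
            # continuation line belonging to a multi-interval feature.
            parts = line.split()
            start, start_mark = _coord(parts[0])
            stop, stop_mark = _coord(parts[1])
            if len(parts) >= 3:
                current = Feature(key=parts[2], partial_start=start_mark == "<")
                records[record_id].append(current)
            current.intervals.append((start, stop))
            current.partial_stop = current.partial_stop or stop_mark == ">"
        elif current is not None:
            # Qualifier line: indented "name value"; a name may repeat.
            parts = line.strip().split(None, 1)
            value = parts[1] if len(parts) > 1 else ""
            current.qualifiers.setdefault(parts[0], []).append(value)
    return records
```

Qualifier values are stored as lists because names such as `inference` or `note` can occur more than once on a single feature.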

I expected this region to be annotated only on the plus strand, but setting -strand 2 returns exactly the same result.
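For reference, the call above has seq_start greater than seq_stop (17402 > 16692). In BLAST tabular output (-outfmt 6), that ordering of sstart and send is how a hit on the subject's minus strand is reported. A small sketch for turning a hit into explicit efetch parameters, with start <= stop and the strand derived from the coordinate order, is below. It assumes the default -outfmt 6 column order (sstart and send in columns 9 and 10) and that sseqid is a bare accession such as CP040608.1; the function names are hypothetical, and the argument list only reuses the flags from the original call.

```python
def hit_to_region(sseqid, sstart, send):
    """Return (accession, seq_start, seq_stop, strand) for one BLAST hit.

    A minus-strand hit has sstart > send; this is converted into an explicit
    strand (1 = plus, 2 = minus, as for efetch -strand) and sorted coordinates.
    """
    sstart, send = int(sstart), int(send)
    strand = 1 if sstart <= send else 2
    return sseqid, min(sstart, send), max(sstart, send), strand


def region_from_outfmt6(line):
    """Parse one line of default -outfmt 6 output (12 tab-separated columns)."""
    cols = line.rstrip("\n").split("\t")
    # Default columns: qseqid sseqid pident length mismatch gapopen
    #                  qstart qend sstart send evalue bitscore
    return hit_to_region(cols[1], cols[8], cols[9])


def efetch_args(accession, start, stop, strand):
    """Argument list mirroring the efetch call in the question."""
    return ["efetch", "-db", "nuccore", "-id", accession,
            "-seq_start", str(start), "-seq_stop", str(stop),
            "-strand", str(strand), "-format", "ft"]
```

For the hit in the question, `hit_to_region("CP040608.1", 17402, 16692)` gives `("CP040608.1", 16692, 17402, 2)`.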

Any idea why this happens? And is there an alternative to efetch? I expect to process around 10,000 hits, and efetch is quite slow and rate-limited for that many queries.
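One possible alternative, sketched here under assumptions, is to download each subject genome's annotation once as a GFF3 file and perform the overlap lookups locally, so no network request is needed per hit. The function names parse_gff3 and overlapping are hypothetical. The code assumes that the GFF3 seqid column uses the same accession.version that BLAST reports (e.g. CP040608.1). How the GFF3 files are obtained is left open.

```python
import bisect
from collections import defaultdict
from urllib.parse import unquote


def parse_gff3(lines):
    """Index GFF3 lines into {seqid: (starts, features, max_len)}.

    Each feature is (start, end, strand, type, attributes), sorted by start.
    `lines` can be an open file handle or any iterable of strings.
    """
    by_seqid = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more features follow
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        # GFF3 attribute values are percent-encoded, so decode them.
        attributes = {k: unquote(v) for k, v in
                      (kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)}
        by_seqid[seqid].append((int(start), int(end), strand, ftype, attributes))
    index = {}
    for seqid, feats in by_seqid.items():
        feats.sort(key=lambda f: f[0])
        max_len = max(f[1] - f[0] + 1 for f in feats)
        index[seqid] = ([f[0] for f in feats], feats, max_len)
    return index


def overlapping(index, seqid, hit_start, hit_stop, strand=None):
    """Features on `seqid` overlapping the hit (1-based, inclusive).

    Coordinates may be given in either order; `strand` is "+" or "-" to keep
    only features on that strand, or None to keep both.
    """
    if seqid not in index:
        return []
    lo, hi = sorted((hit_start, hit_stop))
    starts, feats, max_len = index[seqid]
    # An overlapping feature must start at or before `hi`, and no earlier
    # than lo - max_len + 1 (otherwise it would end before `lo`).
    first = bisect.bisect_left(starts, lo - max_len + 1)
    last = bisect.bisect_right(starts, hi)
    return [f for f in feats[first:last]
            if f[1] >= lo and (strand is None or f[2] == strand)]
```

For example, with `index = parse_gff3(open(path))` on a hypothetical local annotation file, a minus-strand hit could be queried as `overlapping(index, "CP040608.1", 17402, 16692, strand="-")`. Sorting by start and bounding the search window by the longest feature means each lookup is a binary search plus a scan of the candidate window. This avoids comparing every hit against every feature.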

Thanks in advance!

blast efetch pipeline annotation
