Cannot retrieve gene sequences with Entrez esearch
2.1 years ago
LaFra

Hi all, I am trying to retrieve trnL sequences using the Entrez esearch utility. I used a script to do this for a long list of organisms for which I have the scientific names, and for most of them it worked.

However, for many of them it doesn't work, so I tried the command for a single organism only, and it still returns an empty FASTA file. What could be the problem?

The code I used is the following (for one species only):

esearch -db nuccore -query "(trnl[gene]) AND (Abutilon theophrasti[orgn])" | efetch -format fasta >> output.fa

Any ideas? Thanks!

2.1 years ago
Michael

The likely answer is that there is no result for this query. Remember that you can always test your query directly on the NCBI website; there, your query yields 0 results. You may find the reason by searching for "Abutilon theophrasti[orgn] AND trnl" instead. My interpretation of those results is that the sequences are partial or unreliable and are therefore not annotated with the gene name. If you want to substitute such partial or unreliable sequences, you would have to inspect every entry that returns no result manually. Most likely it is not worth the effort, though.
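
The same comparison can be run from the command line. Below is a minimal sketch, assuming Entrez Direct (esearch, efetch, xtract) is installed and on PATH. The function names gene_query, any_field_query, count_hits and list_titles are hypothetical helpers, not part of EDirect. The hit count is read from the ENTREZ_DIRECT message that esearch emits, and the record titles come from the document summaries, which is where composite records such as trnL-trnF spacer entries become visible.

```shell
#!/usr/bin/env bash
# Compare a [gene]-restricted query with an all-fields query for one species.
# Assumes EDirect is installed; all function names here are hypothetical.
set -euo pipefail

# Query restricted to records whose gene annotation is trnL.
gene_query() {
  printf '(trnL[gene]) AND (%s[orgn])' "$1"
}

# Query matching trnL in any field (title, features, ...), like the web search.
any_field_query() {
  printf '(%s[orgn]) AND trnL' "$1"
}

# Print the number of nuccore records matching a query.
count_hits() {
  esearch -db nuccore -query "$1" < /dev/null \
    | xtract -pattern ENTREZ_DIRECT -element Count
}

# Print accession and title of every record matching a query.
list_titles() {
  esearch -db nuccore -query "$1" < /dev/null \
    | efetch -format docsum \
    | xtract -pattern DocumentSummary -element AccessionVersion Title
}

# Only hit NCBI when a species name is given on the command line.
if [ "$#" -gt 0 ]; then
  species="$1"
  echo "trnL[gene] query: $(count_hits "$(gene_query "$species")")"
  echo "all-fields query: $(count_hits "$(any_field_query "$species")")"
  # The titles show what the all-fields hits actually are.
  list_titles "$(any_field_query "$species")"
fi
```

Usage (script name hypothetical): `bash compare_trnl.sh "Abutilon theophrasti"`. If the first count is 0 and the second is not, the titles should show why the records are not annotated as trnL genes.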

Thank you. I am not sure I understood your answer, because I already checked on the website and I can find trnL sequences for many of the plants that the command doesn't return.

As I said, there is no sequence annotated with the gene name trnL for this species in the databases. If you found something for this species, you must have used a different query. Possibly you used Abutilon theophrasti[orgn] AND trnl; note how that differs from the original query. It does give results, but these are not gene sequences in the strict sense.

For example, https://www.ncbi.nlm.nih.gov/nuccore/HQ696727.1 is the first hit, but look at its annotation: "Abutilon theophrasti tRNA-Leu (trnL) gene, partial sequence; trnL-trnF intergenic spacer, complete sequence; and tRNA-Phe (trnF) gene, partial sequence; chloroplast". So this is a composite sequence consisting of part of trnL, the complete intergenic spacer between trnL and trnF, and part of the trnF gene from the chloroplast genome. It is not comparable to what you get from the other searches.

I don't know what you are aiming at, but I would rather ignore the cases where you don't get anything, because these non-gene sequences are, in a sense, "broken" sequences. You can of course also look through your script's log output to check whether an error occurred during some of the queries.
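
When the list is processed in a loop, writing the species that return nothing (or fail) to a separate log makes these cases easy to review afterwards. A minimal sketch, assuming EDirect is installed and the input is a plain-text file with one scientific name per line; gene_query and fetch_all are hypothetical helper names. Redirecting esearch's stdin from /dev/null is a precaution: commands inside a while-read loop otherwise share the loop's stdin, i.e. the species file.

```shell
#!/usr/bin/env bash
# Fetch trnL[gene] sequences for every species in a list and log failures.
# Assumes EDirect is installed; function and file names are hypothetical.
set -uo pipefail

gene_query() {
  printf '(trnL[gene]) AND (%s[orgn])' "$1"
}

# fetch_all SPECIES_FILE OUT_FASTA LOG_TSV
fetch_all() {
  local species_file="$1" out_fasta="$2" log_file="$3"
  local species result count
  : > "$log_file"                      # start with an empty log
  while IFS= read -r species || [ -n "$species" ]; do
    if [ -z "$species" ]; then continue; fi   # skip blank lines
    # Run the search once and reuse its ENTREZ_DIRECT message below.
    result=$(esearch -db nuccore -query "$(gene_query "$species")" < /dev/null)
    count=$(printf '%s\n' "$result" | xtract -pattern ENTREZ_DIRECT -element Count)
    if ! [[ "$count" =~ ^[0-9]+$ ]]; then
      printf '%s\tERROR\n' "$species" >> "$log_file"     # search failed
    elif [ "$count" -eq 0 ]; then
      printf '%s\tNO_HITS\n' "$species" >> "$log_file"   # no trnL[gene] record
    else
      printf '%s\n' "$result" | efetch -format fasta >> "$out_fasta"
    fi
  done < "$species_file"
}

# Only run when all three arguments are given.
if [ "$#" -eq 3 ]; then
  fetch_all "$@"
fi
```

Usage (file names hypothetical): `bash fetch_trnl.sh species.txt trnL.fa missing.tsv`. Species logged as NO_HITS are the ones worth checking with an all-fields query; ERROR entries point to failed requests rather than missing annotations.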

Thank you very much, I understand now! :)
