Hi all, I am trying to retrieve trnL sequences using the Entrez esearch utility. I ran a script over a long list of organisms for which I have the scientific names, and it worked for most of them.
Since it fails for many others, I tried the command on a single organism, but it still returns an empty FASTA file. What could be the problem?
The code I used is the following (for one species only):
esearch -db nuccore -query "(trnl[gene]) AND (Abutilon theophrasti[orgn])" | efetch -format fasta >> output.fa
Any ideas? Thanks!
Thank you, I am not sure I understood your answer, because I already checked on the website and I can find the trnL sequence for many of the plants for which the command returns nothing.
As I said, there is no gene named trnL for this species in the databases. If you found something for this species you definitely used a different query.
Abutilon theophrasti[orgn] AND trnl
is possibly what you used; note how it differs from the original query. It gives results, but these are not gene sequences in the strict sense. For example, https://www.ncbi.nlm.nih.gov/nuccore/HQ696727.1 is the first hit, but look at its annotation: Abutilon theophrasti tRNA-Leu (trnL) gene, partial sequence; trnL-trnF intergenic spacer, complete sequence; and tRNA-Phe (trnF) gene, partial sequence; chloroplast.
So this is a composite sequence consisting of part of trnL, the complete intergenic spacer between trnL and trnF, and part of the trnF gene from the chloroplast genome. This is not comparable to what you get from the other searches. I don't know what you are aiming at, but I would rather ignore the cases where you don't get anything, because these non-gene sequences are sort of "broken" sequences. You can of course look through your script's log output to check whether an error occurred during some of the queries.
Thank you very much, I understand now! :)
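For anyone landing here later: a minimal sketch of how the two queries discussed above could be compared across a species list, so the log shows which species have no [gene] hit but do have all-fields hits. It assumes EDirect (esearch, xtract) is installed and that the hit count is read with the usual `xtract -pattern ENTREZ_DIRECT -element Count` idiom. The file names `species.txt` and `trnL_hits.log` and the three function names are hypothetical, not something from the original script.

```shell
#!/bin/sh
# Strict query from the original post: trnl must be annotated in the gene field.
gene_query() {
    printf '(trnl[gene]) AND (%s[orgn])' "$1"
}

# Looser query from the answer: trnl may appear in any field (e.g. the title).
loose_query() {
    printf '%s[orgn] AND trnl' "$1"
}

# Print the number of nuccore hits for a query (needs EDirect and network access).
# stdin is redirected so esearch cannot swallow the species list inside the loop.
hit_count() {
    esearch -db nuccore -query "$1" </dev/null |
        xtract -pattern ENTREZ_DIRECT -element Count
}

# species.txt (one scientific name per line) and trnL_hits.log are assumed names.
# The loop is skipped when EDirect or the input list is missing.
if command -v esearch >/dev/null 2>&1 && [ -r species.txt ]; then
    while IFS= read -r species; do
        [ -n "$species" ] || continue
        strict=$(hit_count "$(gene_query "$species")")
        loose=$(hit_count "$(loose_query "$species")")
        # Columns: species, hits for the [gene] query, hits for the all-fields query.
        printf '%s\t%s\t%s\n' "$species" "${strict:-NA}" "${loose:-NA}"
    done < species.txt > trnL_hits.log
fi
```

A species with 0 in the second column and a non-zero third column would be one of the cases described above, where only spacer or partial records exist. An NA would point to a failed query rather than a missing gene.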