system() command in R does not return output in order
4.1 years ago
Frieda ▴ 60

I am trying to query the NCBI database from R by running a shell command with system(..., intern = TRUE) and storing its output in a variable. The command retrieves the taxonomy for a given TaxID. It looks like this:

batch <- system(paste0(
  "esearch -db taxonomy -query \"", taxid.as.string, " [taxID]\" | ",
  "efetch -format xml | ",
  "xtract -pattern Taxon -block \"*/Taxon\" -unless Rank -equals \"no rank\" ",
  "-tab \"\t\" -element Rank,ScientificName"
), intern = TRUE)

The variable taxid.as.string used above is a one-element character vector:

> taxid.as.string
[1] "7070, 5741, 658858"

The command searches in the NCBI database for the TaxIDs 7070, 5741, and 658858, to return the taxonomy for each.

My problem is that it does not return the taxonomy in the proper order.

Instead, it returns the results for TaxIDs 5741, 7070, and then 658858. I know that I could keep the vector numeric, loop over it, and make a single query at a time.

Why is this the case? Is it possible to keep the order of the result, even if some taxonomies are returned faster?

Thanks in advance!

r ncbi command-line • 1.3k views

This has nothing to do with R. It is the way eutils works and unless they expose an option to preserve order (I highly doubt they would do that, given that it goes against efficiency), there is no way to change this except, as you say, querying one-by-one.

I think you should be able to query it all together and sort the result once you get it.


How can the results be sorted by the input order if the output only contains Rank and ScientificName?


If you don't have to do it in R, you can use bash similar to this post:

C: Using Entrez to find the taxonomy for an accession number

Cat reads a file sequentially, so you can make a file with each taxid in a separate row. The file shouldn't have any empty rows and no spaces after the taxids.
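
A minimal sketch of that file-based approach, under some assumptions: the file name taxids.txt and the helper names clean_taxids and fetch_taxonomy are made up for illustration, and the xtract call is the one shown elsewhere in this thread. The cleaning step removes stray spaces and empty rows before the file is read one row at a time. The loop itself is left commented out because it needs EDirect installed and network access.

```shell
#!/bin/sh
# clean_taxids (hypothetical helper): strip spaces, tabs and carriage
# returns from a taxid file, then drop rows that end up empty.
clean_taxids() {
  tr -d ' \t\r' < "$1" | grep -v '^$'
}

# fetch_taxonomy (hypothetical helper): one query per taxid.
# Needs EDirect on PATH and network access.
fetch_taxonomy() {
  # stdin is redirected so esearch cannot consume the loop's input
  esearch -db taxonomy -query "$1 [taxID]" < /dev/null |
    efetch -format xml |
    xtract -pattern Taxon -element TaxId,Rank,ScientificName
}

# Example input with an empty row and trailing spaces.
tmp=$(mktemp -d)
printf '7070  \n\n5741\n658858 \n' > "$tmp/taxids.txt"
clean_taxids "$tmp/taxids.txt" > "$tmp/taxids.clean.txt"
cat "$tmp/taxids.clean.txt"

# Sequential loop: one taxid per iteration, so output follows file order.
# Uncomment to run against NCBI:
# while read -r id; do fetch_taxonomy "$id"; done < "$tmp/taxids.clean.txt"
```

This keeps the order but sends one request per taxid, so it will be slow for long lists.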


I do not want to run this query line by line for every single taxid because it takes a lot of time; that is why I was making queries in batches. Do you know how I can include the taxids in the results?

By adding TaxId to the -element, I will get the taxid for every single taxonomy division, but I only want the TaxId which in this case is taxid.as.string.

batch <- system(paste0("esearch -db taxonomy -query \"",taxid.as.string, " [taxID]\" | efetch -format xml | xtract -pattern Taxon -block \"*/Taxon\" -unless Rank -equals \"no rank\" -tab \"\t\" -element TaxId,Rank,ScientificName"), intern = TRUE)

$ esearch -db taxonomy -query ",7070, 5741, 658858, [taxID] " | efetch -format xml | xtract -pattern Taxon -element TaxId,Rank,ScientificName
658858  no rank Giardia lamblia P15
7070    species Tribolium castaneum
5741    species Giardia intestinalis

Can't you use grep or sed to filter the output?

By adding TaxId to the -element, I will get the taxid for every single taxonomy division, but I only want the TaxId which in this case is taxid.as.string.

Could you give me a sample taxid for this?


Your command does not preserve taxid order, and bash code to preserve taxid order will be cumbersome.


I didn't claim that my command preserves the order :) I'm just trying to understand the question better, and I asked OP for an example where the command doesn't work. Also, the response in the other thread is actually bash code ;)

4.1 years ago
GenoMax 141k

This question has been answered by @vkkodali in a different thread: C: how to find the kingdom for a taxon id using the terminal?


There is still the matter of the output order not matching the input order. If that was not important to OP, this question itself has no meaning.


OP was concerned about order because the output didn't contain the taxid, which makes it difficult to map input to output. If the output contains the taxid, it is going to be fine.

Do you know how I can include the taxids in the results? By adding TaxId to the -element, I will get the taxid for every single taxonomy division, but I only want the TaxId which in this case is taxid.as.string.
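
Once the TaxId is the first column of the output, restoring the input order is a small post-processing step. The sketch below is one way to do it with awk, not the method from the linked thread. It assumes tab-separated rows in the TaxId, Rank, ScientificName layout shown earlier in this thread, and the function name reorder_by_input is made up for illustration. The sample rows are copied from that earlier output; in real use they would come from the esearch | efetch | xtract pipeline.

```shell
#!/bin/sh
# Input order of the query (taken from the question).
taxids="7070 5741 658858"

# Batched output as shown in the thread, tab-separated:
# TaxId, Rank, ScientificName.
batch_output=$(printf '%s\t%s\t%s\n' \
  658858 "no rank" "Giardia lamblia P15" \
  7070 species "Tribolium castaneum" \
  5741 species "Giardia intestinalis")

# reorder_by_input (hypothetical helper): print the rows read from stdin
# in the order of the taxids given as arguments.
reorder_by_input() {
  awk -F '\t' -v order="$*" '
    { row[$1] = $0 }                  # index each row by its TaxId
    END {
      n = split(order, ids, " ")
      for (i = 1; i <= n; i++)
        if (ids[i] in row) print row[ids[i]]
    }'
}

ordered=$(printf '%s\n' "$batch_output" | reorder_by_input $taxids)
printf '%s\n' "$ordered"
```

Taxids that are missing from the output are skipped silently, so comparing the number of rows against the number of input taxids is a cheap sanity check.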


Thank you for clarifying that. I was wondering why OP insisted on maintaining order and your explanation makes it clear.
