Question: system() command in R does not return output in order
Freddy (4 months ago) wrote:

I am trying to access the NCBI database from R by passing a system command with system(..., intern = TRUE) and capturing the result in a variable. The command is a query that retrieves the taxonomy for a given taxID. It looks like this:

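# The pieces are pasted into one single-line shell pipeline; a raw line break
# inside the string (e.g. after "Rank") would split the shell command in two.
batch <- system(paste0("esearch -db taxonomy -query \"", taxid.as.string, " [taxID]\" | ",
                       "efetch -format xml | ",
                       "xtract -pattern Taxon -block \"*/Taxon\" -unless Rank -equals \"no rank\" ",
                       "-tab \"\t\" -element Rank,ScientificName"),
                intern = TRUE)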

The variable taxid.as.string above is a one-element character vector and looks like this:

> taxid.as.string
[1] "7070, 5741, 658858"

The command searches in the NCBI database for the TaxIDs 7070, 5741, and 658858, to return the taxonomy for each.

My problem is that it does not return the taxonomies in the input order.

Instead, it returns the results for TaxIDs 5741, 7070, and then 658858. I know that I could keep the vector numeric, loop over it, and make one query at a time.

Why is this the case? Is it possible to keep the order of the result, even if some taxonomies are returned faster?

Thanks in advance!

Tags: command-line, R, ncbi

This has nothing to do with R; it is how eutils works. Unless NCBI exposes an option to preserve order (I highly doubt it, given that it goes against efficiency), there is no way to change this except, as you say, by querying one by one.

I think you should be able to query all the IDs together and sort the results once you get them.

(Reply by RamRS, 4 months ago)
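A minimal sketch of the "query together, then sort" idea in POSIX shell. It assumes the batched xtract output starts each line with the queried TaxId followed by a tab (as with -element TaxId,Rank,ScientificName) and has one line per TaxId. The helper name reorder_by_input and the temporary files are illustrative, not part of EDirect; the demo data is the out-of-order output for 7070, 5741 and 658858 posted in this thread.

```shell
#!/bin/sh
# Sketch: reorder batched xtract output so it follows the input taxid order.
# Assumptions: each output line starts with the queried TaxId and a tab, and
# there is one line per TaxId (a repeated TaxId keeps only its last line).
# reorder_by_input is a hypothetical helper, not an EDirect command.
reorder_by_input() {
    ids_file=$1      # one taxid per line, in the requested order
    results_file=$2  # tab-separated xtract output, TaxId in column 1
    awk -F '\t' '
        FILENAME == ARGV[1] { row[$1] = $0; next }  # pass 1: index results by TaxId
        ($1 in row) { print row[$1] }               # pass 2: print in input order
    ' "$results_file" "$ids_file"
}

# Demo with the batched output for 7070, 5741 and 658858 (returned out of order).
ids=$(mktemp)
results=$(mktemp)
printf '%s\n' 7070 5741 658858 > "$ids"
printf '658858\tno rank\tGiardia lamblia P15\n7070\tspecies\tTribolium castaneum\n5741\tspecies\tGiardia intestinalis\n' > "$results"
reorder_by_input "$ids" "$results"
rm -f "$ids" "$results"
```

The demo prints the 7070, 5741 and 658858 lines in that order. Because rows are looked up by TaxId, output lines whose TaxId is not in the input list are dropped.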

How can the results be sorted by the input order if the output only contains Rank and ScientificName?

(Reply by Fatima, 4 months ago)

If you don't have to do it in R, you can use bash, similar to this post:

C: Using Entrez to find the taxonomy for an accession number

cat reads a file sequentially, so you can make a file with each taxid on a separate line. The file shouldn't have any empty lines or spaces after the taxids.

(Reply by Fatima, 4 months ago)
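A one-taxid-per-line file like the one described above can be produced from the comma-separated string used in the question. This is a minimal POSIX shell sketch, assuming the IDs are separated by commas with optional spaces (as in taxid.as.string); ids_to_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: turn a comma-separated taxid string (like taxid.as.string) into
# one taxid per line, with no empty lines and no leading/trailing spaces.
# ids_to_lines is a hypothetical helper name used only for illustration.
ids_to_lines() {
    printf '%s\n' "$1" |
        tr ',' '\n' |                                  # split on commas
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |   # trim whitespace
        grep -v '^$'                                   # drop empty lines
}

ids_to_lines "7070, 5741, 658858"
```

Redirecting the result, e.g. ids_to_lines "7070, 5741, 658858" > taxids.txt, gives a file with 7070, 5741 and 658858 on separate lines, ready to be read line by line.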

I do not want to run the query one taxid at a time because it takes a lot of time; that is why I was making queries in batches. Do you know how I can include the taxids in the results?

If I add TaxId to -element, I get the taxid of every single taxonomic level, but I only want the queried TaxIDs, which in this case are those in taxid.as.string.

batch <- system(paste0("esearch -db taxonomy -query \"",taxid.as.string, " [taxID]\" | efetch -format xml | xtract -pattern Taxon -block \"*/Taxon\" -unless Rank -equals \"no rank\" -tab \"\t\" -element TaxId,Rank,ScientificName"), intern = TRUE)
(Reply by Freddy, 4 months ago)
$ esearch -db taxonomy -query ",7070, 5741, 658858, [taxID] " | efetch -format xml | xtract -pattern Taxon -element TaxId,Rank,ScientificName
658858  no rank Giardia lamblia P15
7070    species Tribolium castaneum
5741    species Giardia intestinalis

Can't you use grep or sed to filter the output?

You wrote: "By adding TaxId to the -element, I will get the taxid for every single taxonomy division, but I only want the TaxId which in this case is taxid.as.string."

Could you give me a sample taxid for this?

(Reply by Fatima, 4 months ago)

Your command does not preserve taxid order, and bash code to preserve taxid order will be cumbersome.

(Reply by RamRS, 4 months ago)

I didn't claim that my command preserves the order :) I'm just trying to understand the question better, so I asked the OP for an example where the command doesn't work. Also, the answer in the other thread is actually bash code ;)

(Reply by Fatima, 4 months ago)
genomax (United States, 4 months ago) wrote:

This question has been answered by @vkkodali in a different thread: C: how to find the kingdom for a taxon id using the terminal?


There is still the matter of the output order not matching the input order. If that were not important to the OP, this question would have no point.

(Reply by RamRS, 4 months ago)

The OP was concerned about order because the output didn't contain the taxid, which made it difficult to map input to output. If the output contains the taxid, it is going to be fine. As the OP wrote:

"Do you know how I can include the taxids in the results? By adding TaxId to the -element, I will get the taxid for every single taxonomy division, but I only want the TaxId which in this case is taxid.as.string."

(Reply by Fatima, 4 months ago)
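Once the output carries the TaxId, the input-to-output mapping can be checked directly. A minimal POSIX shell sketch, assuming the TaxId is the first tab-separated column of the batched xtract output; missing_taxids is a hypothetical helper name, and the demo simulates a missing result by leaving one line out of the output posted in this thread.

```shell
#!/bin/sh
# Sketch: list queried taxids that have no line in the batched output, to
# confirm the input-to-output mapping is complete.
# Assumption: TaxId is the first tab-separated column of the output.
# missing_taxids is a hypothetical helper, not an EDirect command.
missing_taxids() {
    ids_file=$1      # one taxid per line
    results_file=$2  # tab-separated xtract output
    awk -F '\t' '
        FILENAME == ARGV[1] { seen[$1] = 1; next }  # pass 1: TaxIds present in output
        !($1 in seen) { print $1 }                  # pass 2: input ids with no result
    ' "$results_file" "$ids_file"
}

# Demo: the 5741 line is deliberately left out to simulate a missing result.
ids=$(mktemp)
results=$(mktemp)
printf '%s\n' 7070 5741 658858 > "$ids"
printf '658858\tno rank\tGiardia lamblia P15\n7070\tspecies\tTribolium castaneum\n' > "$results"
missing_taxids "$ids" "$results"   # prints 5741
rm -f "$ids" "$results"
```

Matching on FILENAME rather than NR == FNR keeps the check correct even when the output file is empty, in which case every input taxid is reported.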

Thank you for clarifying that. I was wondering why the OP insisted on maintaining the order, and your explanation makes it clear.

(Reply by RamRS, 4 months ago)