Using taxonomizr in R with a list of inputs
2.3 years ago

Hello all !

I am totally new to R and I'm trying to run a function from a specific package (taxonomizr): getId.

It works like this:

data<-getId(c('Pestivirus A','Bos taurus','Homo'),taxaNames)


and as output you get a table with one row containing the ID associated with each species.

I have a lot of inputs (around 7000) that I concatenated in a .txt file like this:

'Pestivirus A','Bos taurus','Homo'...


I tried to copy and paste the whole .txt file into the argument of the getId function. But when I run the command, nothing happens and the console shows a + prompt instead of >. Copy-pasting only works for about a hundred inputs at most (out of 7000).

Is there a way to use the .txt file directly, to avoid doing 70 copy-pastes?

I should add that this is the first time I am using R. So far I've imported my data like this and that's all:

   species <- read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE)

specieds <- as.list(read.csv("list.txt", header=FALSE))
data<-getId(specieds, taxaNames)

OK, I get this result: Error in out[taxa] : invalid subscript type 'list'

I used:
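That error is consistent with as.list() on a data frame returning a list of columns, while getId is being indexed with a plain character vector of names. Here is a minimal sketch of the difference. It uses an inline string as a stand-in for the real file, and the getId call is left commented out because it needs the taxonomizr database. Note that read.csv only strips double quotes by default, so quote="'" is passed to remove the single quotes around each name.

```r
# Stand-in for the one-line file: single-quoted, comma-separated names.
txt <- "'Pestivirus A','Bos taurus','Homo'"

# quote = "'" strips the single quotes around each name.
df <- read.csv(text = txt, header = FALSE, quote = "'",
               stringsAsFactors = FALSE)

as_list <- as.list(df)  # a list with one element per column (V1, V2, V3)
species <- unlist(df[1, ], use.names = FALSE)  # flatten row 1 into a vector

str(as_list)
print(species)

# data <- getId(species, taxaNames)  # same call shape as in the question
```

The same reasoning would explain why df[[1]] only gives one name: with a one-line file, each name lands in its own column, and df[[1]] selects just the first column.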

 specieds <- as.list(read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE))
data<-getId(specieds, taxaNames)

 df = read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE)
species <- as.vector(df[[1]])
data<-getId(species, taxaNames)

OK, I just did that but I get nothing as output.

My .txt file is :

'Nannochloropsis gaditana','Hondaea fermentalgiana','Nannochloropsis gaditana','Hondaea fermentalgiana','Hondaea fermentalgiana'


print(df) gives:

                            V1                       V2                         V3                       V4
                        V5
1 'Hondaea fermentalgiana'


print(species) gives:

[1] "ï»¿'Nannochloropsis gaditana'"

You should show some lines of the input .txt file at the very beginning. How many lines are in it?

df = read.csv("t.txt", quote="'",  header=FALSE)
species = as.vector(t(df[1, ])) # 1 means the first row.


If there is more than one line, use a loop.
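As a possible alternative to looping over rows, base R's scan() reads every field into one character vector regardless of how many lines the file has. The sketch below writes a small hypothetical two-line file to a temporary path so that it can be run as is. The file contents are made up for illustration.

```r
# Hypothetical input: several lines of single-quoted, comma-separated names.
tmp <- tempfile(fileext = ".txt")
writeLines(c("'Pestivirus A','Bos taurus'",
             "'Homo','Hondaea fermentalgiana'"), tmp)

# scan() treats both the comma and the line break as field separators,
# and quote = "'" removes the single quotes.
species <- scan(tmp, what = character(), sep = ",", quote = "'", quiet = TRUE)

# Drop stray whitespace and repeated names before the lookup.
species <- unique(trimws(species))
print(species)

# data <- getId(species, taxaNames)  # same call shape as in the question
```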

It is the only line. I tried the command with this small subset, nothing more. Could the problem come from the ï»¿?

You may change the encoding of the text file to UTF-8.
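The ï»¿ prefix is how a UTF-8 byte-order mark (BOM) appears when the file is read as a single-byte encoding, so the file is probably already UTF-8 but saved with a BOM. Instead of re-saving the file, one option is to let R drop the BOM on read with fileEncoding = "UTF-8-BOM". The sketch below builds a small BOM-prefixed file in a temporary location, with made-up contents mirroring the subset above, and reads it back.

```r
# Build a small test file that starts with the UTF-8 BOM (bytes EF BB BF).
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("'Nannochloropsis gaditana','Hondaea fermentalgiana'\n"), con)
close(con)

# "UTF-8-BOM" tells R to strip the BOM if present;
# quote = "'" strips the single quotes around each name.
df <- read.csv(tmp, header = FALSE, quote = "'",
               fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
species <- unlist(df[1, ], use.names = FALSE)
print(species)
```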

The problem does not come only from the encoding. I did df[1] <- NULL to delete this "bad variable", and it still doesn't work. However, when I copy-paste this subset, it works. EDIT: actually it works for the first species of my .txt file (after deleting the encoding).

> print (species)
[1] "Hondaea fermentalgiana"

"I have a lot of inputs (around 7000) that I concatenated in a .txt file like this"

It seems the one-line format is not the original format; one name per line would be the easiest and most convenient for downstream processing.

And 7000 is not a small number. You can try taxonkit name2taxid for mapping scientific names to TaxIDs or for further retrieving lineages; it supports Windows, but you need to run it in a command-line console.
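With one name per line and no quotes, reading the list in R reduces to a single readLines() call. Here is a minimal sketch, assuming such a file; the temporary file and its contents are only there to make the example runnable.

```r
# Hypothetical one-name-per-line file, no quotes, no commas.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Nannochloropsis gaditana",
             "Hondaea fermentalgiana",
             "Nannochloropsis gaditana"), tmp)

# readLines() returns a character vector with one element per line.
species <- readLines(tmp)

# Drop empty lines and repeated names before the lookup.
species <- unique(species[nzchar(species)])
print(species)

# data <- getId(species, taxaNames)  # same call shape as in the question
```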