Question: Using taxonomizr in R with a list of inputs
Asked 4 weeks ago by doinelpierrot:

Hello all!

I am totally new to R, and I'm trying to run the function getId from the taxonomizr package.

It works like this:

data<-getId(c('Pestivirus A','Bos taurus','Homo'),taxaNames)

As output, you get a table with one row containing the ID associated with each species.

I have a lot of inputs (around 7000) that I concatenated into a .txt file like this:

'Pestivirus A','Bos taurus','Homo'...

I tried copy-pasting the whole content of the .txt file into the argument of getId. But when I run the command, nothing happens and the console shows + instead of >. Copy-pasting only works for about a hundred inputs at most (out of 7000).

Is there a way to use the .txt file directly, to avoid doing 70 copy-pastes?
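One way to do this, sketched here under the assumption that the file holds a single line of comma-separated names in single quotes (as shown above), is base R's scan(), which splits on the commas and strips the quotes. The temporary file stands in for species.txt, and the getId() call is left as a comment because it needs the taxaNames object from the question.

```r
# Stand-in for species.txt: one line of 'quoted','comma-separated' names
path <- tempfile(fileext = ".txt")
writeLines("'Pestivirus A','Bos taurus','Homo'", path)

species <- scan(path,
                what = character(),  # read every field as a string
                sep = ",",           # fields are separated by commas
                quote = "'",         # remove the single quotes around names
                strip.white = TRUE,  # drop any spaces after the commas
                quiet = TRUE)        # do not print "Read N items"
print(species)

# data <- getId(species, taxaNames)  # then call getId as in the question
```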

I should add that this is my first time using R. So far I've imported my data like this, and that's all:

   species <- read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE)
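For reference, a small sketch of what read.csv() makes of such a one-line file (again with a temporary file standing in for species.txt): every name becomes a separate column, and because read.csv() only treats double quotes as quoting characters by default, the single quotes stay in the values.

```r
# Stand-in for species.txt with three names on one line
path <- tempfile(fileext = ".txt")
writeLines("'Pestivirus A','Bos taurus','Homo'", path)

df <- read.csv(path, header = FALSE, stringsAsFactors = FALSE)
print(dim(df))  # number of rows, then number of columns (one per name)
print(df$V1)    # the single quotes are still part of the value
```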
specieds <- as.list(read.csv("list.txt", header=FALSE))
data<-getId(specieds, taxaNames)
(Reply by shenwei356, 4 weeks ago)

OK, I get this as a result: Error in out[taxa] : invalid subscript type 'list'

I used:

 specieds <- as.list(read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE))
 data<-getId(specieds, taxaNames)
(Reply by doinelpierrot, 4 weeks ago; edited by shenwei356)
 df = read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE)
 species <- as.vector(df[[1]])
 data<-getId(species, taxaNames)
(Reply by shenwei356, 4 weeks ago)
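A sketch of why the two attempts behave differently, assuming the same one-line file: as.list() on a data frame returns a list with one element per column rather than a character vector, which is likely what getId() trips over with the "invalid subscript type 'list'" error, while df[[1]] is a character vector holding only column V1, i.e. the first name, quotes included.

```r
# Stand-in for species.txt with three names on one line
path <- tempfile(fileext = ".txt")
writeLines("'Pestivirus A','Bos taurus','Homo'", path)
df <- read.csv(path, header = FALSE, stringsAsFactors = FALSE)

as_list <- as.list(df)  # a list with one element per column, not a plain vector
first_col <- df[[1]]    # a character vector, but only the contents of column V1

print(class(as_list))
print(first_col)
```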

OK, I just did that, but I get nothing as output.

My .txt file is:

'Nannochloropsis gaditana','Hondaea fermentalgiana','Nannochloropsis gaditana','Hondaea fermentalgiana','Hondaea fermentalgiana'

print(df) gives:

                            V1                       V2                         V3                       V4
1 'Nannochloropsis gaditana' 'Hondaea fermentalgiana' 'Nannochloropsis gaditana' 'Hondaea fermentalgiana'
                        V5
1 'Hondaea fermentalgiana'

print(species) gives:

[1] "'Nannochloropsis gaditana'"
(Reply by doinelpierrot, 4 weeks ago; edited by shenwei356)

You should have shown some lines of the input .txt file at the very beginning. How many lines does it have?

df = read.csv("t.txt", quote="'",  header=FALSE)
species = as.vector(t(df[1, ])) # 1 means the first row.

If there is more than one line, use a loop.

(Reply by shenwei356, 4 weeks ago)
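If the file does have several lines, the loop can likely be avoided. A sketch, assuming every line holds the same number of names: transpose the data frame so each original row becomes a column, then flatten it, which yields the names in file order.

```r
# Stand-in for a names file with two lines of three quoted names each
path <- tempfile(fileext = ".txt")
writeLines(c("'Pestivirus A','Bos taurus','Homo'",
             "'Nannochloropsis gaditana','Hondaea fermentalgiana','Homo'"), path)

df <- read.csv(path, header = FALSE, quote = "'", stringsAsFactors = FALSE)

# t() turns rows into columns; as.vector() then reads the matrix column by
# column, so the names come out line by line, left to right
species <- as.vector(t(df))
print(species)
```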

It is the only line. I tried the command with this small subset, nothing more. Could the problem come from the  ?

(Reply by doinelpierrot, 4 weeks ago)

You could change the encoding of the text file to UTF-8.

(Reply by shenwei356, 4 weeks ago)
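If the stray characters are a UTF-8 byte-order mark (an assumption; the thread does not show them), read.csv() can drop it while reading via fileEncoding = "UTF-8-BOM". A sketch that writes such a file to a temporary path and reads it back:

```r
# Simulate a species file saved with a UTF-8 byte-order mark (bytes EF BB BF)
path <- tempfile(fileext = ".txt")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("'Nannochloropsis gaditana','Hondaea fermentalgiana'\n")), path)

df <- read.csv(path, header = FALSE,
               quote = "'",                 # strip the single quotes
               fileEncoding = "UTF-8-BOM",  # remove the BOM while reading
               stringsAsFactors = FALSE)
species <- as.vector(t(df))                 # one-row data frame -> vector
print(species)
```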

The problem does not come only from the encoding. I did df[1] <- NULL to delete this "bad variable", and it still doesn't work. However, when I copy-paste this subset, it works. EDIT: actually, it works for the first species of my .txt file (after deleting the encoding).

> print (species)
[1] "Hondaea fermentalgiana"
(Reply by doinelpierrot, 4 weeks ago)
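A likely explanation for the EDIT: in a one-row data frame each column holds one name, so df[1] <- NULL removes the whole column V1, i.e. the first species, rather than just the bad characters. A minimal illustration with a hand-built data frame (the values are taken from the file shown earlier):

```r
# One row, one name per column, as read.csv() produced from the one-line file
df <- data.frame(V1 = "Nannochloropsis gaditana",
                 V2 = "Hondaea fermentalgiana",
                 stringsAsFactors = FALSE)

df[1] <- NULL     # deletes the entire first column, not only its bad characters
print(names(df))  # only V2 is left
print(df[[1]])    # the "first" column is now the second name
```

This is consistent with print(species) now showing "Hondaea fermentalgiana", the second name in the file.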

Regarding "I have a lot of inputs (around 7000) that I concatenated into a .txt file like this":

It seems the one-line format is not your original format; one name per line would be the easiest and most convenient format for downstream processing.
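With one name per line, reading the file needs no quote or comma handling at all. A sketch using base R's readLines(), with a temporary file standing in for the real names file; the getId() call is commented out because it needs the taxaNames object from the question.

```r
# Stand-in for a names file with one scientific name per line
path <- tempfile(fileext = ".txt")
writeLines(c("Pestivirus A", "Bos taurus", "Homo"), path)

species <- readLines(path)
species <- trimws(species)         # drop stray leading/trailing spaces
species <- species[species != ""]  # skip blank lines, e.g. at the end of the file
print(species)

# data <- getId(species, taxaNames)
```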

And 7000 is not a small number. You could try taxonkit name2taxid to map scientific names to TaxIDs, or to further retrieve lineages; it supports Windows, but you need to run it in a command-line console.

(Reply by shenwei356, 4 weeks ago)