Question

Using taxonomizr in R with a list of inputs

Asked 3.5 years ago by doinelpierrot

Hello all!

I am totally new to R and I'm trying to run getId, a function from the taxonomizr package.

It works like this:

    data <- getId(c('Pestivirus A', 'Bos taurus', 'Homo'), taxaNames)

and as output you get a table with one row containing the ID associated with each species.

I have a lot of inputs (around 7000), which I concatenated into a .txt file like this:

'Pestivirus A','Bos taurus','Homo'...

I tried copy-pasting the whole .txt file into the argument of getId, but when I run the command nothing happens and the console shows the + prompt instead of >. Copy-pasting only works for about a hundred inputs at most (out of 7000).

Is there a way to use the .txt file directly, to avoid doing 70 copy-pastes?

I should add that this is my first time using R. So far, all I've done is import my data like this:

   species <- read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE)
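A minimal sketch of why the imported data has to be flattened first, assuming the file holds a single line of comma-separated, single-quoted names as in the question. With header=FALSE, read.csv turns that line into a data frame with one row and one column per name (V1, V2, ...), while the working getId call above passes a plain character vector. The sample line is inlined through the text= argument instead of reading species.txt, and quote = "'" strips the single quotes.

```r
# Assumption: one line of comma-separated, single-quoted names, as in the question.
# `text =` stands in for the real file path here.
df <- read.csv(text = "'Pestivirus A','Bos taurus','Homo'",
               header = FALSE, quote = "'", stringsAsFactors = FALSE)

# One row, one column per name (V1, V2, V3)
print(dim(df))

# Flatten the single row into an unnamed character vector
species <- unlist(df[1, ], use.names = FALSE)
print(species)

# The vector can then be passed on as in the question:
# data <- getId(species, taxaNames)
```

For the real file, replace `text = ...` with the file path used above.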

Tags: R, software error

specieds <- as.list(read.csv("list.txt", header=FALSE))
data<-getId(specieds, taxaNames)

Reply by shenwei356, 3.5 years ago

OK, I get this as a result: Error in out[taxa] : invalid subscript type 'list'

I used:

 specieds <- as.list(read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE))
 data<-getId(specieds, taxaNames)

Reply by doinelpierrot, 3.5 years ago (edited by shenwei356)

 df = read.csv("C:/Users/pdoinel/Desktop/species.txt", header=FALSE)
 species <- as.vector(df[[1]])
 data<-getId(species, taxaNames)

Reply by shenwei356, 3.5 years ago

OK, I just did that, but I get nothing as output.

My .txt file is:

'Nannochloropsis gaditana','Hondaea fermentalgiana','Nannochloropsis gaditana','Hondaea fermentalgiana','Hondaea fermentalgiana'

The output of print(df) is:

                            V1                       V2                         V3                       V4
1 ï»¿'Nannochloropsis gaditana' 'Hondaea fermentalgiana' 'Nannochloropsis gaditana' 'Hondaea fermentalgiana'
                        V5
1 'Hondaea fermentalgiana'

The output of print(species) is:

[1] "ï»¿'Nannochloropsis gaditana'"

Reply by doinelpierrot, 3.5 years ago (edited by shenwei356)

You should have shown a few lines of the input .txt file at the very beginning. How many lines does it have?

df = read.csv("t.txt", quote="'",  header=FALSE)
species = as.vector(t(df[1, ])) # 1 means the first row.

If there is more than one line, use a loop.
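As an alternative to a loop, here is a sketch using base R's scan(), assuming the same comma-separated, single-quoted format spread over several lines. scan() treats both commas and line ends as field separators, so every name ends up in one character vector no matter how many lines there are. The two-line input below is made up for illustration; for a real file, pass its path as the first argument instead of `text =`.

```r
# Made-up two-line input in the question's format (one element per line)
input <- c("'Nannochloropsis gaditana','Hondaea fermentalgiana'",
           "'Bos taurus','Homo'")

# Commas and line ends both separate fields; quote = "'" removes the single quotes
species <- scan(text = input, what = character(), sep = ",",
                quote = "'", quiet = TRUE)

# Drop empty fields that a trailing comma would leave behind
species <- species[nzchar(species)]
print(species)
```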

Reply by shenwei356, 3.5 years ago

It is the only line. I tried the command with this small subset, nothing more. Could the problem come from the ï»¿?

Reply by doinelpierrot, 3.5 years ago

You could change the encoding of the text file to UTF-8 without a BOM.
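The ï»¿ at the start of the first name is what a UTF-8 byte-order mark (the bytes EF BB BF) looks like when it is read as Latin-1/Windows-1252 text. A sketch, assuming the file is UTF-8 with a BOM: instead of re-saving it, R can drop the BOM while reading by passing fileEncoding = "UTF-8-BOM". The example writes its own small temporary test file, so the path is only a placeholder.

```r
# Build a small test file: the three UTF-8 BOM bytes followed by the question's line
path <- tempfile(fileext = ".txt")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))
line <- charToRaw("'Nannochloropsis gaditana','Hondaea fermentalgiana'\n")
writeBin(c(bom, line), path)

# "UTF-8-BOM" decodes the file as UTF-8 and discards a leading BOM if present
df <- read.csv(path, header = FALSE, quote = "'",
               fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)

# Flatten the single row into a character vector, as in the reply above
species <- unlist(df[1, ], use.names = FALSE)
print(species)
```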

Reply by shenwei356, 3.5 years ago

The problem does not come only from the encoding. I did df[1] <- NULL to delete this "bad variable", and it still doesn't work. However, when I copy-paste this subset, it works. EDIT: actually it works for the first species of my .txt file (after removing the ï»¿ characters).

> print (species)
[1] "Hondaea fermentalgiana"

Reply by doinelpierrot, 3.5 years ago

Quoting the question: "I have a lot of inputs (around 7000) that I concatenated into a .txt file like this"

It seems the one-line format is not the original format. One name per line would be the easiest and most convenient format for downstream processing.

Also, 7000 is not a small number. You could try taxonkit name2taxid to map scientific names to TaxIDs, and optionally retrieve the lineages as well. It supports Windows, but you need to run it from a command-line console.
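A minimal sketch of the one-name-per-line idea: split the one-line list once, write it back out with one name per line, and from then on readLines() returns the names directly as a character vector, with no quote or column handling. The input line and the temporary output file are placeholders for illustration.

```r
# Placeholder one-line input in the question's format
one_line <- "'Nannochloropsis gaditana','Hondaea fermentalgiana','Bos taurus'"
names_vec <- scan(text = one_line, what = character(), sep = ",",
                  quote = "'", quiet = TRUE)

# Write one name per line (a temporary file stands in for a real path)
out <- tempfile(fileext = ".txt")
writeLines(names_vec, out)

# Reading it back gives a plain character vector directly
species <- readLines(out)
print(species)
# data <- getId(species, taxaNames)   # as in the question
```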

Reply by shenwei356, 3.5 years ago