Question

Salmon to tximport error

1

Entering edit mode

4.1 years ago

evelyn ▴ 230

Hello,

I have salmon output as a directory with multiple files including quant.sf. I am trying to import it to R using tximport:

setwd("/path/quant/")
dir <- dir(paste0(path.package("tximportData"), "/path/quant/"), full.names = TRUE)
list.files()
samples <- read.csv("path/Table.csv", sep=";", header = TRUE)
samples
files <- file.path(dir, "salmon", samples$Run, "quant.sf")
names(files) <- paste0("samples", 1:100)

But it gives an error:

Error in names(files) <- paste0("samples") : 
  'names' attribute [100] must be the same length as the vector [0]

However, the individual files, Table.csv and quant.sf both have 100 observations information. Thank you!

sequencing R Bioconductor • 3.9k views

ADD COMMENT • link updated 4.1 years ago by zx8754 11k • written 4.1 years ago by evelyn ▴ 230

1

Entering edit mode

You've got some weird things goin on in this snippet:

You should avoid using setwd, does /path/quant even exists?
you're overriding dir function with the content of its result
Your csv file is ; separated?
As Kevin mentioned your files vector is empty, pay attention that you print the output of list.files() but never use this output, make sure you don't assume it's like dir

ADD REPLY • link 4.1 years ago by Asaf 10k

0

Entering edit mode

Thank you! 1. I took out setwd. 2. I changed the output file name. 3. Yes. 4. files vector actually shows the file and run names. But I get:

all(file.exists(files))
[1] FALSE

ADD REPLY • link 4.1 years ago by evelyn ▴ 230

0

Entering edit mode

For Salmon, you basically just need the following components (I have taken this from one of my previous datasets):

files
[1] "out/salmon//Patient_46/quant.sf" "out/salmon//Patient_47/quant.sf"
[3] "out/salmon//Patient_54/quant.sf" "out/salmon//Patient_60/quant.sf"
[5] "out/salmon//Patient_62/quant.sf" "out/salmon//Patient_6/quant.sf" 
[7] "out/salmon//Patient_7/quant.sf" 

metadata
               sample Condition
Patient_46 Patient_46   Control
Patient_47 Patient_47   Patient
Patient_54 Patient_54   Control
Patient_60 Patient_60   Control
Patient_62 Patient_62   Control
Patient_6   Patient_6   Patient
Patient_7   Patient_7   Patient

head(TranscriptToGeneMap)
                                                                                                                                         V1
1                            ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|lncRNA|
2 ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|
3              ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|
4                                                                 ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA|
5                     ENST00000473358.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002840.1|MIR1302-2HG-202|MIR1302-2HG|712|lncRNA|
6                     ENST00000469289.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002841.2|MIR1302-2HG-201|MIR1302-2HG|535|lncRNA|
                                          V2
1                             DDX11L1_lncRNA
2 DDX11L1_transcribed_unprocessed_pseudogene
3              WASH7P_unprocessed_pseudogene
4                            MIR6859-1_miRNA
5                         MIR1302-2HG_lncRNA
6                         MIR1302-2HG_lncRNA

txi <- tximport(
  as.character(files),
  type = "salmon",
  txIn = TRUE, txOut = FALSE,
  tx2gene = TranscriptToGeneMap,
  countsFromAbundance = "no")

ADD REPLY • link 4.1 years ago by Kevin Blighe 87k

0

Entering edit mode

Thank you! I have this:

files
samples1 
"path/salmon/SRR59/quant.sf" 
samples2 
"path/salmon/SRR60/quant.sf" 
samples3 
"path/salmon/SRR61/quant.sf" 
 samples4 
"path/salmon/SRR62/quant.sf" 
samples5 
"path/salmon/SRR63/quant.sf" 

   > samples
           BioSample Experiment        Run SRA_Sample
    1   69 SRX272 SRR59 SRS602
    2   70 SRX271 SRR60 SRS601
    3   67 SRX270 SRR61 SRS598
    4   68 SRX269 SRR62 SRS599
    5   35 SRX268 SRR63 SRS597

But I am still getting:

all(file.exists(files))
[1] FALSE

I am not what is going wrong.

ADD REPLY • link 4.1 years ago by evelyn ▴ 230

0

Entering edit mode

Do these files exist under getwd()?

ADD REPLY • link 4.1 years ago by Asaf 10k

0

Entering edit mode

getwd() shows the dir name where quant.sf is located.

ADD REPLY • link 4.1 years ago by evelyn ▴ 230

1

Entering edit mode

All that you need to use, from any directory, is list.files(). Once you have list.files() listing the relative paths of your quant.sf files, then everything else should be easy.

Take a look at this example, where I start at my command prompt, and then move to R:

BASH

pwd
/home/kblighe/Escritorio/DGERObj

ls -l packages/
total 26456
-rw-rw-r-- 1 kblighe kblighe   484419 May 31  2019 DGEobj_0.9.38.tar.gz
-rw-rw-r-- 1 kblighe kblighe   484224 Jun  5  2019 DGEobj_0.9.39.tar.gz
-rw-rw-r-- 1 kblighe kblighe   264419 Jun 11  2019 DGEobj_0.9.42.tar.gz
-rw-rw-r-- 1 kblighe kblighe 12870772 May 31  2019 DGE.Tools2_0.9.76.tar.gz
-rw-rw-r-- 1 kblighe kblighe 12870795 Jun 11  2019 DGE.Tools2_0.9.78.tar.gz
-rw-rw-r-- 1 kblighe kblighe    53180 May 31  2019 JRTutil_0.9.38.tar.gz

R

getwd()
[1] "/home/kblighe/Escritorio/DGERObj"

list.files('packages/', full = TRUE)
[1] "packages//DGEobj_0.9.38.tar.gz"     "packages//DGEobj_0.9.39.tar.gz"    
[3] "packages//DGEobj_0.9.42.tar.gz"     "packages//DGE.Tools2_0.9.76.tar.gz"
[5] "packages//DGE.Tools2_0.9.78.tar.gz" "packages//JRTutil_0.9.38.tar.gz"   

files <- list.files('packages/', full = TRUE)

all(file.exists(files))
[1] TRUE

ADD REPLY • link 4.1 years ago by Kevin Blighe 87k

0

Entering edit mode

Okay, so in the first step after setting the path to directory containing quant.sf, list.files() just gives the names of the folders in that directories with one folder having quant.sf file.

ADD REPLY • link 4.1 years ago by evelyn ▴ 230

0

Entering edit mode

Do you know how Salmon output looks like? Each folder is a library with a quant.sf file. You should list the files in the head directory (that contains all the Salmon results) and work from there. Make sure you don't mix dirs. Take your time, you'll figure it out eventually.

ADD REPLY • link 4.1 years ago by Asaf 10k

0

Entering edit mode

Okay. I ran salmon for multiple files together and have one quant.sf. Do I need to run it individually for each paired end fastq file.

ADD REPLY • link 4.1 years ago by evelyn ▴ 230

1

Entering edit mode

No, you should run it separately for each sample, it may have a few files if it was sequenced in different lanes.

ADD REPLY • link 4.1 years ago by Asaf 10k

1

Entering edit mode

Side note: use read.csv2 when we have ";" as a separator.

ADD REPLY • link 4.1 years ago by zx8754 11k

0

Entering edit mode

The error implies that the length of files is not equal to 100. So, please check the contents of the files variable.

ADD REPLY • link 4.1 years ago by Kevin Blighe 87k

score 3 · Answer 1 · 2020-03-17

3

Entering edit mode

4.1 years ago

ATpoint 81k

I always do it like this:

1) Put all the directories that salmon produced (the ones that contain the quant.sf files) into a directory. I personally always call this salmons.

2) Once you moved all these directories into the salmons directory you enter this directory from within R:

setwd("path/to/salmons/")

3) You list the quant.sf files, give each file a name and then run tximport:

my.files <-  list.files(list.dirs(path = ".", full.names = TRUE, recursive = FALSE), pattern = "quant.sf", full.names = TRUE)

> [1] "./B_rep1_salmon/quant.sf"   "./B_rep2_salmon/quant.sf"

## You give the samples a name, I always use the directory name:
names(my.files) <- sapply(strsplit(my.files, split="\\/"), function(x)x[2])

> head(my.files, 6)
B_rep1_salmon                B_rep2_salmon             
"./B_rep1_salmon/quant.sf"   "./B_rep2_salmon/quant.sf"

## Now run tximport. Done.
txi <- tximport(files = my.files, type = "salmon", tx2gene = tx2gene)

ADD COMMENT • link 4.1 years ago by ATpoint 81k

0

Entering edit mode

Thank you! As I said above I think there might be a problem as I ran salmon for multiple files together and have only one quant.sf. Now I am trying to run an array to produce different quant.sf files for each pair of files.

ADD REPLY • link 4.1 years ago by evelyn ▴ 230

0

Entering edit mode

I finally got

>all(file.exists(my.files))
TRUE

Now I am trying to create tx2gene using:
TxDb <- makeTxDbFromGFF(file = "/path/gene_models.gff")
k <- keys(TxDb, keytype = "TXNAME")
tx2gene <- select(TxDb, k, "GENEID", "TXNAME")
head(tx2gene)

But I got error:

> tx2gene <- select(TxDb, k, "GENEID", "TXNAME")
Error in UseMethod("select_") : 
  no applicable method for 'select_' applied to an object of class "c('TxDb', 'AnnotationDb', 'envRefClass', '.environment', 'refClass', 'environment', 'refObject', 'AssayData')"

ADD REPLY • link 4.1 years ago by evelyn ▴ 230

2

Entering edit mode

select was probably run-over by tidyverse. Try using AnnotationDbi::select directly

ADD REPLY • link 4.1 years ago by Asaf 10k

0

Entering edit mode

Thank you! I did it and got:

'select()' returned 1:1 mapping between keys and columns

Is it normal?

ADD REPLY • link 4.1 years ago by evelyn ▴ 230

0

Entering edit mode

To get last directory name:

my.files <- c("./B_rep1_salmon/quant.sf", "./B_rep2_salmon/quant.sf")
basename(dirname(my.files))
# [1] "B_rep1_salmon" "B_rep2_salmon"

ADD REPLY • link 4.1 years ago by zx8754 11k