Question: Salmon to tximport error
1
gravatar for evelyn
7 months ago by
evelyn110
evelyn110 wrote:

Hello,

I have salmon output as a directory with multiple files including quant.sf. I am trying to import it to R using tximport:

setwd("/path/quant/")
dir <- dir(paste0(path.package("tximportData"), "/path/quant/"), full.names = TRUE)
list.files()
samples <- read.csv("path/Table.csv", sep=";", header = TRUE)
samples
files <- file.path(dir, "salmon", samples$Run, "quant.sf")
names(files) <- paste0("samples", 1:100)

But it gives an error:

Error in names(files) <- paste0("samples") : 
  'names' attribute [100] must be the same length as the vector [0]

However, the individual files, Table.csv and quant.sf both have 100 observations information. Thank you!

sequencing bioconductor R • 616 views
ADD COMMENTlink modified 7 months ago by zx87549.7k • written 7 months ago by evelyn110
1

You've got some weird things goin on in this snippet:

  1. You should avoid using setwd, does /path/quant even exists?
  2. you're overriding dir function with the content of its result
  3. Your csv file is ; separated?
  4. As Kevin mentioned your files vector is empty, pay attention that you print the output of list.files() but never use this output, make sure you don't assume it's like dir
ADD REPLYlink written 7 months ago by Asaf8.4k

Thank you! 1. I took out setwd. 2. I changed the output file name. 3. Yes. 4. files vector actually shows the file and run names. But I get:

all(file.exists(files))
[1] FALSE
ADD REPLYlink modified 7 months ago • written 7 months ago by evelyn110

For Salmon, you basically just need the following components (I have taken this from one of my previous datasets):

files
[1] "out/salmon//Patient_46/quant.sf" "out/salmon//Patient_47/quant.sf"
[3] "out/salmon//Patient_54/quant.sf" "out/salmon//Patient_60/quant.sf"
[5] "out/salmon//Patient_62/quant.sf" "out/salmon//Patient_6/quant.sf" 
[7] "out/salmon//Patient_7/quant.sf" 

metadata
               sample Condition
Patient_46 Patient_46   Control
Patient_47 Patient_47   Patient
Patient_54 Patient_54   Control
Patient_60 Patient_60   Control
Patient_62 Patient_62   Control
Patient_6   Patient_6   Patient
Patient_7   Patient_7   Patient

head(TranscriptToGeneMap)
                                                                                                                                         V1
1                            ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|lncRNA|
2 ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|
3              ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|
4                                                                 ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA|
5                     ENST00000473358.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002840.1|MIR1302-2HG-202|MIR1302-2HG|712|lncRNA|
6                     ENST00000469289.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002841.2|MIR1302-2HG-201|MIR1302-2HG|535|lncRNA|
                                          V2
1                             DDX11L1_lncRNA
2 DDX11L1_transcribed_unprocessed_pseudogene
3              WASH7P_unprocessed_pseudogene
4                            MIR6859-1_miRNA
5                         MIR1302-2HG_lncRNA
6                         MIR1302-2HG_lncRNA

txi <- tximport(
  as.character(files),
  type = "salmon",
  txIn = TRUE, txOut = FALSE,
  tx2gene = TranscriptToGeneMap,
  countsFromAbundance = "no")
ADD REPLYlink written 7 months ago by Kevin Blighe66k

Thank you! I have this:

files
samples1 
"path/salmon/SRR59/quant.sf" 
samples2 
"path/salmon/SRR60/quant.sf" 
samples3 
"path/salmon/SRR61/quant.sf" 
 samples4 
"path/salmon/SRR62/quant.sf" 
samples5 
"path/salmon/SRR63/quant.sf" 

   > samples
           BioSample Experiment        Run SRA_Sample
    1   69 SRX272 SRR59 SRS602
    2   70 SRX271 SRR60 SRS601
    3   67 SRX270 SRR61 SRS598
    4   68 SRX269 SRR62 SRS599
    5   35 SRX268 SRR63 SRS597

But I am still getting:

all(file.exists(files))
[1] FALSE

I am not what is going wrong.

ADD REPLYlink written 7 months ago by evelyn110

Do these files exist under getwd()?

ADD REPLYlink written 7 months ago by Asaf8.4k

getwd() shows the dir name where quant.sf is located.

ADD REPLYlink written 7 months ago by evelyn110
1

All that you need to use, from any directory, is list.files(). Once you have list.files() listing the relative paths of your quant.sf files, then everything else should be easy.

Take a look at this example, where I start at my command prompt, and then move to R:

BASH

pwd
/home/kblighe/Escritorio/DGERObj

ls -l packages/
total 26456
-rw-rw-r-- 1 kblighe kblighe   484419 May 31  2019 DGEobj_0.9.38.tar.gz
-rw-rw-r-- 1 kblighe kblighe   484224 Jun  5  2019 DGEobj_0.9.39.tar.gz
-rw-rw-r-- 1 kblighe kblighe   264419 Jun 11  2019 DGEobj_0.9.42.tar.gz
-rw-rw-r-- 1 kblighe kblighe 12870772 May 31  2019 DGE.Tools2_0.9.76.tar.gz
-rw-rw-r-- 1 kblighe kblighe 12870795 Jun 11  2019 DGE.Tools2_0.9.78.tar.gz
-rw-rw-r-- 1 kblighe kblighe    53180 May 31  2019 JRTutil_0.9.38.tar.gz

R

getwd()
[1] "/home/kblighe/Escritorio/DGERObj"

list.files('packages/', full = TRUE)
[1] "packages//DGEobj_0.9.38.tar.gz"     "packages//DGEobj_0.9.39.tar.gz"    
[3] "packages//DGEobj_0.9.42.tar.gz"     "packages//DGE.Tools2_0.9.76.tar.gz"
[5] "packages//DGE.Tools2_0.9.78.tar.gz" "packages//JRTutil_0.9.38.tar.gz"   

files <- list.files('packages/', full = TRUE)

all(file.exists(files))
[1] TRUE
ADD REPLYlink modified 7 months ago • written 7 months ago by Kevin Blighe66k

Okay, so in the first step after setting the path to directory containing quant.sf, list.files() just gives the names of the folders in that directories with one folder having quant.sf file.

ADD REPLYlink written 7 months ago by evelyn110

Do you know how Salmon output looks like? Each folder is a library with a quant.sf file. You should list the files in the head directory (that contains all the Salmon results) and work from there. Make sure you don't mix dirs. Take your time, you'll figure it out eventually.

ADD REPLYlink written 7 months ago by Asaf8.4k

Okay. I ran salmon for multiple files together and have one quant.sf. Do I need to run it individually for each paired end fastq file.

ADD REPLYlink written 7 months ago by evelyn110
1

No, you should run it separately for each sample, it may have a few files if it was sequenced in different lanes.

ADD REPLYlink written 7 months ago by Asaf8.4k
1

Side note: use read.csv2 when we have ";" as a separator.

ADD REPLYlink written 7 months ago by zx87549.7k

The error implies that the length of files is not equal to 100. So, please check the contents of the files variable.

ADD REPLYlink written 7 months ago by Kevin Blighe66k
3
gravatar for ATpoint
7 months ago by
ATpoint40k
Germany
ATpoint40k wrote:

I always do it like this:

1) Put all the directories that salmon produced (the ones that contain the quant.sf files) into a directory. I personally always call this salmons.

2) Once you moved all these directories into the salmons directory you enter this directory from within R:

setwd("path/to/salmons/")

3) You list the quant.sf files, give each file a name and then run tximport:

my.files <-  list.files(list.dirs(path = ".", full.names = TRUE, recursive = FALSE), pattern = "quant.sf", full.names = TRUE)

> [1] "./B_rep1_salmon/quant.sf"   "./B_rep2_salmon/quant.sf"

## You give the samples a name, I always use the directory name:
names(my.files) <- sapply(strsplit(my.files, split="\\/"), function(x)x[2])

> head(my.files, 6)
B_rep1_salmon                B_rep2_salmon             
"./B_rep1_salmon/quant.sf"   "./B_rep2_salmon/quant.sf"

## Now run tximport. Done.
txi <- tximport(files = my.files, type = "salmon", tx2gene = tx2gene)
ADD COMMENTlink modified 7 months ago • written 7 months ago by ATpoint40k

Thank you! As I said above I think there might be a problem as I ran salmon for multiple files together and have only one quant.sf. Now I am trying to run an array to produce different quant.sf files for each pair of files.

ADD REPLYlink written 7 months ago by evelyn110

I finally got

>all(file.exists(my.files))
TRUE

Now I am trying to create tx2gene using:
TxDb <- makeTxDbFromGFF(file = "/path/gene_models.gff")
k <- keys(TxDb, keytype = "TXNAME")
tx2gene <- select(TxDb, k, "GENEID", "TXNAME")
head(tx2gene)

But I got error:

> tx2gene <- select(TxDb, k, "GENEID", "TXNAME")
Error in UseMethod("select_") : 
  no applicable method for 'select_' applied to an object of class "c('TxDb', 'AnnotationDb', 'envRefClass', '.environment', 'refClass', 'environment', 'refObject', 'AssayData')"
ADD REPLYlink written 7 months ago by evelyn110
1

select was probably run-over by tidyverse. Try using AnnotationDbi::select directly

ADD REPLYlink written 7 months ago by Asaf8.4k

Thank you! I did it and got:

'select()' returned 1:1 mapping between keys and columns

Is it normal?

ADD REPLYlink written 7 months ago by evelyn110

To get last directory name:

my.files <- c("./B_rep1_salmon/quant.sf", "./B_rep2_salmon/quant.sf")
basename(dirname(my.files))
# [1] "B_rep1_salmon" "B_rep2_salmon"
ADD REPLYlink written 7 months ago by zx87549.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1908 users visited in the last hour