Question

Salmon (quant.sf) to tximport

5.1 years ago

Morris_Chair ▴ 350

Hello,

I generated quant.sf files with salmon, and the next step is to create a transcript-to-gene mapping table (tx2gene) to aggregate transcripts to the gene level.

The first step is done with these commands:

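# Build a TxDb object from the GENCODE annotation
library(GenomicFeatures)
txdb <- makeTxDbFromGFF("gencode.v19.annotation.gtf")
# One row per transcript, together with its parent gene
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, columns = "TXNAME", keytype = "GENEID")
# Reorder columns: transcript ID first, gene ID second
tx2gene <- df[, 2:1]
head(tx2gene)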

which gives me this:

       TXNAME             GENEID
1 ENST00000373020.4 ENSG00000000003.10
2 ENST00000496771.1 ENSG00000000003.10
3 ENST00000494424.1 ENSG00000000003.10
4 ENST00000373031.4  ENSG00000000005.5
5 ENST00000485971.1  ENSG00000000005.5
6 ENST00000371588.5  ENSG00000000419.8

Next, I have to load the quant.sf files into R:

files <- list.files(pattern = "quant.sf", full.names = TRUE)
names(files) <- paste0("sample", 1:6)
all(file.exists(files))
# TRUE

I don't understand how I should organize my quant.sf files before running this command. Can I rename each quant.sf file so I can tell which sample it belongs to? Or can I put them all together in one folder, so that they all get picked up when I type files <- list.files(pattern = "quant.sf")?

thanks a lot

RNA-Seq R • 5.0k views

score 6 · Answer 1 · 2019-03-24

Hi Morris,

In general, you should not modify the structure of the salmon output directories, as tximport relies on it. The easiest thing to do is to keep a separate quant directory for each sample and to pass the quant.sf path from each one to tximport; you can use whatever naming convention you want for the quant folders. The line names(files) <- paste0("sample", 1:6) simply assigns names to the files, in the order they appear in the files vector.

It's common, for example, to create a tab-separated file (or something similar) that maps each output folder name to the name you wish to give the sample in your R analysis. You can then consult that file both to read all of the samples in and to give those samples their associated names in your analysis.

Honestly, the Bioconductor forum is an ideal place for questions like this one, and Mike is likely to answer any tximport-related question quickly :).
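
To make the sample-sheet idea concrete, here is a minimal sketch rather than a definitive recipe. It assumes a layout of one folder per sample under a common root (quants/<folder>/quant.sf) and a sample sheet with two columns named folder and sample; the helper name quant_paths, the column names, and the file name samples.tsv are all hypothetical and not part of tximport.

```r
# Sketch: build a named vector of quant.sf paths from a sample sheet.
# Assumed (hypothetical) layout: <quant_root>/<folder>/quant.sf
# Assumed (hypothetical) sample sheet columns: 'folder' and 'sample'.
quant_paths <- function(sample_sheet, quant_root = "quants") {
  # One quant.sf per sample folder; the salmon output itself is left untouched
  files <- file.path(quant_root, sample_sheet$folder, "quant.sf")
  # Names come from the sheet, so they do not depend on file listing order
  names(files) <- sample_sheet$sample
  # Fail early, and say which samples are missing, instead of a bare FALSE
  missing <- files[!file.exists(files)]
  if (length(missing) > 0) {
    stop("Missing quant.sf for: ", paste(names(missing), collapse = ", "))
  }
  files
}

# Usage (not run here; needs real salmon output and the tx2gene table
# built in the question):
# samples <- read.delim("samples.tsv", stringsAsFactors = FALSE)
# files   <- quant_paths(samples, quant_root = "quants")
# library(tximport)
# txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
# head(txi$counts)
```

Taking the names from the sheet rather than from paste0("sample", 1:6) means each column of the resulting count matrix is tied to a specific folder, even if the order in which files are listed changes.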