Question: Salmon (quant.sf) to tximport
Morris_Chair wrote (3 months ago):

Hello,

I generated quant.sf files with salmon, and the next step is to create a transcript-to-gene mapping table (tx2gene) to aggregate transcripts to the gene level.

The first step is done with these commands:

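library(GenomicFeatures)  # makeTxDbFromGFF(); also attaches AnnotationDbi for keys() and select()
# Build a transcript database from the GENCODE annotation
txdb <- makeTxDbFromGFF("gencode.v19.annotation.gtf")
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, columns = "TXNAME", keytype = "GENEID")
# Reorder the columns so the transcript ID comes first, then the gene ID
tx2gene <- df[, 2:1]
head(tx2gene)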

which gives me this:

       TXNAME             GENEID
1 ENST00000373020.4 ENSG00000000003.10
2 ENST00000496771.1 ENSG00000000003.10
3 ENST00000494424.1 ENSG00000000003.10
4 ENST00000373031.4  ENSG00000000005.5
5 ENST00000485971.1  ENSG00000000005.5
6 ENST00000371588.5  ENSG00000000419.8

Next, I have to load the quant.sf files into R:

files <- list.files(pattern = "quant.sf", full.names = TRUE)
names(files) <- paste0("sample", 1:6)
all(file.exists(files))
#TRUE

I don't understand how I should organize my quant.sf files before running this command. Can I rename each quant.sf so I can tell which sample it belongs to? Can I put them all together in one folder, so that files <- list.files(pattern = "quant.sf") picks them all up?

Thanks a lot.

Rob wrote (3 months ago):

Hi Morris,

In general, you should not modify the structure of the salmon output directories, as tximport relies on it. The easiest thing to do is to keep a separate quant directory for each sample and to provide each of those to tximport, using whatever naming convention you want for the quant folders. The line names(files) <- paste0("sample", 1:6) simply assigns names to the files, in the order they appear in the files vector.

It's common, for example, to create a tab-separated file (or similar) that maps each output folder name to the name you want to give the sample in your R analysis. You can then consult that file both to read all of the samples in and to give those samples their associated names in your analysis.

Honestly, the Bioconductor forum is an ideal place for questions like this, and Mike is likely to answer any tximport-related question quickly :).
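
For what it's worth, here is a minimal sketch in R of the sample-sheet approach described above. Everything specific in it is made up for illustration: the quants/ layout, the folder names, the sample names, and the samples.tsv file with its "folder" and "sample" columns. The quant.sf files are empty placeholders created in a temporary directory so the snippet runs on its own, which is also why the tximport call at the end is left commented out.

```r
# Hypothetical layout: one salmon output directory per sample, e.g.
#   quants/ctrl_rep1/quant.sf, quants/ctrl_rep2/quant.sf, quants/treat_rep1/quant.sf
# Built here in a temporary directory with empty placeholder files.
base_dir <- file.path(tempdir(), "quants")
dirs <- c("ctrl_rep1", "ctrl_rep2", "treat_rep1")
for (d in dirs) {
  dir.create(file.path(base_dir, d), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(base_dir, d, "quant.sf"))
}

# Hypothetical tab-separated sample sheet: folder name -> sample name
sheet_path <- file.path(tempdir(), "samples.tsv")
writeLines(c("folder\tsample",
             "ctrl_rep1\tcontrol_1",
             "ctrl_rep2\tcontrol_2",
             "treat_rep1\ttreated_1"),
           sheet_path)

samples <- read.delim(sheet_path, stringsAsFactors = FALSE)

# One quant.sf path per row of the sheet, named by the sample name,
# so file order and sample names cannot drift apart
files <- file.path(base_dir, samples$folder, "quant.sf")
names(files) <- samples$sample
stopifnot(all(file.exists(files)))

# With real salmon output and the tx2gene table from the question:
# library(tximport)
# txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
```

Because the quant.sf files are never renamed or moved, the salmon directories stay intact, and the sample names come from the sheet rather than from the file names.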
