Question: Salmon (quant.sf) to tximport
Morris_Chair wrote (12 months ago):

Hello,

I generated quant.sf files with salmon. The next step is to create a transcript-to-gene mapping table (tx2gene) so that transcripts can be aggregated to the gene level.

The first step uses these commands:

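library(GenomicFeatures)  # provides makeTxDbFromGFF(); also attaches AnnotationDbi for keys() and select()
txdb <- makeTxDbFromGFF("gencode.v19.annotation.gtf")
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, columns = "TXNAME", keytype = "GENEID")
tx2gene <- df[, 2:1]  # tximport expects the transcript ID first, then the gene ID
head(tx2gene)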

which gives me:

       TXNAME             GENEID
1 ENST00000373020.4 ENSG00000000003.10
2 ENST00000496771.1 ENSG00000000003.10
3 ENST00000494424.1 ENSG00000000003.10
4 ENST00000373031.4  ENSG00000000005.5
5 ENST00000485971.1  ENSG00000000005.5
6 ENST00000371588.5  ENSG00000000419.8

Next, I have to load the quant.sf files into R:

files <- list.files(pattern = "quant.sf", full.names = TRUE)
names(files) <- paste0("sample", 1:6)
all(file.exists(files))
#TRUE

I don't understand how I should organize my quant.sf files before running this command. Can I rename each quant.sf so I can tell which sample it belongs to? Can I put them all together in one folder so that files <- list.files(pattern = "quant.sf") picks them all up?

Thanks a lot.

Rob wrote (12 months ago):

Hi Morris,

In general, you should not modify the structure of the salmon output directories, because tximport relies on it. The easiest approach is to keep a separate quant directory for each sample and to provide each of them to tximport; you can use whatever naming convention you want for the quant folders. The line names(files) <- paste0("sample", 1:6) simply assigns names to the files, in the order they appear in the files vector.

It's common, for example, to create a tab-separated file (or something similar) that maps each output folder name to the name you want that sample to have in your R analysis. You can then consult that file both to read in all of the samples and to give them their names in your analysis.

Honestly, the Bioconductor forum is an ideal place for questions like this, and Mike is likely to answer any tximport-related question quickly :).
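
As a rough illustration of the sample-sheet idea, here is a minimal sketch in base R. It assumes a layout that is not given in the thread: one salmon output folder per sample under a common root (e.g. quants/run_a/quant.sf) and a tab-separated sheet with two columns, dir (the folder name) and sample (the name to use in R). The helper name build_quant_paths, the file name samples.tsv and the column names are all made up for this example.

```r
# Sketch only: the directory layout, the sheet's column names (`dir`, `sample`)
# and the helper name are assumptions, not something salmon or tximport prescribe.
build_quant_paths <- function(sample_sheet, quant_root) {
  # Read the tab-separated sample sheet (header row expected).
  samples <- read.delim(sample_sheet, stringsAsFactors = FALSE)

  # Each sample keeps its own salmon output directory; the quant.sf
  # inside it is left untouched and never renamed.
  files <- file.path(quant_root, samples$dir, "quant.sf")

  # The sample names live on the vector, not in the file names.
  names(files) <- samples$sample

  # Fail early if any expected quant.sf is missing.
  missing <- files[!file.exists(files)]
  if (length(missing) > 0) {
    stop("quant.sf not found for: ", paste(names(missing), collapse = ", "))
  }
  files
}

# Possible usage (requires the tximport package and real salmon output):
# files <- build_quant_paths("samples.tsv", "quants")
# library(tximport)
# txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
```

Because the names are attached to the files vector, they should carry through as the column names of the matrices that tximport returns, so the quant.sf files themselves never need to be renamed or moved.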
