Question: RNA-seq analysis with Salmon - how to import and summarize with tximport
woojoy140 wrote (7 months ago):

Hi!

I'm trying to do RNA-seq analysis with Salmon and would like to get a matrix of read counts for 10 RNA-seq samples (paired-end FASTQ files). I installed Salmon with Bioconda; however, the only version I can find is 0.8.1, even after running 'conda update salmon'. So I have been using version 0.8.1, with the commands below for indexing and quantification (mapping).

salmon index -t gencode.v31.transcripts.fa.gz -i gencode_v31_idx
salmon quant -i gencode_v31_idx/ -l IU -p 10 -1 ERR22788_1.fastq.gz -2 ERR22788_2.fastq.gz -o results/ERR2278846

This gave me 10 quant.sf files, one per sample. Now I'm trying to import and summarize them in R with tximport, but I don't know how. I've been trying to follow tutorials such as the tximport vignette (https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#salmon-sailfish) and https://www.hadriengourle.com/tutorials/rna/

When I ran the code below,

library(GenomicFeatures)
txdb <- makeTxDbFromGFF("gencode.v31.primary_assembly.annotation.gtf")

I got this warning:

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for
  features of type stop_codon. This information was
  ignored.

How can I fix this? Also, the tutorials use the 'tximportData' package, whose example data set has 6 samples, and I get the same results as they do. As I understand it, the process has two main steps, shown below. How can I adapt this code to my own data?

#https://www.hadriengourle.com/tutorials/rna/
library(GenomicFeatures)  # makeTxDbFromGFF(), keys(), select()
library(tximport)         # tximport()

#link the transcript names to the gene names
txdb <- makeTxDbFromGFF("chr22_genes.gtf")
k <- keys(txdb, keytype = "GENEID")
tx2gene <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")
#tximport expects transcript IDs in column 1 and gene IDs in column 2
tx2gene <- tx2gene[, c("TXNAME", "GENEID")]
head(tx2gene)

#import the salmon quantification
samples <- read.table("samples.txt", header = TRUE)  # needs a 'sample' column
files <- file.path("quant", samples$sample, "quant.sf")
names(files) <- as.character(samples$sample)
txi.salmon <- tximport(files, type = "salmon", tx2gene = tx2gene)
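To adapt the import step to the folders written by the salmon command above (`-o results/<sample>`), the `files` vector can be built from the folder names instead of a samples.txt file. This is a minimal sketch, assuming one subfolder per sample under `results/`; `find_quant_files` is a hypothetical helper, not part of tximport. The `ignoreAfterBar = TRUE` argument (present in recent tximport releases) is included on the assumption that the Name column of the quant.sf files holds the full pipe-separated GENCODE header, which would otherwise not match the plain transcript IDs in tx2gene.

```r
# Hypothetical helper: build the named vector of quant.sf paths that
# tximport() expects, assuming one salmon output folder per sample
# under results_dir (as produced by `salmon quant ... -o results/<sample>`).
find_quant_files <- function(results_dir = "results") {
  # Immediate subfolders only; each folder name becomes the sample name
  sample_dirs <- list.dirs(results_dir, full.names = FALSE, recursive = FALSE)
  files <- file.path(results_dir, sample_dirs, "quant.sf")
  names(files) <- sample_dirs
  # Drop folders that contain no quant.sf (e.g. a failed run)
  files[file.exists(files)]
}

files <- find_quant_files("results")
print(files)

# Run the import only if tximport is installed, files were found, and a
# tx2gene table (transcript ID in column 1, gene ID in column 2) exists.
if (requireNamespace("tximport", quietly = TRUE) && length(files) > 0 &&
    exists("tx2gene")) {
  txi.salmon <- tximport::tximport(files, type = "salmon", tx2gene = tx2gene,
                                   ignoreAfterBar = TRUE)
  print(head(txi.salmon$counts))
}
```

The gene-level count matrix is then in `txi.salmon$counts`, with one column per sample folder.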

I have 10 quant.sf files, each in a folder named after its sample, and I used gencode.v31.transcripts.fa as the reference.
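Since the index was built from the GENCODE transcript FASTA, the transcript-to-gene table can also be taken directly from that file's headers, which in GENCODE carry the transcript ID in the first |-separated field and the gene ID in the second. This skips makeTxDbFromGFF (and its warning) and guarantees the transcript IDs come from the same file salmon indexed. A minimal sketch, assuming GENCODE-style headers; `tx2gene_from_gencode_headers` is a hypothetical helper, and reading the whole FASTA with readLines() can use a lot of memory for the full transcriptome.

```r
# Hypothetical helper: derive tx2gene from GENCODE transcript FASTA headers.
# Assumes headers of the form
# >ENST00000456328.2|ENSG00000223972.5|...|DDX11L1-202|DDX11L1|1657|processed_transcript|
tx2gene_from_gencode_headers <- function(lines) {
  # Keep header lines only and drop the leading ">"
  hdr <- sub("^>", "", lines[startsWith(lines, ">")])
  # Split each header on the literal "|" character
  fields <- strsplit(hdr, "|", fixed = TRUE)
  data.frame(
    TXNAME = vapply(fields, `[`, character(1), 1),  # transcript ID (field 1)
    GENEID = vapply(fields, `[`, character(1), 2),  # gene ID (field 2)
    stringsAsFactors = FALSE
  )
}

fasta <- "gencode.v31.transcripts.fa.gz"
if (file.exists(fasta)) {
  # gzfile() decompresses on the fly; close the connection when done
  con <- gzfile(fasta, open = "r")
  fasta_lines <- readLines(con)
  close(con)
  tx2gene <- tx2gene_from_gencode_headers(fasta_lines)
  print(head(tx2gene))
}
```

The resulting table already has the transcript ID in column 1, as tximport() expects. If the quant.sf Name column contains the full header rather than just the ENST ID, passing `ignoreAfterBar = TRUE` to tximport() should make the IDs match the TXNAME column.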

I would really appreciate your help!


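It's just a warning; it shouldn't affect your workflow. makeTxDbFromGFF only ignored the phase values on stop_codon features, and building tx2gene needs only the transcript-to-gene links.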

As for the old version, your conda is probably quite out of date. Alternatively, you can install the prebuilt Salmon binary: just unpack it somewhere and add its bin directory to your PATH.

written 7 months ago by jared.andrews07

I ran the code above and got results. I posted a follow-up question about them at https://support.bioconductor.org/p/124142/.

written 7 months ago by woojoy140