Question

RNA-seq analysis with Salmon: how to import and summarize using tximport

4.9 years ago

woojoy14 ▴ 10

Hi!

I'm trying to do RNA-seq analysis using Salmon and would like to build a matrix of read counts from 10 RNA-seq FASTQ samples. I installed Salmon with Bioconda, but the only version I can get is 0.8.1, even after running 'conda update salmon'. So I have been using version 0.8.1, with the commands below for indexing and quantification.

salmon index -t gencode.v31.transcripts.fa.gz -i gencode_v31_idx
salmon quant -i gencode_v31_idx/ -l IU -p 10 -1 ERR22788_1.fastq.gz -2 ERR22788_2.fastq.gz -o results/ERR2278846

This gave me 10 quant.sf files, one per RNA-seq sample. Now I'm trying to import and summarize them using tximport in R, but I don't know how to do it. I've been trying to follow the code in tutorials such as https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#salmon-sailfish and https://www.hadriengourle.com/tutorials/rna/

When I tried the code below,

txdb <- makeTxDbFromGFF("gencode.v31.primary_assembly.annotation.gtf")

I got the warning message:

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for
  features of type stop_codon. This information was
  ignored.

How can I fix this? I also noticed that the tutorials use the 'tximportData' package, so their results are for its 6 example samples. I think the process has 2 main steps, as shown below. How can I modify the code for my own data?

#https://www.hadriengourle.com/tutorials/rna/
#link the transcript names to the gene names
txdb <- makeTxDbFromGFF("chr22_genes.gtf")
k <- keys(txdb, keytype = "GENEID")
tx2gene <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")
head(tx2gene)

#import the salmon quantification
samples <- read.table("samples.txt", header = TRUE)
files <- file.path("quant", samples$sample, "quant.sf")
names(files) <- paste0(samples$sample)
txi.salmon <- tximport(files, type = "salmon", tx2gene = tx2gene)

I have 10 quant.sf files, each in a folder named after its sample, and I used the gencode.v31.transcripts.fa file as the reference.
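The tutorial code reads a samples.txt file with a `sample` column. One way to produce it is a minimal shell sketch like the one below. It assumes each sample was quantified into results/<sample>/quant.sf, as in the salmon quant command above; the function name `make_sample_sheet` is made up for illustration.

```shell
#!/usr/bin/env bash
# Write a one-column sample sheet (header "sample") listing every
# sub-directory of a Salmon output directory that contains a quant.sf file.
# Assumption: the layout is <quant_dir>/<sample>/quant.sf.
# make_sample_sheet is a hypothetical helper name, not part of Salmon.
make_sample_sheet() {
  local quant_dir="$1" out="$2"
  echo "sample" > "$out"
  for f in "$quant_dir"/*/quant.sf; do
    [ -e "$f" ] || continue               # skip when the glob matches nothing
    basename "$(dirname "$f")" >> "$out"  # the folder name is the sample name
  done
}

# Example call (run from the directory that contains results/):
#   make_sample_sheet results samples.txt
```

With that file in place, the tutorial's `file.path("quant", samples$sample, "quant.sf")` would need to point at `results` instead of `quant`, and tx2gene would be built from the full GENCODE GTF rather than chr22_genes.gtf. Because GENCODE transcript FASTA headers contain `|`-separated fields, passing `ignoreAfterBar = TRUE` to tximport may also be needed for the transcript names to match tx2gene.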

I would really appreciate your help!

RNA-Seq salmon tximport • 3.6k views

updated 9 months ago by camillab. ▴ 160 • written 4.9 years ago by woojoy14 ▴ 10

It's just a warning; it shouldn't really impact your workflow.

As for the old version issue, your conda is probably quite out of date. You can also just install the prebuilt binary easily enough: unpack it somewhere and add its bin directory to your PATH.
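A rough sketch of the PATH step, assuming the release archive has already been unpacked. The `SALMON_HOME` location and the `prepend_path` helper are placeholders for illustration, not anything Salmon itself defines.

```shell
# SALMON_HOME is a hypothetical directory where the release archive was unpacked.
SALMON_HOME="${SALMON_HOME:-$HOME/tools/salmon}"

# Prepend a directory to PATH unless it is already listed.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$1:$PATH" ;;     # otherwise put it first so it takes precedence
  esac
}

prepend_path "$SALMON_HOME/bin"
export PATH
# After this, `salmon --version` should report the unpacked release
# instead of the conda-installed 0.8.1.
```

Putting these lines in your shell startup file (for example ~/.bashrc) would make the change persist across sessions.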

4.9 years ago by jared.andrews07 ★ 17k

I ran the code above and got the results. I posted a follow-up question about the results at https://support.bioconductor.org/p/124142/.

4.9 years ago by woojoy14 ▴ 10