RNA-seq analysis with Salmon: how to import and summarize using tximport
4.8 years ago
woojoy14 ▴ 10


I'm trying to do RNA-seq analysis with Salmon and would like to get a matrix of read counts for 10 RNA-seq FASTQ samples. I installed Salmon with Bioconda, but I can only get version 0.8.1, even after running 'conda update salmon'. So I have been using version 0.8.1, with the commands below for indexing and quantification.

salmon index -t gencode.v31.transcripts.fa.gz -i gencode_v31_idx
salmon quant -i gencode_v31_idx/ -l IU -p 10 -1 ERR22788_1.fastq.gz -2 ERR22788_2.fastq.gz -o results/ERR2278846

This gave me 10 quant.sf files, one per sample. Now I'm trying to import and summarize them with tximport in R, but I don't know how to do it. I've been trying to follow the code in the tximport vignette, https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#salmon-sailfish, and in this tutorial: https://www.hadriengourle.com/tutorials/rna/

When I tried the code below,

txdb <- makeTxDbFromGFF("gencode.v31.primary_assembly.annotation.gtf")

I got the warning message:

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for
  features of type stop_codon. This information was

How can I fix this? Also, the tutorials use the 'tximportData' package, so they all work on the same example data with 6 samples. I think the process has two main steps, as below. How can I modify the code for my own data?

#link the transcript names to the gene names
txdb <- makeTxDbFromGFF("chr22_genes.gtf")
k <- keys(txdb, keytype = "GENEID")
tx2gene <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")

#import the salmon quantification
samples <- read.table("samples.txt", header = TRUE)
files <- file.path("quant", samples$sample, "quant.sf")
names(files) <- paste0(samples$sample)
txi.salmon <- tximport(files, type = "salmon", tx2gene = tx2gene)

I have 10 quant.sf files, each in a folder named after its sample, and I used the gencode.v31.transcripts.fa file as the reference.

I would really appreciate your help!

RNA-Seq salmon tximport • 3.6k views

It's just a warning, it shouldn't really impact your workflow.
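For the import itself, something along these lines might work as a starting point. It is only a sketch under a few assumptions: each sample has its own folder under results/ (as in your salmon quant command), the GTF is the same GENCODE release as the transcript FASTA, and the transcript names in quant.sf still carry the full GENCODE header with "|" separators, which is why `ignoreAfterBar = TRUE` is set. The column order of tx2gene matters: tximport expects transcript IDs first and gene IDs second. In recent Bioconductor releases makeTxDbFromGFF has moved to the txdbmaker package, so the library call may need adjusting.

```r
# Sketch only: assumes GenomicFeatures and tximport are installed from
# Bioconductor and that quant files live at results/<sample>/quant.sf.
library(GenomicFeatures)
library(tximport)

# Build the transcript-to-gene table from the GENCODE annotation.
txdb <- makeTxDbFromGFF("gencode.v31.primary_assembly.annotation.gtf")
k <- keys(txdb, keytype = "TXNAME")
# Keying on TXNAME returns columns in the order TXNAME, GENEID,
# which is the order tximport expects.
tx2gene <- select(txdb, keys = k, keytype = "TXNAME", columns = "GENEID")

# Collect one quant.sf per sample folder and name each by its folder.
sample_dirs <- list.dirs("results", recursive = FALSE)
files <- file.path(sample_dirs, "quant.sf")
names(files) <- basename(sample_dirs)
stopifnot(all(file.exists(files)))

# ignoreAfterBar drops everything after the first "|" in the transcript
# names so they can match the TXNAME column of tx2gene.
txi <- tximport(files, type = "salmon", tx2gene = tx2gene,
                ignoreAfterBar = TRUE)

# Gene-level estimated counts: one row per gene, one column per sample.
head(txi$counts)
```

If tximport complains that no transcripts match tx2gene, comparing `head(tx2gene$TXNAME)` against the Name column of one quant.sf file usually shows where the IDs differ.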

As for the old version, your conda installation is probably quite out of date. You can also just install the precompiled binary easily enough: unpack it somewhere and add its bin directory to your PATH.


I ran the code above and got the results. I posted a follow-up question about the results at https://support.bioconductor.org/p/124142/.

