I'm trying to do an RNA-seq analysis with salmon and would like to get a matrix of read counts for 10 RNA-seq FASTQ files. I installed salmon with bioconda, but the only version I can get is 0.8.1, even after running 'conda update salmon'. So I have been using version 0.8.1, with the code below for indexing and mapping.
salmon index -t gencode.v31.transcripts.fa.gz -i gencode_v31_idx
salmon quant -i gencode_v31_idx/ -l IU -p 10 -1 ERR22788_1.fastq.gz -2 ERR22788_2.fastq.gz -o results/ERR2278846
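The single-sample quant command above can be repeated for every sample with a loop. This is a minimal sketch, assuming (hypothetically) that each sample's reads are named '<sample>_1.fastq.gz' and '<sample>_2.fastq.gz' in the working directory; the function names 'quant_all_samples' and 'run_or_echo' and the DRY_RUN switch are illustrative helpers, not part of salmon. The salmon options are the same ones used above.

```shell
# Sketch: run 'salmon quant' once per paired-end sample (POSIX sh).
# Hypothetical naming assumption: <sample>_1.fastq.gz / <sample>_2.fastq.gz
# in the current directory; output goes to <outdir>/<sample>.
# Set DRY_RUN=1 to print each command instead of executing it.

run_or_echo() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

quant_all_samples() {
    idx="$1"      # salmon index directory, e.g. gencode_v31_idx
    outdir="$2"   # parent directory for per-sample results, e.g. results
    for r1 in *_1.fastq.gz; do
        [ -e "$r1" ] || continue        # glob matched nothing: skip it
        sample="${r1%_1.fastq.gz}"      # strip the read-1 suffix to get the sample name
        run_or_echo salmon quant -i "$idx" -l IU -p 10 \
            -1 "$r1" -2 "${sample}_2.fastq.gz" \
            -o "$outdir/$sample"
    done
}
```

For example, 'DRY_RUN=1 quant_all_samples gencode_v31_idx results' prints one salmon quant command per sample so the file pairing can be checked before anything is run; dropping DRY_RUN=1 executes them.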
This gave me one quant.sf file per sample, 10 in total. Now I'm trying to import and summarize them with tximport in R, but I don't know how. I've been trying to follow the code in tutorials such as https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#salmon-sailfish and https://www.hadriengourle.com/tutorials/rna/
When I tried the code below,
txdb <- makeTxDbFromGFF("gencode.v31.primary_assembly.annotation.gtf")
I got the warning message:
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
How can I fix this? I also noticed that the tutorials use the 'tximportData' package and all work from the same example results, which have 6 samples. As I understand it, the process has two main steps, shown below. How should I modify this code for my data?
#https://www.hadriengourle.com/tutorials/rna/
library(GenomicFeatures)  # provides makeTxDbFromGFF()
library(tximport)

# Step 1: link the transcript names to the gene names
txdb <- makeTxDbFromGFF("chr22_genes.gtf")
k <- keys(txdb, keytype = "TXNAME")
# tximport expects transcript IDs in column 1 and gene IDs in column 2
tx2gene <- AnnotationDbi::select(txdb, keys = k, keytype = "TXNAME", columns = "GENEID")
head(tx2gene)

# Step 2: import the salmon quantification
samples <- read.table("samples.txt", header = TRUE)
files <- file.path("quant", samples$sample, "quant.sf")
names(files) <- paste0(samples$sample)
txi.salmon <- tximport(files, type = "salmon", tx2gene = tx2gene)
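Since the salmon index was built from gencode.v31.transcripts.fa.gz, a tx2gene table can also be derived from that file's own FASTA headers instead of from the GTF, which sidesteps makeTxDbFromGFF entirely. This is a minimal sketch, assuming the standard GENCODE header layout (transcript ID in the first '|'-separated field, gene ID in the second); the function name 'make_tx2gene' and the output file name are hypothetical. The resulting CSV could be loaded in R with read.csv() and passed as tx2gene, because tximport only needs transcript IDs in column 1 and gene IDs in column 2.

```shell
# Sketch: build a two-column tx2gene CSV (transcript ID, gene ID) from
# GENCODE transcript FASTA headers. Assumed header layout:
#   >ENST...|ENSG...|OTTHUMG...|OTTHUMT...|name-201|name|length|biotype|
make_tx2gene() {
    # $1: GENCODE transcripts FASTA (.fa or .fa.gz); $2: output CSV path
    case "$1" in
        *.gz) gzip -dc "$1" ;;   # decompress gzipped FASTA on the fly
        *)    cat "$1" ;;
    esac |
        awk -F'|' 'BEGIN { print "TXNAME,GENEID" }          # header row
                   /^>/  { sub(/^>/, "", $1); print $1 "," $2 }' > "$2"
}
```

For example, 'make_tx2gene gencode.v31.transcripts.fa.gz tx2gene.csv'. If the Name column of quant.sf still contains the full '|'-separated header (head results/*/quant.sf shows this), tximport's ignoreAfterBar = TRUE argument is intended for matching such names against the first field; whether it is available should be checked in the installed tximport's documentation.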
My 10 quant.sf files are each in a folder named after its sample, and I used the gencode.v31.transcripts.fa file as the reference.
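The tutorial code reads the sample names from samples.txt, a table with a 'sample' header. One way to create that file for the salmon outputs is the sketch below, assuming the results/<sample>/quant.sf layout produced by the -o option in the quant command above; 'make_samples_txt' is a hypothetical helper name.

```shell
# Sketch: write samples.txt with a "sample" header line followed by one
# sample name per line, taken from every <dir>/<sample>/quant.sf found.
make_samples_txt() {
    # $1: parent directory of the per-sample salmon outputs; $2: output file
    printf 'sample\n' > "$2"
    for qf in "$1"/*/quant.sf; do
        [ -e "$qf" ] || continue                # glob matched nothing
        basename "$(dirname "$qf")" >> "$2"     # folder name = sample name
    done
}
```

After 'make_samples_txt results samples.txt', the file should list the 10 sample names under the header, and the tutorial code would then need file.path("results", samples$sample, "quant.sf") in place of file.path("quant", ...).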
I would really appreciate your help!