Salmon quant with Nextflow nf-core/rnaseq -r 3.13.2
3 months ago
Hojn ▴ 20

Dear community,

I am using this Nextflow pipeline to produce count matrices from Smart-Seq data, which I then want to analyze in Seurat. I went through the Salmon and Nextflow documentation but couldn't find what I was looking for, so I am giving it a try here.

First, I couldn't find a pipeline that goes from Smart-Seq data to Seurat. If you know of one, along with the critical parameters to look at, I would appreciate it. Second, I have a hard time figuring out what the outputs of the traditional STAR - Salmon route are, I mean how they are produced. I end up with 3 files for the Salmon quantification (salmon.merged.gene_counts, salmon.merged.gene_counts_length_scaled and salmon.merged.gene_counts_scaled).

What are they, and how were they produced? I currently use the first one since it does not appear to be normalized, but gene_counts contains non-integer values and I would like to know why (I guess it comes from the quantification method?) and whether it is correct to use it that way.

Should I use the length_scaled one? What are the differences between length_scaled and scaled?

Thanks in advance for your time

Salmon nextflow • 455 views
3 months ago
Barry Digby ★ 1.3k

"I have a hard time figuring out what the outputs of the traditional STAR - Salmon route are, I mean how they are produced."

Excerpt from: https://nf-co.re/rnaseq/3.12.0/docs/output#star-and-salmon

All you need to run Salmon is a FASTA file containing your reference transcripts and a set of FASTA/FASTQ/BAM file(s) containing your reads. The transcriptome-level BAM files generated by STAR are provided to Salmon for downstream quantification.

Regarding the three files (https://nf-co.re/rnaseq/3.12.0/docs/output#salmon)

According to the tximport documentation you can do one of the following:

  • Use bias corrected counts with an offset: import all the salmon files with tximport and then use DESeq2 with dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition) to correct for changes to the average transcript length across samples.
  • Use bias corrected counts without an offset: load and use salmon.merged.gene_counts_length_scaled.tsv or salmon.merged.gene_counts_scaled.tsv directly as you would with a regular counts matrix.
  • Use bias uncorrected counts: load and use the txi$counts matrix (or salmon.merged.gene_counts.tsv) with DESeq2. This does not correct for potential differential isoform usage. Alternatively, if you have 3’ tagged RNA-seq data this is the most suitable method.

* When they say "import all of the salmon files", I assume this means the quant.sf files generated for each sample. See here: https://bioconductor.org/packages/devel/bioc/vignettes/tximport/inst/doc/tximport.html#Salmon
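On the difference between the three matrices: Salmon reports estimated read counts, so fractional values in salmon.merged.gene_counts are expected (reads compatible with several transcripts are split probabilistically). As far as I understand, the two other files correspond to tximport's countsFromAbundance options "scaledTPM" and "lengthScaledTPM". The sketch below is a pure-Python illustration of that rescaling on made-up numbers, not the pipeline's actual code; the function name `counts_from_abundance` and the toy matrices are my own assumptions.

```python
def counts_from_abundance(counts, tpm, lengths, mode):
    """Turn TPM into count-like values, one column per sample.

    counts, tpm, lengths: lists of rows (genes), each row a list of
    per-sample values. mode is "scaledTPM" or "lengthScaledTPM".
    Hypothetical helper that mirrors the idea behind tximport's
    countsFromAbundance argument; it is not part of any library.
    """
    n_rows, n_cols = len(counts), len(counts[0])

    if mode == "lengthScaledTPM":
        # Weight each gene's TPM by its mean effective length across samples.
        mean_len = [sum(row) / n_cols for row in lengths]
        new = [[tpm[i][j] * mean_len[i] for j in range(n_cols)]
               for i in range(n_rows)]
    elif mode == "scaledTPM":
        # Use the TPM values as they are; only the library size is restored.
        new = [row[:] for row in tpm]
    else:
        raise ValueError("mode must be 'scaledTPM' or 'lengthScaledTPM'")

    # Rescale every sample so its column sums to the original estimated
    # number of reads for that sample.
    for j in range(n_cols):
        counts_sum = sum(counts[i][j] for i in range(n_rows))
        new_sum = sum(new[i][j] for i in range(n_rows))
        factor = counts_sum / new_sum
        for i in range(n_rows):
            new[i][j] *= factor
    return new


# Toy data (invented): 2 genes x 2 samples.
counts = [[100.5, 80.0], [49.5, 120.0]]            # Salmon estimates, not integers
tpm = [[600000.0, 400000.0], [400000.0, 600000.0]]
lengths = [[1000.0, 1000.0], [2000.0, 2000.0]]     # effective lengths

scaled = counts_from_abundance(counts, tpm, lengths, "scaledTPM")
length_scaled = counts_from_abundance(counts, tpm, lengths, "lengthScaledTPM")
```

In both modes each sample keeps its original total (150 and 200 reads in the toy data). The difference is that "lengthScaledTPM" additionally weights each gene by its average length, so the resulting values behave more like raw counts while no longer depending on per-sample changes in transcript length.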
