Help with Salmon --> tximport --> edgeR
4.0 years ago
u3005992 ▴ 20

Hi all,

I am new to RNA-seq analysis. Currently, I am trying to use the salmon, tximport, edgeR pipeline to process my human RNA-seq results on Galaxy. The cDNA library for my RNA-seq was generated with polyA selection.

I am a bit confused about the normalisation steps.

  1. For salmon, I have aligned my reads to the human transcriptome and supplied the human GFF file to get the quant.genes.sf output. However, the TPM values are still annotated with ENST00000XXXXXX.X instead of ENSGXXXXXXXXXXX. Does that mean salmon failed to recognise the GFF file, and my TPM values are still for transcripts and not genes?

  2. If salmon failed to produce the correct quant.genes.sf files, I would like to use tximport to aggregate my transcripts to genes from my quant.sf files. But I came across 4 options in tximport for "Summarization using the abundance (TPM) values?": i) No, ii) scaled up to library size, iii) scaled using the avg. transcript length over samples and then the library size, iv) scaled using the median transcript length among isoforms of a gene, and then library size.

Which option should I use if I want to follow up with edgeR on Degust? Will I "over-normalise" my results if I choose the wrong option to go with edgeR?
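
For reference, the arithmetic behind options ii and iii can be sketched in a few lines. This is only an illustration in plain Python, not tximport's actual code: the function names and the list-of-lists layout (rows are genes or transcripts, columns are samples) are assumptions made for the example. The mode names `scaledTPM` and `lengthScaledTPM` are the tximport `countsFromAbundance` values that options ii and iii appear to correspond to.

```python
def column_sums(matrix):
    """Sum each column (one column per sample) of a list-of-lists matrix."""
    return [sum(col) for col in zip(*matrix)]


def counts_from_abundance(counts, tpm, lengths, mode):
    """Turn TPM values into count-scale values (illustrative sketch only).

    counts, tpm, lengths: matrices with one row per feature and one
    column per sample (hypothetical layout chosen for this example).
    mode: "scaledTPM" or "lengthScaledTPM".
    """
    if mode == "scaledTPM":
        # Option ii: start from the TPM values as they are.
        new = [row[:] for row in tpm]
    elif mode == "lengthScaledTPM":
        # Option iii: multiply TPM by the feature's average length over samples.
        new = [
            [t * (sum(len_row) / len(len_row)) for t in tpm_row]
            for tpm_row, len_row in zip(tpm, lengths)
        ]
    else:
        raise ValueError("mode must be 'scaledTPM' or 'lengthScaledTPM'")

    # Both options then rescale each sample so that its column sum equals
    # the original library size (the sum of the estimated counts).
    lib_sizes = column_sums(counts)
    new_sums = column_sums(new)
    return [
        [value * lib_sizes[j] / new_sums[j] for j, value in enumerate(row)]
        for row in new
    ]
```

As far as the tximport documentation describes it, counts produced with one of the scaled options are meant to be passed on as an ordinary count matrix, while the unscaled counts (option i) are meant to be combined with a length offset in edgeR. A tool that only accepts a count table would therefore usually be given one of the scaled outputs.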

Any help would be appreciated. Many thanks in advance!

James

RNA-Seq tximport salmon edgeR normalisation • 3.1k views

If you already ran salmon at the transcript level, there is no longer any need to provide it with a GFF file of the human genome annotation (I think it will not even work).

You can safely continue to tximport, which will do the summarisation to gene level.
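
To make the gene-level summarisation concrete, here is a minimal sketch of the idea for a single sample: every transcript is looked up in a transcript-to-gene map, and the TPM and read counts of all transcripts of the same gene are added up. This is not tximport's code; the function name, the dictionary layout and the IDs in the usage note are hypothetical.

```python
from collections import defaultdict


def summarize_to_gene(quant, tx2gene):
    """Sum transcript-level TPM and read counts per gene (illustrative sketch).

    quant:   {transcript_id: (tpm, num_reads)} for one sample, e.g. the
             TPM and NumReads columns of a quant.sf file
    tx2gene: {transcript_id: gene_id}
    Transcripts that are missing from tx2gene are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for transcript_id, (tpm, num_reads) in quant.items():
        gene_id = tx2gene.get(transcript_id)
        if gene_id is None:
            continue  # no gene known for this transcript
        totals[gene_id][0] += tpm
        totals[gene_id][1] += num_reads
    return {gene_id: (tpm, reads) for gene_id, (tpm, reads) in totals.items()}
```

For example, with two hypothetical transcripts `T1` and `T2` mapped to gene `G1`, their TPM and read counts end up as one summed entry under `G1`.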

One thing you might consider is using a transcriptome version with one transcript per locus.


lieven.sterck: which file from deseq2 do we need to give as input to tximport, and which GTF file do we need to provide?


I don't know the exact name of that file, but it is the one with the counts in it (a tabular file with a number of columns, one of which is called TPM, I think).

For the GTF, use the one that links every transcript to its locus (i.e. the one from which you can determine which isoforms come from the same gene locus).
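
As a rough sketch of how a GTF "links transcripts to their locus": each `transcript` line carries a `gene_id` and a `transcript_id` in its attribute column, so a transcript-to-gene table can be read straight from it. The function name below is hypothetical, and the code assumes a standard tab-separated GTF with `key "value";` attributes.

```python
import re

# Matches GTF attributes of the form: key "value"
_ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')


def tx2gene_from_gtf(lines):
    """Build {transcript_id: gene_id} from GTF lines (illustrative sketch)."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attributes = dict(_ATTRIBUTE.findall(fields[8]))
        if "transcript_id" in attributes and "gene_id" in attributes:
            mapping[attributes["transcript_id"]] = attributes["gene_id"]
    return mapping
```

One thing to check when using such a table: the transcript IDs have to match the names in the salmon output exactly, including any version suffix such as the `.X` in `ENST00000XXXXXX.X`.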
