Question

Help with Salmon --> tximport --> edgeR

4.0 years ago

u3005992 ▴ 20

Hi all,

I am new to RNA-seq analysis. I am currently trying to use the Salmon, tximport, edgeR pipeline to process my human RNA-seq results on Galaxy. The cDNA library for my RNA-seq was generated with poly(A) selection.

I am a bit confused by the normalisation steps.

For Salmon, I have aligned my reads to the human transcriptome and supplied the human GFF file to get the quant.genes.sf output. However, the TPM values are still labelled with ENST00000XXXXXX.X instead of ENSGXXXXXXXXXXX. Does that mean Salmon failed to recognise the GFF file, and my TPM values are still for transcripts rather than genes?
If Salmon failed to produce the correct quant.genes.sf file, I would like to use tximport to aggregate my transcripts to genes from my quant.sf files. But tximport offers 4 options under "Summarization using the abundance (TPM) values?": i) No, ii) scaled up to library size, iii) scaled using the avg. transcript length over samples and then the library size, iv) scaled using the median transcript length among isoforms of a gene, and then library size.

Which option should I use if I want to follow up with edgeR on Degust? Will I "over-normalise" my results if I choose the wrong option to go with edgeR?
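Editorial sketch: the four options quoted above appear to correspond to the `countsFromAbundance` argument of the R tximport function (`no`, `scaledTPM`, `lengthScaledTPM`, `dtuScaledTPM`). The plain-Python function below is only an illustration of the arithmetic behind the first three, not tximport's own code; the function name and the dict-of-lists data layout are assumptions made for this example. As far as I recall, the tximport vignette describes counts produced with `scaledTPM` or `lengthScaledTPM` as usable directly as a count matrix, whereas the `no` option is meant to be paired with a length offset in edgeR, which is worth checking before choosing an option for a tool that only accepts a count table.

```python
def counts_from_abundance(tpm, counts, lengths, mode="lengthScaledTPM"):
    """Illustrative re-implementation (hypothetical helper, not tximport itself).

    tpm, counts, lengths: dict mapping feature ID -> list of per-sample values.
    Returns a dict with the same shape holding the derived counts.
    """
    features = list(tpm)
    n_samples = len(next(iter(tpm.values())))

    if mode == "no":
        # Keep the estimated counts untouched.
        return {f: list(counts[f]) for f in features}
    if mode == "scaledTPM":
        # Start from the TPM values as they are.
        new = {f: list(tpm[f]) for f in features}
    elif mode == "lengthScaledTPM":
        # Multiply TPM by the feature's average length across samples.
        new = {
            f: [v * (sum(lengths[f]) / n_samples) for v in tpm[f]]
            for f in features
        }
    else:
        # dtuScaledTPM additionally needs the transcript-to-gene map; omitted here.
        raise ValueError(f"mode not covered by this sketch: {mode}")

    # Rescale each sample so its column sum matches the original library size.
    lib_size = [sum(counts[f][j] for f in features) for j in range(n_samples)]
    new_sum = [sum(new[f][j] for f in features) for j in range(n_samples)]
    return {
        f: [new[f][j] * lib_size[j] / new_sum[j] for j in range(n_samples)]
        for f in features
    }


# Toy data with two features and two samples (values are made up).
toy_tpm = {"g1": [600000.0, 500000.0], "g2": [400000.0, 500000.0]}
toy_counts = {"g1": [300.0, 100.0], "g2": [100.0, 100.0]}
toy_lengths = {"g1": [1000.0, 1000.0], "g2": [2000.0, 2000.0]}

scaled = counts_from_abundance(toy_tpm, toy_counts, toy_lengths, "scaledTPM")
length_scaled = counts_from_abundance(toy_tpm, toy_counts, toy_lengths)
```

In both scaled modes the per-sample totals stay equal to the original library sizes; only the way counts are shared out among features changes.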

Any help would be appreciated. Many thanks in advance!

James

RNA-Seq tximport salmon edgeR normalisation • 3.1k views

updated 2.4 years ago by lieven.sterck 15k • written 4.0 years ago by u3005992 ▴ 20

If you already ran Salmon at the transcript level, there is no longer any need to provide it with a GFF file of the human genome annotation (I think it will not even work).

You can safely continue to tximport, which will do the summarisation to gene level.

One thing you might consider is using a transcriptome version with one transcript per locus.
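Editorial sketch: a minimal illustration of what summarising to gene level means, assuming a transcript-to-gene lookup table is available. The function name, the IDs, and the values are all made up for the example; tximport itself is an R package and does more than this (for instance, it also handles lengths and counts).

```python
from collections import defaultdict


def summarize_to_gene(tx_values, tx2gene):
    """Sum transcript-level values (e.g. TPM) per gene.

    tx_values: dict transcript ID -> value
    tx2gene:   dict transcript ID -> gene ID
    Transcripts missing from tx2gene are skipped.
    """
    gene_totals = defaultdict(float)
    for tx_id, value in tx_values.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is None:
            continue
        gene_totals[gene_id] += value
    return dict(gene_totals)


# Hypothetical IDs; real ones would look like ENST.../ENSG...
tx_tpm = {"txA.1": 10.0, "txA.2": 5.0, "txB.1": 7.5, "txUnmapped.1": 1.0}
tx2gene = {"txA.1": "geneA", "txA.2": "geneA", "txB.1": "geneB"}

gene_tpm = summarize_to_gene(tx_tpm, tx2gene)
```

Note that the lookup is an exact string match, so transcript IDs with a version suffix (such as the `.X` in the question) only match if the mapping uses the same suffix.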

4.0 years ago by lieven.sterck 15k

lieven.sterck: which file of DESeq2 do we need to give as input to tximport, and which GTF file do we need to provide?

2.4 years ago by pragathi.sneha91 • 0

I don't know the exact name of that file, but it is the one with the counts in it (a tabular file with several columns, one of which is called TPM, I think).
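Editorial sketch: the tabular file described here is presumably Salmon's `quant.sf` (the file named in the original question), which is tab-separated with the columns `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads`. The reader below is a hypothetical helper written for illustration, and the two data rows are invented.

```python
import csv
import io


def read_quant_sf(handle):
    """Parse a Salmon quant.sf-style table into a dict keyed by transcript name."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        table[row["Name"]] = {
            "length": float(row["Length"]),
            "effective_length": float(row["EffectiveLength"]),
            "tpm": float(row["TPM"]),
            "num_reads": float(row["NumReads"]),
        }
    return table


# Invented example content in the quant.sf column layout.
example_text = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "txA.1\t1500\t1320.5\t12.5\t240.0\n"
    "txB.1\t900\t720.0\t3.25\t34.0\n"
)

quant = read_quant_sf(io.StringIO(example_text))
```

With a real file, `open("quant.sf", newline="")` would take the place of the `io.StringIO` object.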

For the GTF, use the one that links each transcript to its locus (i.e. where one can determine which isoforms come from the same gene locus).
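Editorial sketch: one way to derive such a transcript-to-locus table is to read the `transcript_id` and `gene_id` attributes from the GTF's `transcript` records. The parser below is a simplified assumption about the GTF layout (nine tab-separated fields, quoted attribute values), the function name is hypothetical, and the example lines are invented.

```python
import re

# Matches attribute pairs of the form: key "value"
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def tx2gene_from_gtf(lines):
    """Build a dict transcript_id -> gene_id from GTF 'transcript' records."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # skip malformed lines and non-transcript features
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


# Invented GTF-style lines for illustration.
gtf_lines = [
    "#!example header\n",
    'chr1\tsrc\tgene\t1\t900\t.\t+\t.\tgene_id "geneA";\n',
    'chr1\tsrc\ttranscript\t1\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "txA.1";\n',
    'chr1\tsrc\ttranscript\t1\t700\t.\t+\t.\tgene_id "geneA"; transcript_id "txA.2";\n',
    'chr1\tsrc\texon\t1\t300\t.\t+\t.\tgene_id "geneA"; transcript_id "txA.2";\n',
    'chr2\tsrc\ttranscript\t5\t400\t.\t-\t.\tgene_id "geneB"; transcript_id "txB.1";\n',
]

tx2gene_map = tx2gene_from_gtf(gtf_lines)
```

Isoforms of the same locus end up sharing a gene ID in the resulting table, which is the information needed to sum transcripts per gene.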

2.4 years ago by lieven.sterck 15k