Question: Differential analysis: isoform level or gene level?
agtbeeman0 wrote:

Hello all,

I am new to bioinformatics and working on a project whose goal is to build the first reference transcriptome of a species and to perform differential expression analysis between 2 temperature conditions at the isoform level with DESeq2. I have a few questions about the methodology.

So far I have a reference transcriptome (I filtered my Trinity FASTA according to quality, redundancy, and transcript expression).

I am concerned because performing differential analysis at the isoform level does not seem to be recommended (https://support.bioconductor.org/p/43395/#43400).

So I am wondering whether I should switch tools to perform isoform-level analysis, or whether it is better to do the differential analysis at the gene level. I also wonder whether I have to cluster my transcripts (using a tool like Corset) before counting, since kallisto only gives counts at the transcript level, unless DESeq2 can use the transcript IDs to group them into genes?

Also, now that I am thinking about doing the analysis at the gene level, I am concerned about whether my filtering by transcript expression will skew the analysis.

Thank you for reading!

rna-seq sequence alignment gene • 286 views
modified 7 weeks ago • written 7 weeks ago by agtbeeman0

Please use full words - level, not lvl. Small things like these are the difference between being a professional and not being one.

written 7 weeks ago by _r_am32k

I have no idea what you're doing for your reference transcriptome (language is very unclear).

But to do gene-level analysis with DESeq2, you have to summarize the transcript-level estimates to gene-level (see: tximport).
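A minimal sketch of that gene-level route, assuming kallisto output and a tx2gene table are already available. The function name `gene_level_dds`, its arguments, and the `condition` column are hypothetical names chosen for illustration; `tximport()`, `DESeqDataSetFromTximport()` and `DESeq()` are the package functions that do the work.

```r
# Sketch: gene-level DESeq2 analysis from kallisto output via tximport.
# `files`, `tx2gene` and `samples` are hypothetical inputs that you supply.
gene_level_dds <- function(files, tx2gene, samples) {
  # files:   named character vector of paths to kallisto abundance.tsv files
  #          (abundance.h5 also works if the rhdf5 package is installed)
  # tx2gene: data frame with transcript IDs in column 1 and gene IDs in column 2
  # samples: data frame with one row per file and a `condition` column
  #          (e.g. the two temperature conditions)

  # Summarize transcript-level estimates to gene level.
  txi <- tximport::tximport(files, type = "kallisto", tx2gene = tx2gene)

  # Build the DESeq2 object from the tximport list and run the analysis.
  dds <- DESeq2::DESeqDataSetFromTximport(txi,
                                          colData = samples,
                                          design = ~ condition)
  DESeq2::DESeq(dds)
}
```

With real inputs, something like `res <- DESeq2::results(gene_level_dds(files, tx2gene, samples))` would then give the gene-level results table.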

If you want to do transcript-level differential expression analysis, I'd recommend using sleuth (note: sleuth can also do gene-level analysis).

written 7 weeks ago by dsull1.8k

OK, thanks. Sorry for being so unclear; I have just edited my post to improve it.

I have decided to do gene-level analysis with DESeq2. So far I have followed the documentation. My transcript IDs look like this: TRINITY_DN0_c0_g1_i2. I am not sure it is the right approach, but I created my tx2gene table like this:

    Transcript_id                Gene_id
  1 TRINITY_DN80838_c0_g1_i1 TRINITY_DN80838_c0_g1_
  2 TRINITY_DN80873_c0_g1_i1 TRINITY_DN80873_c0_g1_
  3 TRINITY_DN80855_c0_g1_i2 TRINITY_DN80855_c0_g1_

When I look at my final count matrix, it contains, for each gene, the sum of the estimated counts of all its isoforms. Is that normal?
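For reference, here is one way such a tx2gene table could be built from Trinity-style IDs, as a sketch only. It assumes that the gene ID is the transcript ID with the trailing `_i<number>` isoform suffix removed. Unlike the table above, this drops the trailing underscore, which is harmless for grouping but keeps the IDs consistent with Trinity's own gene names.

```r
# Example Trinity transcript IDs (taken from the table above).
tx_ids <- c("TRINITY_DN80838_c0_g1_i1",
            "TRINITY_DN80873_c0_g1_i1",
            "TRINITY_DN80855_c0_g1_i2")

# Strip the trailing "_i<number>" isoform suffix to obtain the gene ID.
gene_ids <- sub("_i[0-9]+$", "", tx_ids)

# tximport expects a two-column data frame:
# transcript ID first, gene ID second.
tx2gene <- data.frame(Transcript_id = tx_ids,
                      Gene_id = gene_ids,
                      stringsAsFactors = FALSE)
print(tx2gene)
```

Trinity also writes a gene-to-transcript map next to the assembly (a `gene_trans_map` file). Reading that file and putting the transcript column first may be a safer alternative to parsing the IDs by hand.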

modified 7 weeks ago • written 7 weeks ago by agtbeeman0

Yes, if you use tximport, it sums the counts of all isoforms of a gene to give the gene-level count.

written 7 weeks ago by xiaoguang50

OK, thanks! I just found it a bit surprising; I would have expected it to take other data into account, such as isoform length.
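On the isoform-length point: as far as I understand its default settings, tximport does sum the estimated counts per gene, but it also returns a per-gene average transcript length weighted by isoform abundance (the `length` element of its output). `DESeqDataSetFromTximport()` then uses that as a normalization offset. The toy calculation below only imitates that arithmetic in base R; the numbers are made up and this is not tximport's actual code.

```r
# Toy transcript-level estimates for one sample (made-up numbers).
tx <- data.frame(
  gene   = c("g1", "g1", "g2"),
  counts = c(30, 10, 50),        # estimated counts
  tpm    = c(6, 2, 10),          # abundance
  length = c(1000, 2000, 1500)   # effective transcript length
)

# Gene-level counts: plain sum over the isoforms of each gene.
gene_counts <- rowsum(tx$counts, tx$gene)[, 1]

# Gene-level abundance: sum of the isoform abundances.
gene_tpm <- rowsum(tx$tpm, tx$gene)[, 1]

# Gene-level length: average isoform length weighted by abundance.
gene_length <- rowsum(tx$tpm * tx$length, tx$gene)[, 1] / gene_tpm

print(gene_counts)
print(gene_length)
```

So a change in which isoform dominates shifts the gene's average length, and the offset lets the model account for that even though the count matrix itself is a simple sum.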

written 7 weeks ago by agtbeeman0

I think some other methods, such as genome-alignment-based ones (e.g. Subread + featureCounts), might give accurate gene-level counts?

modified 7 weeks ago • written 7 weeks ago by xiaoguang50

It's actually more accurate to get gene-level expression from transcript-level estimates.

Many papers have been written on this, e.g. "Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences" (this is the tximport paper).

modified 7 weeks ago • written 7 weeks ago by dsull1.8k