Question

Transcript differential exprs analysis using Kallisto not significant

6.0 years ago

bharata1803 ▴ 560

Hello,

I performed both gene-level and transcript-level expression analysis. I have 7 matched samples of normal and cancer. For the gene-level analysis, I used Salmon to (pseudo)align the reads and quantify read counts, and then used the DESeq2 package from Bioconductor.

For the transcript-level analysis, I followed the Kallisto workflow and obtained the beta values. I already changed the parameter to use log2 so that the beta value can be interpreted as a log fold change.

After comparing the results, I noticed that many transcripts are not significant, while the genes those transcripts belong to are found to be significantly differentially expressed.

Inspecting the data, I noticed that the read count variance of each transcript is quite large, but at the gene level, which accumulates the read counts from all transcripts of a gene, the variance is not that large. That is why, at the gene level, I can find which genes are significantly different. At the transcript level, in contrast, I cannot get significant results, so I don't know which transcripts are differentially expressed.

My goal is to distinguish which transcripts are differentially expressed. I noticed that for some genes with many transcripts, not all of the transcripts are expressed.

My question is: is there any way to handle these non-significant results?

I haven't tried the HISAT2+StringTie+Ballgown workflow though. Can anyone share their experience on whether this workflow is useful?

RNA-Seq • 2.7k views

ADD COMMENT • link updated 6.0 years ago by Kristoffer Vitting-Seerup ★ 4.1k • written 6.0 years ago by bharata1803 ▴ 560

Just to ensure I understand your problem correctly: you are asking why many genes are differentially expressed but you cannot say which of the underlying transcripts are "responsible" for this change?

ADD REPLY • link 6.0 years ago by Kristoffer Vitting-Seerup ★ 4.1k

Yes, that is correct. To be precise, I want to investigate which transcripts cause the up/down regulation of gene expression in disease compared to control. I am thinking that the logFC from the transcript-level differential expression analysis could serve as a weight for how much a transcript's expression affects overall gene expression.

ADD REPLY • link 6.0 years ago by bharata1803 ▴ 560

score 2 · Answer 1 · 2018-11-08

6.0 years ago

Kristoffer Vitting-Seerup ★ 4.1k

You could try an alternative DE pipeline. An example could be to modify this to do isoform-level DE.

In short, this involves importing the Kallisto results into R using tximport, without summarizing to gene level, and then doing DE with DESeq2.

I would not use Ballgown. I have never seen it perform well (except in their own paper), and it is just a wrapper that pushes FPKM values into limma, even though it is well known that FPKM values should not be used for DE.

ADD COMMENT • link 6.0 years ago by Kristoffer Vitting-Seerup ★ 4.1k

Yes, I am thinking of using DESeq2 and combining it with the Salmon/Kallisto read count results.

ADD REPLY • link 6.0 years ago by bharata1803 ▴ 560

Remember to use tximport() to get the data into R - the scaling is important :-)

ADD REPLY • link 6.0 years ago by Kristoffer Vitting-Seerup ★ 4.1k

score 1 · Answer 2 · 2018-11-08

6.0 years ago

Kristoffer Vitting-Seerup ★ 4.1k

The reason why a gene can be differentially expressed without any of the underlying isoforms being differentially expressed is quite simply that the power to detect the change for an individual isoform can be too low.

Let's consider a hypothetical gene with X isoforms expressed in two conditions with the following number of reads:

          Cond1   Cond2    Change
Iso1:         1       3         2
Iso2:         2       3         1
...         ...     ...       ...
IsoX:         2       2         0

Since the change in each isoform is quite small, the associated uncertainty about whether it is a "true" change is quite large, so each isoform by itself is not significant.

The case is, however, quite different when you aggregate to the gene level, where the example above could result in the following counts:

          Cond1   Cond2    Change
Gene:        10      20        10

Such a large difference is associated with a small uncertainty whereby we can say it is significantly differentially expressed.
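
To put rough numbers on this, here is a minimal Python sketch using the counts from the tables above. It assumes a single Poisson-distributed count per condition and uses the simple statistic z = (c2 - c1) / sqrt(c1 + c2). That is a simplification for illustration only, not what DESeq2 or the Kallisto workflow compute (real RNA-seq counts are overdispersed and come with replicates). The function name `poisson_z` is made up for this example.

```python
import math

def poisson_z(count_cond1, count_cond2):
    """Approximate z-score for the difference between two Poisson counts.

    Assumption for illustration only: one count per condition, and the
    variance of the difference is estimated by the sum of both counts.
    """
    return (count_cond2 - count_cond1) / math.sqrt(count_cond1 + count_cond2)

# Counts taken from the tables above.
z_iso1 = poisson_z(1, 3)    # change of 2 reads
z_iso2 = poisson_z(2, 3)    # change of 1 read
z_gene = poisson_z(10, 20)  # change of 10 reads after aggregation

for name, z in [("Iso1", z_iso1), ("Iso2", z_iso2), ("Gene", z_gene)]:
    print(f"{name}: z = {z:.2f}")
```

Under this toy model the aggregated gene gets the largest z-score: if all counts are scaled up by a factor k, the difference grows by k while its standard deviation only grows by sqrt(k).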

Hope this helps. Kristoffer

Ps. why are you interested in differential isoform expression? Would differential gene expression and differential isoform usage not be more suitable?

ADD COMMENT • link 6.0 years ago by Kristoffer Vitting-Seerup ★ 4.1k

Well, generally I know why transcripts have low power. I want to find a way to handle this problem.

The reason I am interested is related to transcription factors. I think what a TF regulates is not the gene but the transcript, and transcripts are also translated into protein. I cannot go into detail here because, to be honest, I am still working on it. But I think that understanding transcript expression and its correlation with protein expression can provide more information for building a transcription factor network.

ADD REPLY • link 6.0 years ago by bharata1803 ▴ 560

The only ways of increasing the power are to:

1) Sequence (much) deeper
2) Aggregate transcripts (e.g. those with the same transcription start site, same ORF etc.)
3) Potentially do something with Transcript Compatibility Counts (TCCs), which can be calculated by Kallisto as described here.
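
Option 2 can be sketched in a few lines of Python. This is only an illustration: the transcript IDs, TSS labels and counts below are hypothetical, and `aggregate_counts` is a made-up helper, not part of Kallisto or any package. It sums the per-sample counts of transcripts that share a grouping key (here the transcription start site), giving one row per group that could then be passed to a count-based DE tool.

```python
from collections import defaultdict

def aggregate_counts(counts, group_of):
    """Sum per-sample counts over transcripts that share a group label.

    counts:   {transcript_id: [count_sample1, count_sample2, ...]}
    group_of: {transcript_id: group_label}  (e.g. a shared TSS)
    """
    grouped = defaultdict(list)
    for tx, values in counts.items():
        key = group_of[tx]
        if not grouped[key]:
            # First transcript seen for this group: start from zeros.
            grouped[key] = [0] * len(values)
        grouped[key] = [a + b for a, b in zip(grouped[key], values)]
    return dict(grouped)

# Hypothetical data: three transcripts of one gene, two samples each.
counts = {"tx1": [1, 3], "tx2": [2, 3], "tx3": [2, 2]}
group_of = {"tx1": "TSS_A", "tx2": "TSS_A", "tx3": "TSS_B"}

tss_counts = aggregate_counts(counts, group_of)
print(tss_counts)  # {'TSS_A': [3, 6], 'TSS_B': [2, 2]}
```

The same helper works for any other grouping (for example a shared ORF) by swapping the `group_of` mapping.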

ADD REPLY • link 6.0 years ago by Kristoffer Vitting-Seerup ★ 4.1k