Question: Transcript differential expression analysis using Kallisto not significant
bharata1803 (Japan) wrote, 11 days ago:

Hello,

I performed both gene-level and transcript-level expression analysis. I have 7 matched normal and cancer samples. For the gene-level analysis, I used Salmon to (pseudo)align the reads and quantify read counts, and then ran DESeq2 from Bioconductor.

For the transcript-level analysis, I followed the Kallisto workflow and obtained beta values. I already changed the parameter to use log2 so that the beta values can be interpreted as log fold changes.

After comparing the results, I noticed that many transcripts are not significant even though the genes they belong to are significantly differentially expressed.

Inspecting the data, I noticed that the read-count variance of each transcript is quite large, but at the gene level, where the read counts from all transcripts of a gene are accumulated, the variance is much smaller. That is why I can find significantly different genes at the gene level. At the transcript level, by contrast, I cannot get significant results, so I don't know which transcripts are differentially expressed.

My goal is to identify which transcripts are differentially expressed. I noticed that for some genes with many transcripts, not all of the transcripts are expressed.

My question is: is there any way to handle these non-significant results?

I haven't tried the HISAT2+StringTie+Ballgown workflow, though. Can anyone share their experience on whether this workflow is useful?


Just to ensure I understand your problem correctly: you are asking why many genes are differentially expressed but you cannot say which of the underlying transcripts are "responsible" for this change?

written 11 days ago by kristoffer.vittingseerup

Yes, that is correct. To be precise, I want to investigate which transcripts cause the up- or down-regulation of gene expression in disease compared to control. My thinking is that the logFC from the transcript-level differential expression analysis would act as a weight for how much a transcript's expression affects overall gene expression.

modified 11 days ago • written 11 days ago by bharata1803
kristoffer.vittingseerup (European Union) wrote, 11 days ago:

You could try an alternative DE pipeline. An example could be to modify this to do isoform-level DE.

In short, this involves importing the Kallisto results into R using tximport, without summarizing to gene level, and then doing DE with DESeq2.

I would not use Ballgown. I have never seen it perform well (except in their paper), and it is just a wrapper pushing FPKM values into limma, even though it is well known that FPKM values should not be used for DE.

written 11 days ago by kristoffer.vittingseerup

Yes, I am thinking of using DESeq2 and combining it with the Salmon/Kallisto read-count results.

written 11 days ago by bharata1803

Remember to use tximport() to get the data into R - the scaling is important :-)

written 10 days ago by kristoffer.vittingseerup
kristoffer.vittingseerup (European Union) wrote, 11 days ago:

The reason why a gene can be differentially expressed without any of the underlying isoforms being differentially expressed is quite simply that the power to detect the change for an individual isoform can be too low.

Let's consider a hypothetical gene with X isoforms expressed in two conditions with the following number of reads:

          Cond1   Cond2    Change
Iso1:         1       3         2
Iso2:         2       3         1
...         ...     ...       ...
IsoX:         2       2         0

Since the change in each isoform is quite small, the associated uncertainty of whether it is a "true" change is quite large, whereby each isoform by itself is not significant.

The case is, however, quite different when you aggregate to the gene level, where the example above could result in the following counts:

          Cond1   Cond2    Change
Gene:        10      20        10

Such a large difference is associated with a small uncertainty whereby we can say it is significantly differentially expressed.
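
To put rough numbers on this, here is a minimal Python sketch that is not part of any of the tools mentioned in this thread. It assumes pure Poisson noise and a single pair of counts per feature, with no replicates and no dispersion estimate, so it only illustrates the trend. The function names are made up for the illustration.

```python
import math


def log2_fold_change_with_se(count_a, count_b):
    """Return (log2 fold change, approximate standard error) for two counts.

    Assumption: pure Poisson noise, so the delta-method standard error of
    ln(count_b / count_a) is sqrt(1/count_a + 1/count_b).
    """
    lfc = math.log2(count_b / count_a)
    se = math.sqrt(1.0 / count_a + 1.0 / count_b) / math.log(2)
    return lfc, se


def exact_two_sided_p(count_a, count_b):
    """Exact binomial p-value for count_a vs count_b under a 50/50 null.

    Conditional on the total, count_a is Binomial(total, 0.5). The p-value
    sums the probability of every outcome that is at most as likely as the
    observed one.
    """
    total = count_a + count_b
    observed = math.comb(total, count_a)
    tail = sum(
        math.comb(total, k)
        for k in range(total + 1)
        if math.comb(total, k) <= observed
    )
    return tail / 2 ** total


# Toy counts taken from the tables above.
for name, cond1, cond2 in [("Iso1", 1, 3), ("Gene", 10, 20)]:
    lfc, se = log2_fold_change_with_se(cond1, cond2)
    p = exact_two_sided_p(cond1, cond2)
    print(f"{name}: log2FC={lfc:.2f}  SE={se:.2f}  p={p:.3f}")
```

With these toy counts the exact p-value drops from 0.625 for Iso1 (1 vs 3) to roughly 0.10 for the gene (10 vs 20), and the standard error of the log2 fold change shrinks about threefold. Real DE tools additionally use replicates and model biological variance, which this sketch ignores.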

Hope this helps. Kristoffer

PS: Why are you interested in differential isoform expression? Would differential gene expression and differential isoform usage not be more suitable?

modified 11 days ago • written 11 days ago by kristoffer.vittingseerup

Well, generally I know why transcripts have low power. I want to find a way to handle this problem.

The reason I am interested is related to transcription factors. What a TF regulates is not a gene but a transcript, and transcripts are also translated into protein. I cannot go into detail here because, to be honest, I am still working on it. But I think that understanding transcript expression and its correlation with protein expression can provide more information for building a transcription factor network.

written 11 days ago by bharata1803

The only ways of increasing the power are to: 1) sequence (much) deeper; 2) aggregate transcripts (e.g. those with the same transcription start site, the same ORF, etc.); 3) potentially do something with Transcript Compatibility Counts (TCCs), which can be calculated by Kallisto as described here.
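
Option 2 could look roughly like the following Python sketch. The transcript names, the group labels (for example a shared transcription start site) and the function `aggregate_counts` are all hypothetical; in practice the grouping would come from your annotation.

```python
def aggregate_counts(transcript_counts, transcript_to_group):
    """Sum per-sample counts of all transcripts that share a group label.

    transcript_counts: dict mapping transcript ID -> list of per-sample counts
    transcript_to_group: dict mapping transcript ID -> group label
    """
    grouped = {}
    for transcript, counts in transcript_counts.items():
        group = transcript_to_group[transcript]
        if group not in grouped:
            grouped[group] = [0] * len(counts)
        # Element-wise sum across samples.
        grouped[group] = [g + c for g, c in zip(grouped[group], counts)]
    return grouped


# Hypothetical example: two samples, three transcripts, two start sites.
transcript_counts = {"tx1": [1, 3], "tx2": [2, 3], "tx3": [2, 2]}
transcript_to_group = {"tx1": "tss_A", "tx2": "tss_A", "tx3": "tss_B"}

grouped = aggregate_counts(transcript_counts, transcript_to_group)
print(grouped)
```

The aggregated groups would then be tested as features in place of the individual transcripts, trading resolution for power.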

written 11 days ago by kristoffer.vittingseerup