Question: Get accumulated gene expression from alternative splicing in DESeq2
4.0 years ago, bharata1803 (420) wrote:


I am trying to investigate the correlation between a gene and its promoter. I realized that both the gene and its promoter have several alternative splicing variants, and each one is listed as a separate entry in my read count table (each has its own Ensembl gene ID). I generated the read count matrix using eXpress and then loaded it manually with DESeq2's DESeqDataSetFromMatrix function. After that, I called the rlog function on the DESeq object and tried to plot the assay.

I'm a bit confused about how to look at the correlation between the gene and its promoter, because there are many transcripts for the gene and many transcripts for the promoter (well, the promoter is also a gene). What I'm thinking is: can I just add up the transcripts of each gene so that I get the total across all of its splice variants? I'm not sure, but I remember reading a post about DESeq2 saying that we cannot just directly add counts from 2 different genes and should do some normalization first. Is that normalization already included in the rlog function? Thank you.

By the way, I noticed something strange after calling the rlog function. Several entries that have 0 read counts still get values in the log-transformed output. Is that normal? Thank you.

rna-seq deseq2 • 1.5k views
4.0 years ago, Devon Ryan (90k), Freiburg, Germany, wrote:
  1. If the annotation (GTF or GFF file) for your organism doesn't properly match what you're seeing, then you'd be better off running something like StringTie first, just to have a more useful annotation file.
  2. You can certainly sum the metrics across gene fragments to get a total metric for a given gene.
  3. rlog doesn't really do normalization per se; rather, it incorporates a sample-specific coefficient. Note that you shouldn't be doing statistics on the results; they're meant for things like PCA. See section 5.4 of the vignette.
  4. The results of rlog are estimates that incorporate a prior distribution, so I suspect that is what changes the values for 0 counts.
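
Point 2 can be sketched roughly as follows. This is only a minimal illustration in plain Python, not part of DESeq2 or eXpress: the function name `sum_transcripts_to_genes`, the `tx2gene` mapping, and all IDs and counts are made up for the example.

```python
def sum_transcripts_to_genes(transcript_counts, tx2gene):
    """Sum raw transcript-level counts into gene-level counts, per sample.

    transcript_counts: {transcript_id: [count_sample1, count_sample2, ...]}
    tx2gene:           {transcript_id: gene_id} (hypothetical mapping)
    """
    gene_counts = {}
    for tx_id, counts in transcript_counts.items():
        gene_id = tx2gene[tx_id]
        # Start each gene at zero for every sample, then add sample by sample.
        running = gene_counts.get(gene_id, [0] * len(counts))
        gene_counts[gene_id] = [a + b for a, b in zip(running, counts)]
    return gene_counts


# Made-up counts: two splice variants of geneA, one entry for the promoter.
transcript_counts = {
    "txA1": [10, 0, 5],
    "txA2": [3, 7, 0],
    "txP1": [1, 2, 3],
}
tx2gene = {"txA1": "geneA", "txA2": "geneA", "txP1": "promoterP"}

gene_counts = sum_transcripts_to_genes(transcript_counts, tx2gene)
print(gene_counts)  # {'geneA': [13, 7, 5], 'promoterP': [1, 2, 3]}
```

The summing is done on the raw counts, before any log-type transformation such as rlog. Values on a log scale do not add up to a total expression, so the summed matrix is what would then be passed on to DESeqDataSetFromMatrix.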
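
For point 4, a toy illustration may help show why a 0 count can end up with a value after a transformation that pulls each sample toward the gene's average. This is explicitly not the rlog algorithm: the function `toy_shrunken_log2` and its `shrink` and `pseudocount` parameters are invented here, and it only mimics the general kind of effect a prior can have.

```python
import math


def toy_shrunken_log2(counts, shrink=0.5, pseudocount=0.5):
    """Toy illustration only; NOT how DESeq2 computes rlog.

    Take log2(count + pseudocount) per sample, then move each value
    part of the way toward the gene's mean across samples.
    """
    logs = [math.log2(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # shrink=0 keeps the plain log values; shrink=1 collapses to the mean.
    return [mean_log + (1 - shrink) * (x - mean_log) for x in logs]


# Made-up counts for one gene across four samples; the first sample has 0 reads.
counts = [0, 8, 16, 32]
values = toy_shrunken_log2(counts)
print(values)
```

In this toy version the sample with 0 reads gets a finite value that sits closer to the other samples than a plain log of the count would, which is the sort of behaviour described in the question.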

Thank you Devon Ryan for your reply.

1. For the GFF/GTF file, I still don't understand what you mean. But I think the GTF file is similar to what the Ensembl genome browser shows.

2. So what you mean is, for example, if there are 2 transcript variants of gene A, I can sum their read counts directly to get the total for the gene, right?

3 & 4. I see. I will read about rlog carefully to understand that.

written 4.0 years ago by bharata1803
  1. Yes, Ensembl is one source of annotation files.
  2. Yes, though I wonder why you're bothering with transcript-level metrics to begin with if you want to look at gene-level changes.
written 4.0 years ago by Devon Ryan

Do you have any suggestions for gene-level analysis? I tried Cufflinks, but my gene is a bit unusual because the same locus contains 9 different genes, and Cufflinks combines all of them into a single gene's expression. So I think a raw-count-based analysis is better. I remember you were the one who gave me that suggestion :)

written 4.0 years ago by bharata1803

Perhaps StringTie works a bit better than Cufflinks for the assembly. If nothing seems to work well for this, then I guess the manual approach is the only one you have.

written 4.0 years ago by Devon Ryan

Thank you. I think I will try StringTie, because I already have TopHat alignments for my reads, so it is quick to check. But the StringTie description says:

It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus.

I'm a bit worried that the same problem as with Cufflinks will appear, because the different genes in the gene locus will be treated as splice variants rather than as different genes. From other forums and reading, Salmon is another possibility. It seems Salmon is similar to eXpress: it generates transcript-level expression, which can be accumulated to get the gene level.

written 4.0 years ago by bharata1803