Question: RNA Seq differential gene expression analysis
Onat (Germany) wrote:

Hi,

I am very new to RNA-Seq data analysis and have a question about differential gene expression analysis. My idea is to run the analysis on the RPKM and/or expression values from the RNA-Seq data, treating each RPKM and/or expression value reported for a gene as one replicate. I have one sample from a drug-resistant cell line and one sample from a drug-sensitive cell line, and I usually have more than one RPKM value per gene in a sample. I was thinking of treating each of those RPKM values as a replicate and then continuing with the statistical analysis and fold-change calculation. Is that possible, or logical? I appreciate your help. Thank you in advance.

modified 4.6 years ago • written 4.6 years ago by Onat

Thanks a lot for your reply. The dataset contains RPKM and expression values. Do you have any suggestions for calculating the fold change for each gene while skipping the statistical test (since there are no real replicates)?

written 4.6 years ago by Onat

You might just sum the various isoform metrics and then get the fold-change from those. Do include average expression too, since then you can filter out the huge fold-changes from lowly expressed genes.

written 4.6 years ago by Devon Ryan
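As a rough illustration of the suggestion above, the sketch below sums per-isoform RPKM values to one value per gene, computes the fold change between the two samples, and keeps the mean expression alongside it so that large fold changes from lowly expressed genes can be filtered out. The gene names, the RPKM numbers, the `min_mean` cutoff of 1.0, and the function names are all made up for the example; they are assumptions, not values from this thread.

```python
# Hypothetical per-isoform RPKM values, keyed by gene, for the two samples.
resistant = {"geneA": [5.0, 2.0], "geneB": [0.02, 0.01], "geneC": [10.0]}
sensitive = {"geneA": [2.0, 1.5], "geneB": [0.001, 0.002], "geneC": [9.0]}


def gene_totals(isoform_rpkm):
    """Sum the isoform-level RPKM values to one value per gene."""
    return {gene: sum(values) for gene, values in isoform_rpkm.items()}


def fold_changes(sample_a, sample_b, min_mean=1.0):
    """Return {gene: (fold_change, mean_expression)} for genes whose mean
    expression across the two samples is at least min_mean.

    fold_change is sample_a / sample_b; genes with a zero denominator
    are skipped rather than given an infinite fold change.
    """
    totals_a = gene_totals(sample_a)
    totals_b = gene_totals(sample_b)
    results = {}
    for gene in totals_a.keys() & totals_b.keys():
        a, b = totals_a[gene], totals_b[gene]
        mean_expr = (a + b) / 2
        if b == 0 or mean_expr < min_mean:
            continue
        results[gene] = (a / b, mean_expr)
    return results


results = fold_changes(resistant, sensitive)
for gene in sorted(results):
    fc, mean_expr = results[gene]
    print(f"{gene}\tfold change={fc:.2f}\tmean RPKM={mean_expr:.2f}")
```

In this toy data geneB shows roughly a 10-fold change, but its mean RPKM is far below the cutoff, so it is dropped. Any real cutoff would have to be chosen for the dataset at hand.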

Can you be more specific please? How can I interpret the RPKM values to calculate the fold changes?

written 4.6 years ago by Onat

The fold change is just their ratio. So if you have an RPKM of 5 in one sample and 2 in another, then the fold change is 5/2 = 2.5 (or 2/5 = 0.4, depending on which sample you want things relative to). If you want log2 fold changes, then just log2-transform both values and subtract.

written 4.6 years ago by Devon Ryan
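The arithmetic in the reply above can be checked with a few lines of Python. This is only a sketch of the calculation described there; the function names are made up for the example, and both RPKM values are assumed to be greater than zero (the ratio and the logarithm are undefined otherwise).

```python
import math


def fold_change(rpkm_a, rpkm_b):
    """Ratio of sample A to sample B."""
    return rpkm_a / rpkm_b


def log2_fold_change(rpkm_a, rpkm_b):
    """log2 of each value, then subtract; equal to log2 of the ratio."""
    return math.log2(rpkm_a) - math.log2(rpkm_b)


print(fold_change(5, 2))                  # 2.5
print(fold_change(2, 5))                  # 0.4
print(round(log2_fold_change(5, 2), 3))   # 1.322
print(round(log2_fold_change(2, 5), 3))   # -1.322
```

On the log2 scale, swapping the reference sample only flips the sign, which is one reason log2 fold changes are often easier to compare than raw ratios.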

How do I decide which RPKM value to use for each gene when calculating the fold change? Can I just use the highest RPKM value for each gene in a sample?
 

written 4.6 years ago by Onat

Either just sum the isoforms or take the median.

written 4.6 years ago by Devon Ryan

Ok thanks a lot.

written 4.6 years ago by Onat

Actually, I also know the proportion of each transcript variant of a gene, so I know how much each variant contributes to the overall expression of that gene. Can I weight the RPKM value of each transcript variant by its proportion and then get an overall RPKM for the gene?

For example, variant 1 of gene A accounts for 60% with an RPKM of 5, and variant 2 of gene A accounts for 40% with an RPKM of 2. The overall RPKM for gene A would then be (5 x 0.6) + (2 x 0.4) = 3.8.

written 4.6 years ago by Onat
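The weighted calculation in the example above can be written as a short Python sketch. The function name and the check that the proportions sum to 1 are assumptions added for illustration; only the numbers for gene A come from the post.

```python
def weighted_gene_rpkm(variants):
    """Combine (proportion, rpkm) pairs into one gene-level value.

    The proportions for a gene are expected to sum to 1.
    """
    total = sum(proportion for proportion, _ in variants)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"proportions sum to {total}, expected 1")
    return sum(proportion * rpkm for proportion, rpkm in variants)


# Gene A from the example: 60% at RPKM 5 and 40% at RPKM 2.
gene_a = [(0.6, 5.0), (0.4, 2.0)]
print(round(weighted_gene_rpkm(gene_a), 2))  # 3.8
```

Note that this is a weighted average, so it sits on a different scale from the plain isoform sum (7.0 for the same two values); whichever summary is used should be applied the same way to both samples before taking a ratio.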

Something like that should work.
 

written 4.6 years ago by Devon Ryan

This way, I think the resulting RPKM value for each gene would be more reliable. Thanks again for your help.

written 4.6 years ago by Onat
Devon Ryan (Freiburg, Germany) wrote:

In the best-case scenario you might get lucky and the results will correspond to changes in isoform composition between the samples, but I really wouldn't recommend bothering with such an analysis. While some genes will have a very large number of meaningfully expressed isoforms, most will only have one or a couple, so you'll already end up not testing most genes. The ones you do test will have results dominated by noise and by the fact that the RPKM values for the isoforms of a gene depend on each other (i.e., an increase in isoform A will probably correspond to a decrease in B, which violates the independence assumption, one of the more important premises of the statistical test I'm guessing you'll end up using). Further, even if you do find a difference, it's impossible to say whether it is due to the difference in treatment, because you have no replicates to look at. In short, I would recommend not wasting much time on a dataset like this; the experiment simply wasn't designed to give much in the way of useful output.

The best you can likely do is rank things by fold change or use a package like GFold.

written 4.6 years ago by Devon Ryan
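Ranking by fold change, as suggested in the answer above, might look like the following sketch. It orders genes by absolute log2 fold change and says nothing about statistical significance. The gene names and RPKM values are invented for the example, and GFold itself is not shown here because its interface is not described in this thread.

```python
import math

# Hypothetical gene-level RPKM values: gene -> (resistant, sensitive).
gene_rpkm = {
    "geneA": (7.0, 3.5),
    "geneB": (2.0, 8.0),
    "geneC": (10.0, 9.0),
}


def rank_by_fold_change(rpkm_pairs):
    """Return (gene, log2 fold change) pairs, largest absolute change first.

    Genes with a zero value in either sample are skipped because the
    log2 ratio is undefined for them.
    """
    log2_fc = {
        gene: math.log2(a / b)
        for gene, (a, b) in rpkm_pairs.items()
        if a > 0 and b > 0
    }
    return sorted(log2_fc.items(), key=lambda item: abs(item[1]), reverse=True)


for gene, lfc in rank_by_fold_change(gene_rpkm):
    print(f"{gene}\t{lfc:+.2f}")
```

With these toy numbers geneB (a 4-fold decrease) ranks first, followed by geneA (a 2-fold increase) and then geneC.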