Question: RNA-Seq analysis gives only 10 DE genes
ste.lu wrote (4 months ago):

Hi All,

I have RNA-seq data for 2 cell lines (let's say A and B) in which a gene has been knocked out, giving four samples:

- A WT
- A KO
- B WT
- B KO

I used Salmon to map and quantify the reads against the reference transcriptome, and DESeq2 to perform the differential expression analysis. In the end I get only 10 DE genes between WT and KO. Do you think something is wrong, or is this a plausible result?
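For reference, the step between Salmon and DESeq2 is usually done in R with tximport (feeding `DESeqDataSetFromTximport`). The Python sketch below only illustrates the idea: it sums Salmon's estimated `NumReads` per gene from a `quant.sf` file, assuming a transcript-to-gene mapping built from the same annotation as the Salmon index. The function name and all transcript/gene IDs are hypothetical, and simple rounding ignores the transcript-length information tximport can pass on.

```python
import csv
import io
from collections import defaultdict


def gene_counts_from_quant_sf(quant_sf_text, tx2gene):
    """Sum Salmon's estimated NumReads per gene.

    quant_sf_text: contents of a Salmon quant.sf file (tab-separated, with the
        header Name, Length, EffectiveLength, TPM, NumReads).
    tx2gene: dict mapping transcript ID -> gene ID (assumed to come from the
        same annotation used to build the Salmon index).
    """
    counts = defaultdict(float)
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    for row in reader:
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcripts missing from the mapping are skipped here
        counts[gene] += float(row["NumReads"])
    # DESeq2 needs integer counts, so the summed estimates are rounded.
    return {gene: round(total) for gene, total in counts.items()}


# Toy example with hypothetical transcript and gene IDs.
quant_sf = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800\t10.0\t120.4\n"
    "tx2\t500\t300\t5.0\t30.6\n"
    "tx3\t2000\t1800\t1.0\t7.0\n"
)
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
print(gene_counts_from_quant_sf(quant_sf, tx2gene))  # {'geneA': 151, 'geneB': 7}
```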

Tag: rna-seq (question last edited 4 months ago by Istvan Albert)

That could be a good result. Since you are generating hypotheses for further testing, it may be quite manageable to build a story around the 10 genes you have identified. But without knowing the complete story, this is about all we can say.

(reply by genomax, 4 months ago)

Well, then I'll keep my fingers crossed!

(reply by ste.lu, 4 months ago)

How many replicates do you have per condition and cell line? It could be that the gene has only a limited impact on the regulation of other genes or on cellular responses, and that you lack the power to detect modest changes.

(reply by ATpoint, 4 months ago)
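To make the power argument concrete, here is a rough sketch using a normal approximation for a two-sample test on one gene's log2 expression. The fold change, the between-replicate standard deviation and the function names are hypothetical assumptions for illustration; DESeq2 itself uses a negative binomial model, and after multiple-testing correction (padj) the effective threshold is stricter than the unadjusted 1.96 used here, so real power is lower still.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power(log2_fc, sd, n_per_group, z_crit=1.96):
    """Approximate power of a two-sided, two-sample z-test for one gene.

    log2_fc: assumed true log2 fold change between conditions.
    sd: assumed per-gene SD of log2 expression between biological replicates
        (with one sample per group it cannot be estimated from the data).
    n_per_group: number of biological replicates per condition.
    z_crit: critical value; 1.96 corresponds to an unadjusted alpha of 0.05.
    """
    se = sd * math.sqrt(2.0 / n_per_group)  # SE of the difference in means
    shift = abs(log2_fc) / se
    # Probability that the test statistic falls beyond either critical value.
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


# A modest 0.5 log2 change with SD 0.5: power grows with biological replicates.
for n in (2, 3, 5, 10):
    print(n, round(approx_power(log2_fc=0.5, sd=0.5, n_per_group=n), 2))
```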

I have 2 technical replicates for each of the biological replicates (cell lines A and B in the question). However, something went wrong during library preparation for one sample and made it unusable, so I ended up comparing only one biological replicate against the other. Would you suggest including 2 technical replicates for one cell line against 1 technical replicate for the other cell line in the analysis?

(reply by ste.lu, 4 months ago)
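On the technical replicate question: technical replicates mostly reflect sequencing noise, not biological variability, so passing them to DESeq2 as if they were independent samples tends to understate the variance. The usual advice is to sum their counts, which is what DESeq2's `collapseReplicates()` does in R. The Python sketch below only mirrors that idea; the function name and all run/sample IDs are hypothetical.

```python
from collections import defaultdict


def collapse_technical_replicates(counts_by_run, run_to_sample):
    """Sum gene counts over technical replicates (runs) of the same sample.

    counts_by_run: dict run ID -> dict gene ID -> integer count
    run_to_sample: dict run ID -> biological sample ID
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for run, gene_counts in counts_by_run.items():
        sample = run_to_sample[run]
        for gene, count in gene_counts.items():
            collapsed[sample][gene] += count
    return {sample: dict(genes) for sample, genes in collapsed.items()}


# Hypothetical runs: two technical replicates of A_WT, one of B_WT.
counts_by_run = {
    "A_WT_run1": {"geneA": 100, "geneB": 5},
    "A_WT_run2": {"geneA": 90, "geneB": 7},
    "B_WT_run1": {"geneA": 210, "geneB": 11},
}
run_to_sample = {"A_WT_run1": "A_WT", "A_WT_run2": "A_WT", "B_WT_run1": "B_WT"}
print(collapse_technical_replicates(counts_by_run, run_to_sample))
# {'A_WT': {'geneA': 190, 'geneB': 12}, 'B_WT': {'geneA': 210, 'geneB': 11}}
```

Summing leaves one column per biological sample, so it does not add statistical replicates; it only avoids counting the same library twice.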

Which thresholds on log2 fold change (L2FC), padj, baseMean, etc. do you use to define a DE gene?

(reply by grant.hovhannisyan, 4 months ago)

I consider a gene DE if its padj is below 0.05.

(reply by ste.lu, 4 months ago)
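As an illustration of the kind of filters being discussed, here is a minimal Python sketch that applies padj, |log2FoldChange| and baseMean cut-offs to rows named after DESeq2's results columns. The function name, default thresholds and rows are hypothetical. Note that a post hoc fold-change cut-off is not the same as testing against a threshold; for that, DESeq2's `results()` accepts an `lfcThreshold` argument.

```python
def select_de_genes(results, padj_max=0.05, min_abs_lfc=0.0, min_base_mean=0.0):
    """Return IDs of genes passing padj, |log2FoldChange| and baseMean cut-offs.

    results: list of dicts using DESeq2's results column names
        ('baseMean', 'log2FoldChange', 'padj') plus a 'gene' ID.
        padj may be None, standing in for the NA that DESeq2 reports, e.g.
        for genes removed by independent filtering or flagged as outliers.
    """
    selected = []
    for row in results:
        padj = row["padj"]
        if padj is None or padj >= padj_max:
            continue
        if abs(row["log2FoldChange"]) < min_abs_lfc:
            continue
        if row["baseMean"] < min_base_mean:
            continue
        selected.append(row["gene"])
    return selected


# Hypothetical rows, not real results.
results = [
    {"gene": "g1", "baseMean": 850.0, "log2FoldChange": 2.1, "padj": 0.001},
    {"gene": "g2", "baseMean": 12.0, "log2FoldChange": 3.5, "padj": 0.03},
    {"gene": "g3", "baseMean": 400.0, "log2FoldChange": 0.2, "padj": 0.04},
    {"gene": "g4", "baseMean": 300.0, "log2FoldChange": -1.4, "padj": None},
]
print(select_de_genes(results))  # padj only: ['g1', 'g2', 'g3']
print(select_de_genes(results, min_abs_lfc=1.0, min_base_mean=50.0))  # ['g1']
```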

This is probably the most liberal filter one can think of, and you still get just 10 DE genes, which suggests that the samples are extremely similar to each other. Have you checked the Spearman correlation between samples? I bet it is close to 1. Regarding the experimental design you ended up with, please see these posts:
http://seqanswers.com/forums/showthread.php?t=31036
https://support.bioconductor.org/p/101210/

(reply by grant.hovhannisyan, 4 months ago)
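A self-contained sketch of the suggested check: Spearman correlation is the Pearson correlation of the ranks, with tied values given their average rank. Because it only uses ranks, it gives the same result on normalized counts or on log(count + 1). In practice `scipy.stats.spearmanr` does the same job; the sample names and counts below are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(samples):
    """Pairwise Spearman correlations; samples maps name -> per-gene counts."""
    names = list(samples)
    return {(a, b): spearman(samples[a], samples[b]) for a in names for b in names}


# Hypothetical normalized counts for the same five genes in each sample.
samples = {
    "A_WT": [0, 15, 230, 1200, 48],
    "A_KO": [2, 14, 250, 1100, 50],
    "B_WT": [1, 30, 180, 1500, 20],
}
for (a, b), rho in spearman_matrix(samples).items():
    print(a, b, round(rho, 3))
```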
Answer by Istvan Albert (University Park, USA), 4 months ago:

Always visualize your results by other means as well; in this case, align the reads against the genome. Once you look at your data in IGV, the answers will be more forthcoming.

Are you getting so few results because the data is perfect, with everything coming out the same across all samples? (This is rare.)

Alternatively, perhaps the variability between your replicates is comparable to the variation across conditions, in which case you find so few DE genes because the evidence for a difference between conditions is just not there.

Finally, I have run into this problem only a few times, when studying brain samples from an excellent scientist whose data always turn out nearly perfect, with textbook-like consistency across replicates.
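The second scenario above (replicate variability comparable to the variation across conditions) can be looked at roughly per gene by dividing the difference in condition means by the pooled within-condition standard deviation, which is essentially Cohen's d on log-scale expression. This is a descriptive sketch, not DESeq2's test; the function name and values are hypothetical, and it needs at least two biological replicates per condition.

```python
import math
import statistics


def standardized_difference(log_wt, log_ko):
    """|difference of condition means| / pooled within-condition SD (Cohen's d).

    log_wt, log_ko: log-scale expression of one gene (e.g. log2 normalized
    counts) in each biological replicate of the two conditions. At least two
    replicates per condition are needed; otherwise statistics.variance raises
    StatisticsError because the spread cannot be estimated.
    """
    diff = statistics.mean(log_ko) - statistics.mean(log_wt)
    pooled_var = (
        (len(log_wt) - 1) * statistics.variance(log_wt)
        + (len(log_ko) - 1) * statistics.variance(log_ko)
    ) / (len(log_wt) + len(log_ko) - 2)
    if pooled_var == 0:
        return math.inf if diff != 0 else 0.0
    return abs(diff) / math.sqrt(pooled_var)


# Hypothetical genes with the same 1-unit shift in mean log2 expression.
consistent = standardized_difference([5.0, 5.1], [6.0, 6.1])  # replicates agree
noisy = standardized_difference([4.0, 6.0], [5.0, 7.0])        # replicates disagree
print(round(consistent, 1), round(noisy, 2))
```

The same shift in the means stands out clearly for the first gene and disappears into the replicate noise for the second.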


> Always visualize your results with other means, in this case align against the genome. Once you look at your data with IGV the answers will be more forthcoming.

Is there a way to get from the Salmon results to something like a TopHat alignment, or do I have to go back and redo the alignment?

> Are you getting so few results because the data is perfect, everything comes out the same across all samples?

How can I check for this? With something more than a correlation?

(reply by ste.lu, 4 months ago)

The best option is to align the reads separately against the genome; you can use HISAT2 or even bwa mem for that.
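A minimal sketch of that route, assuming paired-end FASTQ files, a HISAT2 genome index built beforehand with `hisat2-build`, and `hisat2` and `samtools` on the PATH. HISAT2 is used here because it is splice-aware, while bwa mem is not, so reads spanning introns will not align cleanly with bwa mem. All paths and function names are hypothetical. The sorted, indexed BAM can then be loaded into IGV next to the genome.

```python
import subprocess


def alignment_commands(index_prefix, reads_1, reads_2, out_bam, threads=4):
    """Argument lists for hisat2 -> samtools sort -> samtools index (paired-end)."""
    # Without -S, hisat2 writes SAM to stdout.
    hisat2 = ["hisat2", "-p", str(threads), "-x", index_prefix,
              "-1", reads_1, "-2", reads_2]
    # "-" tells samtools sort to read its input from stdin.
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    index = ["samtools", "index", out_bam]
    return hisat2, sort, index


def align_sample(index_prefix, reads_1, reads_2, out_bam, threads=4):
    """Run the pipeline for one sample; requires hisat2 and samtools on PATH."""
    hisat2, sort, index = alignment_commands(index_prefix, reads_1, reads_2,
                                             out_bam, threads)
    aligner = subprocess.Popen(hisat2, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort, stdin=aligner.stdout)
    aligner.stdout.close()  # so hisat2 gets SIGPIPE if samtools exits early
    sort_rc = sorter.wait()
    align_rc = aligner.wait()
    if align_rc != 0 or sort_rc != 0:
        raise RuntimeError(
            f"alignment failed (hisat2={align_rc}, samtools sort={sort_rc})")
    subprocess.run(index, check=True)


if __name__ == "__main__":
    # Hypothetical paths: print the commands instead of running them.
    for cmd in alignment_commands("genome_index/genome", "A_KO_R1.fastq.gz",
                                  "A_KO_R2.fastq.gz", "A_KO.sorted.bam"):
        print(" ".join(cmd))
```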

If your samples are similar across conditions, there is no expression change to detect. Correlation may not be informative in cases where long, well-correlated regions mask shorter, uncorrelated ones. In addition, correlation captures whether changes go in the same direction, and it may miss changes that go in the same direction but with different magnitudes.

Correlation is useful for noisy data; for data that replicate too well across conditions, it becomes much less useful.
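The two limitations of correlation described above can be reproduced with toy numbers. In the first case every gene goes up, by different amounts, yet the Pearson correlation is exactly 1 (the ranks are identical too, so Spearman is also 1). In the second case, 1000 unchanged genes hide 5 genes that change by 4 log2 units. All values are hypothetical log2 expression levels chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Case 1: same direction, different magnitudes.
wt = [1.0, 2.0, 3.0, 4.0, 5.0]
ko = [2.0 * v for v in wt]                 # every gene goes up, by different amounts
log2_fc = [k - w for w, k in zip(wt, ko)]  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson(wt, ko), log2_fc)            # correlation is 1.0 regardless

# Case 2: many unchanged genes mask a few strongly changed ones.
background = [i / 100.0 for i in range(1000)]
wt2 = background + [1.0] * 5
ko2 = background + [5.0] * 5
print(round(pearson(wt2, ko2), 3))         # still very close to 1
```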

(reply by Istvan Albert, 4 months ago)