Question: Differentially Expressed Genes Analysis with RNA-Seq data
3.6 years ago by
Xiaokang ZH50 wrote:

2 questions:

  1. When using RNA-Seq data for Differentially Expressed Genes (DEG) analysis, must the two groups have the same number of samples (replicates)? For example, if I have 8 samples from the control group and 5 samples from the treatment group, is it OK to use DESeq for the DEG analysis?

  2. I'm using HISAT2 and featureCounts to get the count files. Before putting them into DESeq, should I normalise them first, or can I use them directly?

deg rna-seq next-gen • 1.8k views
modified 3.6 years ago by EagleEye • written 3.6 years ago by Xiaokang ZH
3.6 years ago by
Seattle,WA, USA
ivivek_ngs5.0k wrote:

I assume you are using DESeq2 and not DESeq.

  • Unequal numbers of samples per group are fine for DE analysis. The ideal scenario is equal numbers of paired samples, in which case you should account for the pairing when performing the DE analysis with any standard tool such as DESeq2, edgeR or limma.
  • DESeq2 can still perform DE analysis with just 2 samples in one group and 3 in the other. I would treat that as the lower limit; with fewer replicates the results are usually not trustworthy.

  • I have read that edgeR can work with even fewer samples per group, but I do not trust such analyses, to be honest. Your number of samples per group is good enough to perform the tests.

  • About normalization: the DE tools I mentioned, and the one in your question, work on raw count data, so there is no point in feeding them normalized data. They perform normalization themselves as part of the analysis. Just prepare your count table carefully, follow the DESeq2 tutorial, and you are good to go.
  • I would advise following the tutorial closely before performing any DE analysis. It is always good to understand how the data behave, not only as a QC step but also as good practice in exploratory data analysis. It shows you why the DE analysis is needed and which samples should be included in it. You might have to remove 1 or 2 samples from either group if they behave as outliers, owing to batch effects or sequencing errors, even after applying batch correction methods. So a complete workflow is advised; it also lets you build a discovery pipeline that you can reuse in your lab. I hope this answers your query.
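
To make the normalization point concrete, here is a minimal sketch of the median-of-ratios idea that DESeq2 uses to estimate per-sample size factors from raw counts. It is plain Python for illustration only, not DESeq2's actual code; the function name `size_factors`, the sample names and the toy counts are all made up.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate per-sample size factors with a median-of-ratios scheme.

    counts: dict mapping sample name -> list of raw integer counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Keep only genes with a non-zero count in every sample,
    # so that every logarithm below is defined.
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_mean = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in usable
    }
    factors = {}
    for s in samples:
        # Ratio of this sample to the per-gene reference, on the log scale.
        log_ratios = [math.log(counts[s][g]) - log_geo_mean[g] for g in usable]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Toy raw counts (hypothetical): two controls and one treated sample,
# i.e. deliberately unequal group sizes.
raw = {
    "control_1": [100, 200, 50, 0],
    "control_2": [200, 400, 100, 10],
    "treated_1": [50, 100, 25, 5],
}
factors = size_factors(raw)
# Dividing raw counts by the size factor puts samples on a common scale.
normalized = {s: [c / factors[s] for c in raw[s]] for s in raw}
```

In this toy data `control_2` was sequenced about twice as deeply as `control_1`, so its size factor comes out close to 2. This is why the count table should go in un-normalized: the tool needs the raw counts to estimate these factors itself.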

Sufficiently complete to be an answer, moved.

written 3.6 years ago by WouterDeCoster
3.6 years ago by
EagleEye6.7k wrote:
  1. It is always good to have the same number of samples in both comparison groups, but in some cases it is difficult to get equal numbers. In my opinion, it is completely fine to do differential expression analysis with unequal group sizes.

  2. HISAT2 -> featureCounts is a good choice of tools if you are doing gene-level differential expression (DE) analysis. You do not have to perform normalization yourself, as most DE tools expect raw counts rather than normalized values.
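
As a rough illustration of preparing the count table, the sketch below reads featureCounts-style output into a raw count matrix. It assumes the usual tab-separated layout: a leading `#` comment line, then a header with Geneid, Chr, Start, End, Strand and Length followed by one count column per BAM file. The helper name `read_featurecounts`, the file names and the numbers are hypothetical.

```python
import csv


def read_featurecounts(text):
    """Return (sample_names, {gene_id: [raw counts]}) from featureCounts-style text."""
    # Drop blank lines and the leading '#' comment line.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # The first six columns are annotation; the rest are per-sample counts.
    samples = header[6:]
    counts = {row[0]: [int(v) for v in row[6:]] for row in reader}
    return samples, counts


# Made-up example content; a real file would be read from disk instead.
example = (
    "# illustrative comment line\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl_1.bam\tctrl_2.bam\ttreat_1.bam\n"
    "geneA\tchr1\t100\t900\t+\t801\t120\t98\t240\n"
    "geneB\tchr1\t2000\t2600\t-\t601\t0\t3\t1\n"
)
samples, counts = read_featurecounts(example)
```

The integers are kept exactly as counted, with no scaling applied, which is the form the DE tools expect as input.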
