Question: DESeq2 DEG analysis using different sequencing depth data
13 months ago, woongjaej wrote:

Hi all,

I have a question about analysing RNA-seq data with DESeq2.

I would like to use 8 samples with high sequencing depth together with 6 samples with low sequencing depth (about 4 times lower).

Can I analyse these data together in DESeq2? Does DESeq2 normalize the samples' counts for sequencing depth?

If not, could somebody point me to a method that lets me use all of these samples?

Any help would be greatly appreciated. Thanks!

Best, Woongjae

modified 13 months ago by h.mon • written 13 months ago by woongjaej

Things to consider:

  • How low are the low-depth samples? Do they reach at least 10 million reads? Are they below 1-2 million reads? How high are the high-depth samples?

  • Are all the samples from the same library preparation / sequencing batch, or from different batches? Why do some samples have high depth and others low? Bad RNA quality? Ribosomal RNA contamination?

  • Are the high- and low-depth samples randomly distributed across treatments, or are all high-depth samples from one treatment and all low-depth samples from another?

written 13 months ago by h.mon

Hi h.mon!

  1. The high- and low-depth samples have about 100 million and 10 million reads each, respectively.
  2. The library preparation was performed twice (experiment 1 and experiment 2). The designs are identical; only the samples used differ (e.g. tissue from different mice, but same condition, same library kit, same age, same gender).
  3. Samples of each depth are distributed equally between conditions: 3 controls and 3 treatments at low depth, and 4 controls and 4 treatments at high depth. We ran the experiment twice because we thought we needed more data, and for the second experiment we decided to sequence more deeply.

Thanks for your help h.mon!!

written 13 months ago by woongjaej

I use edgeR rather than DESeq2, but I know they are pretty similar. edgeR normalizes for sequencing depth (using the TMM method by default), and I'm sure DESeq2 has a similar step in its standard workflow. Does DESeq2 use raw read counts, or something like RPKM? edgeR takes raw counts, which is why it performs TMM normalization itself. RPKM values are already normalized for sequencing depth, in addition to gene length.
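
For reference, DESeq2's default normalization is the median-of-ratios method: each sample gets a size factor equal to the median, over genes, of its count divided by that gene's geometric mean across samples. Below is a minimal Python sketch of that idea on made-up toy counts. It is only an illustration, not DESeq2's actual code (DESeq2 is an R package); the function name `size_factors` and the data are assumptions for the example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero and the ratio is undefined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) <= 0:
            continue
        # Log of the geometric mean of this gene across samples
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # One size factor per sample: median ratio to the geometric mean
    return [math.exp(median(r)) for r in log_ratios]


# Toy data (hypothetical): the second sample is sequenced 4x deeper
counts = [
    [10, 40],
    [20, 80],
    [30, 120],
    [5, 20],
]
sf = size_factors(counts)
# Divide each raw count by its sample's size factor
normalized = [[c / s for c, s in zip(gene, sf)] for gene in counts]
```

In this toy case the deeper sample gets a size factor four times larger than the shallow one, so the normalized counts of the two samples become comparable even though the raw counts differ fourfold.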

modified 13 months ago • written 13 months ago by goodez

Thanks goodez! DESeq2 takes raw counts as input at the beginning of the process. Maybe DESeq2 also uses a similar method to normalize the raw counts, but I'm not sure, and my two sequencing depth groups seem to have different FPKM values and very different normalized counts. So I'm hoping to get this confirmed by some DESeq2 experts. I'll try edgeR, too. Thank you very much for the reply!

written 13 months ago by woongjaej

I would try a test run where you downsample the big ones down to the coverage of the low ones. See if that looks drastically different than using all the data together.
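
A rough sketch of what such a downsampling could look like at the count level, in Python. This assumes you thin the gene count table (keeping each read independently with a fixed probability) rather than subsampling the FASTQ/BAM files, which is another common way to do it; the function name `downsample_counts` and the numbers are made up for illustration, and the pure-Python loop is only practical for toy data.

```python
import random


def downsample_counts(counts, target_total, seed=0):
    """Thin one sample's gene counts so the expected total is target_total.

    Each read is kept independently with probability
    p = target_total / current_total (binomial thinning).
    Samples already at or below the target are returned unchanged.
    """
    total = sum(counts)
    if total <= target_total:
        return list(counts)
    p = target_total / total
    rng = random.Random(seed)
    # For each gene, count how many of its reads survive the thinning
    return [sum(1 for _ in range(c) if rng.random() < p) for c in counts]


# Hypothetical deep sample thinned to roughly a tenth of its depth
deep = [4000, 2000, 1000, 0]
thin = downsample_counts(deep, target_total=700)
```

Running the differential expression analysis once on the full counts and once with the deep samples thinned like this gives the comparison suggested above.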

written 13 months ago by swbarnes2

Hi swbarnes2

I performed differential expression analysis separately in the two groups (1. low-depth samples, 3 vs 3; 2. high-depth samples, 4 vs 4).

There seems to be no drastic difference, but the FDR of some genes improves. For example, if I use only the high-depth samples for the DEG analysis, some genes' FDR values are above 0.05, but when I use all the samples together, their FDR drops below 0.05. Fold changes do not seem to change much.

written 13 months ago by woongjaej

I'm not sure why you responded to me to say you did not do what I suggested...

written 13 months ago by swbarnes2

Sorry, I misunderstood your comment. I'll try running your suggestion. Thanks.

written 13 months ago by woongjaej
13 months ago, h.mon wrote:

The DESeq2 count normalization thread on the Bioconductor forum has a lot of useful information for you.

As your low- and high-depth samples (and library prep batches) are balanced between the treatments, I think you can use them together and let DESeq2's size factor normalization take care of the depth difference. However, you have a batch effect in your experiment. Examine the PCA plot: depending on how your samples group, you may want to include batch in the design formula so that it is taken into account when testing for treatment effects.
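
To illustrate the balance point, here is a small Python sketch of the sample layout described in this thread (3 vs 3 in the first experiment, 4 vs 4 in the second). It checks that every batch contains both conditions, and builds the kind of dummy-coded design matrix that a batch-plus-treatment model implies. The labels `exp1`, `exp2`, `control` and `treatment` are placeholders, and this is not how DESeq2 is called; it only shows the structure.

```python
from collections import Counter

# Sample layout as described in the thread; labels are placeholders
samples = (
    [("exp1", "control")] * 3 + [("exp1", "treatment")] * 3
    + [("exp2", "control")] * 4 + [("exp2", "treatment")] * 4
)

# Cross-tabulate batch against condition
crosstab = Counter(samples)

# Batch and condition are not confounded if every batch
# contains samples from every condition
not_confounded = all(
    crosstab[(batch, cond)] > 0
    for batch in ("exp1", "exp2")
    for cond in ("control", "treatment")
)

# Dummy-coded design matrix with columns:
# intercept, batch (exp2 = 1), condition (treatment = 1)
design = [
    [1, int(batch == "exp2"), int(cond == "treatment")]
    for batch, cond in samples
]
```

In DESeq2 itself this corresponds to a design formula such as `~ batch + condition`, where `batch` and `condition` stand for whatever your sample table columns are called.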

written 13 months ago by h.mon

Thank you h.mon for your kind replies!


written 13 months ago by woongjaej