DESeq2 DEG analysis using different sequencing depth data
3.8 years ago
woongjaej ▴ 20

Hi, guys.

I have a question about analysing RNA-seq data (I'm using DESeq2).

I'd like to use 8 samples with high sequencing depth and 6 samples with low sequencing depth (about 4 times lower).

Can I analyse these data together in DESeq2? Does DESeq2 normalize the samples' counts for sequencing depth?

If not, could somebody point me to a method that lets me use all of these samples?

Any help would be a lifesaver. Thanks!

Best, Woongjae

deseq2 RNA-Seq sequencing depth DEG • 3.0k views

Things to consider:

• how low are the low depth samples? Do they reach at least 10 million reads? Are they less than 1-2 million reads? How high are the high depth?

• are all the samples from the same library preparation / sequencing batch, or from different batches? Why do some samples have high depth and others low? Bad RNA quality? Ribosomal RNA contamination?

• are high and low depth samples randomly distributed across treatments, or are all high depth samples from one treatment and all low depth samples from another?

Hi h.mon!

2. Library preparation was performed twice (experiment 1 and experiment 2). The design was identical; only the samples differ (e.g. tissue from different mice, but the same condition, library kit, age, and gender).
3. Treatments are distributed equally within each depth group: 3 controls and 3 treatments at low depth, and 4 controls and 4 treatments at high depth. We ran the experiment twice because we thought we needed more samples, and for the second experiment we decided to sequence more deeply.


I use edgeR rather than DESeq2, but I know they are pretty similar. edgeR will normalize for sequencing depth (using the TMM method by default). I'm sure DESeq2 has a similar step in its standard workflow. Does DESeq2 use raw read counts, or something like RPKM? edgeR uses raw counts, which is why it performs TMM normalization. If you use RPKM, then the values are already normalized for sequencing depth, in addition to gene length.


Thanks goodez!! DESeq2 takes raw counts at the beginning of the process, so maybe it also uses a similar method to normalize them. But I'm not sure, and my two sequencing depth groups seem to have different FPKM values and very different normalized counts. So I'm hoping to get this confirmed by some DESeq2 experts. I'll try edgeR, too. Thank you very much for the reply!!
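
For reference, DESeq2 does start from raw counts and, by default, estimates one size factor per sample with the median-of-ratios method. The snippet below is only a minimal pure-Python sketch of that idea, not DESeq2's actual code; the function name `size_factors` and the toy counts are made up for illustration.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c == 0 for c in gene):
            continue
        # log of the geometric mean of this gene across samples
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # size factor = median ratio of a sample to the per-gene reference
    return [math.exp(median(r)) for r in log_ratios]

# Toy data (hypothetical): sample 2 is sequenced 4 times deeper than sample 1.
toy_counts = [
    [100, 400],
    [50, 200],
    [10, 40],
    [0, 3],  # skipped: contains a zero
]
sf = size_factors(toy_counts)  # approximately [0.5, 2.0]
normalized = [[c / s for c, s in zip(gene, sf)] for gene in toy_counts]
```

Dividing each sample's counts by its size factor puts samples of different depth on a common scale, which is why a 4-fold depth difference alone is not a reason to exclude the shallow samples.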

I would try a test run where you downsample the big ones down to the coverage of the low ones. See if that looks drastically different than using all the data together.
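
One way to run such a test on a count matrix is to thin the deep samples binomially, keeping each read with a probability equal to the target fraction (about 0.25 here, since the low depth samples are roughly 4 times shallower). This is a rough sketch under that assumption; `downsample_counts` is a hypothetical helper, and subsampling the reads before counting would be an alternative.

```python
import random

def downsample_counts(counts, fraction, seed=0):
    """Binomially thin raw counts: keep each read with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the test run is reproducible
    thinned = []
    for c in counts:
        # one Bernoulli draw per read; slow for huge counts, fine for a sketch
        kept = sum(1 for _ in range(c) if rng.random() < fraction)
        thinned.append(kept)
    return thinned

# Hypothetical raw counts for one deeply sequenced sample.
deep_sample = [1200, 40, 0, 356]
shallow_like = downsample_counts(deep_sample, fraction=0.25)
```

Running the differential expression analysis once with the thinned counts and once with the original counts shows how much the depth difference itself changes the results.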

Hi swbarnes2

I performed differential expression analysis on the two groups separately (1. low depth samples, 3 vs 3; 2. high depth samples, 4 vs 4).

There seems to be no drastic difference, but the FDR of some genes improves when the samples are combined. For example, if I use just the high depth samples, some genes have FDR values over 0.05, but when I use all the samples together, their FDR drops below 0.05. Fold changes do not seem to change much.


I'm not sure why you responded to me to say you did not do what I suggested...


Sorry, I misunderstood your comment. I'll try your suggestion. Thanks

3.8 years ago
h.mon 34k

The DESeq2 count normalization thread at the Bioconductor forum has a lot of useful information for you.

As your low and high depth samples (and library prep batches) are balanced between the treatments, I think you can use them together and let DESeq2 size factor normalization take care of the depth difference. However, you have a batch effect in your experiment. Examine the PCA plot; depending on how your samples group, you may want to include batch in the design formula so it is accounted for when testing for treatment effects.
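
To make the "balanced" point concrete, here is a small Python sketch that checks whether batch and treatment are confounded and builds the additive design matrix that a DESeq2 design formula such as `~ batch + condition` would correspond to. The labels `exp1`/`exp2`, the column names, and both helper functions are hypothetical; the sample layout follows the 3 vs 3 and 4 vs 4 description in this thread.

```python
from collections import Counter

# (batch, condition) per sample: exp1 = low depth run, exp2 = high depth run.
samples = (
    [("exp1", "control")] * 3 + [("exp1", "treatment")] * 3
    + [("exp2", "control")] * 4 + [("exp2", "treatment")] * 4
)

def is_confounded(samples):
    """True if any batch is missing one of the conditions."""
    conditions = {cond for _, cond in samples}
    seen_by_batch = {}
    for batch, cond in samples:
        seen_by_batch.setdefault(batch, set()).add(cond)
    return any(seen != conditions for seen in seen_by_batch.values())

def design_matrix(samples, ref_batch="exp1", ref_condition="control"):
    """Rows of [intercept, batch indicator, treatment indicator]."""
    return [
        [1, int(batch != ref_batch), int(cond != ref_condition)]
        for batch, cond in samples
    ]

layout = Counter(samples)          # cross-tabulation of batch by condition
confounded = is_confounded(samples)
design = design_matrix(samples)
```

If every batch contains both conditions, as here, the batch column can absorb the difference between the two runs while the treatment column is still estimable. If all controls came from one run and all treatments from the other, the two effects could not be separated.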

Thank you h.mon for your kind replies!

Woongjae