I have a question about analysing RNA-seq data (I'm using DESeq2).
I'd like to use 8 samples with high sequencing depth and 6 samples with low sequencing depth (about 4 times lower).
Can I analyse these data together in DESeq2? Does DESeq2 normalize the samples' counts for sequencing depth?
If I can't, could somebody point me toward a method that lets me use all of these samples?
Any help would be a lifesaver... Thanks!
Things to consider:
How low are the low-depth samples? Do they reach at least 10 million reads, or are they below 1-2 million reads? How deep are the high-depth samples?
Are all the samples from the same library preparation / sequencing batch, or from different batches? Why do some samples have high depth and others low depth? Bad RNA quality? Ribosomal RNA contamination?
Are the high- and low-depth samples randomly distributed across treatments, or are all the high-depth samples from one treatment and all the low-depth samples from another?
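A quick way to answer the first and last questions is to total the raw counts per sample and cross-tabulate depth group against condition. Below is a minimal Python sketch under assumed inputs: a dict of raw per-gene counts per sample, plus hypothetical `condition` and `depth_group` labels. The function names are made up for this example.

```python
from collections import Counter

def library_sizes_millions(counts):
    """counts: dict mapping sample name -> list of raw per-gene counts.
    Returns dict mapping sample name -> total assigned reads in millions."""
    return {sample: sum(c) / 1e6 for sample, c in counts.items()}

def depth_by_condition(condition, depth_group):
    """Count samples in each (condition, depth group) cell."""
    return Counter((condition[s], depth_group[s]) for s in condition)

def depth_confounded(condition, depth_group):
    """True if there is more than one depth group but every condition
    contains only one of them, i.e. depth is fully nested in condition."""
    groups_per_condition = {}
    for s, cond in condition.items():
        groups_per_condition.setdefault(cond, set()).add(depth_group[s])
    several_depths = len(set(depth_group.values())) > 1
    return several_depths and all(len(g) == 1 for g in groups_per_condition.values())

# Hypothetical labels for illustration only.
condition = {"s1": "ctrl", "s2": "ctrl", "s3": "trt", "s4": "trt"}
depth_group = {"s1": "high", "s2": "low", "s3": "high", "s4": "low"}
print(depth_by_condition(condition, depth_group))
print(depth_confounded(condition, depth_group))  # False: depth is balanced
```

If `depth_confounded` returns True, a depth effect cannot be separated from the treatment effect, whatever normalization is used.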
Thanks for your help, h.mon!
I'm more familiar with edgeR than DESeq2, but I know they are pretty similar. edgeR will normalize for sequencing depth (using the TMM method by default), and I'm sure DESeq2 also uses a similar step during a standard workflow. Does DESeq2 use raw read counts, or something like RPKM?
edgeR uses raw counts, which is why it performs TMM normalization. If you use RPKM, the values are already normalized for sequencing depth, in addition to gene length.
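To make the RPKM point concrete, here is a minimal Python sketch of the usual RPKM formula (reads per kilobase of gene per million mapped reads) for one sample. `rpkm` is a hypothetical helper, not part of edgeR or DESeq2, and it assumes total mapped reads can be approximated by the sum of the given counts. Scaling every count by the same factor (the same sample sequenced 4x deeper) gives the same RPKM values, which is what "already normalized for depth" means. Note that both edgeR and DESeq2 expect raw counts as input, not RPKM.

```python
def rpkm(counts, lengths_bp):
    """RPKM for one sample.

    counts: raw read counts, one per gene.
    lengths_bp: gene lengths in base pairs, in the same order.
    RPKM = count / (length in kb) / (total mapped reads in millions)
         = count * 1e9 / (length_bp * total_reads)
    Total mapped reads is approximated here by the sum of `counts`.
    """
    total_reads = sum(counts)
    return [c * 1e9 / (length * total_reads) for c, length in zip(counts, lengths_bp)]

lengths = [1000, 2000]
shallow = rpkm([100, 300], lengths)
deep = rpkm([400, 1200], lengths)  # same sample sequenced 4x deeper
print(shallow, deep)  # same values: depth has been divided out
```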
Thanks, goodez! DESeq2 uses raw counts at the beginning of the process. Maybe DESeq2 also uses a similar method to normalize the raw counts, but I'm not sure, and the two sequencing-depth groups in my data seem to have different FPKM values and very different normalized counts. So I'm hoping some DESeq2 experts can confirm this. I'll try edgeR, too. Thank you very much for the reply!
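For reference, DESeq2's default normalization (`estimateSizeFactors`) is the median-of-ratios method: each gene's counts are divided by that gene's geometric mean across samples, and a sample's size factor is the median of those ratios over genes with no zero counts. The Python sketch below reimplements that idea for illustration only; it is not DESeq2's code, and `size_factors` is a hypothetical name. DESeq2 uses the size factors inside its model rather than dividing the input, but `counts(dds, normalized=TRUE)` returns the counts divided by them.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (the idea behind DESeq2's default).

    counts: list of genes, each a list of raw counts (one per sample).
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c == 0 for c in gene):
            continue  # geometric mean is 0 here, so the gene is skipped
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor = median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]

# Toy matrix: sample 2 is sample 1 sequenced 4x deeper.
counts = [[10, 40], [20, 80], [5, 20], [0, 3]]
sf = size_factors(counts)
normalized = [[c / s for c, s in zip(gene, sf)] for gene in counts]
print(sf)          # roughly [0.5, 2.0]
print(normalized)  # first three genes now match across samples
```

A sample sequenced about 4 times deeper gets a size factor about 4 times larger, so depth differences like the ones described in the question are handled by this step.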
I would try a test run where you downsample the big ones down to the coverage of the low ones. See if that looks drastically different than using all the data together.
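One way to run that test is binomial thinning: keep each read independently with probability equal to the target depth divided by the current depth (about 0.25 here). The Python sketch below applies this to a per-gene count vector; `downsample` is a hypothetical helper, and the 40M / 10M depths are placeholders, since only the ~4x ratio is given. For real data it is usually easier to subsample the BAM or FASTQ files (for example with `samtools view -s`) and recount.

```python
import random

def downsample(counts, fraction, seed=0):
    """Binomial thinning: keep each read independently with probability
    `fraction`. Loops over every read, so it is only meant for small
    illustrations, not for tens of millions of reads."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]

deep_counts = [4000, 800, 120, 0]   # toy counts for one deep sample
fraction = 10e6 / 40e6              # placeholder: target depth / current depth
shallow_counts = downsample(deep_counts, fraction)
print(shallow_counts)  # about a quarter of each count, with sampling noise
```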
I performed differential expression analysis separately in two groups: (1) the low-depth samples, 3 vs 3; (2) the high-depth samples, 4 vs 4.
There seems to be no drastic difference, but the FDR of some genes improved. For example, when I analyse only the high-depth samples, some genes have FDR values above 0.05, but when I use all the samples together, their FDR drops below 0.05. The fold changes do not seem to change much.
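The FDR values that DESeq2's `results()` reports in the `padj` column are Benjamini-Hochberg adjusted p-values by default. A minimal Python sketch of the BH adjustment (`bh_adjust` is a hypothetical helper) shows why adding samples can push genes below 0.05: extra replicates tend to lower the raw p-values, and each adjusted value is the smallest p·m/rank over that gene's rank and all higher ranks. DESeq2 also applies independent filtering by default, which changes m, so its `padj` will not match this sketch exactly.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down, keeping the values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(bh_adjust([0.01, 0.04, 0.03, 0.2]))
```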
I'm not sure why you replied to tell me that you did not do what I suggested...
Sorry, I misunderstood your comment. I'll try your suggestion. Thanks!