Hi,
I need some input on normalizing RNA-Seq data with spike-ins and then using DESeq to find differentially expressed genes (DEGs). I have 7 samples taken from tumor peripheries and tumor centers (the count table below has 3 "p" columns and 4 "rz" columns). I want to normalize the raw fragment counts (the kind of input DESeq expects) using the spike-ins and then compute the DEGs. My sample data set looks like this:
head(m)
            Sample_118p.0 Sample_132p2.0 Sample_91p.0 Sample_118rz.0 Sample_132rz1.0 Sample_132rz2.0 Sample_91rz.0
XLOC_000001          1534           2603         1764           1057            2889            3830          1684
XLOC_000002           175            304          208            144             428             367           222
XLOC_000003            80            195          109            916            2515            2314          1082
XLOC_000004            49             66           54             51             127             219            94
XLOC_000005             0              0            0              0               0               0             0
XLOC_000006             0              1            0              0               0               0             0
The spike-in data set looks like this:
head(sp)
           Sample_118p.0 Sample_132p2.0 Sample_91p.0 Sample_118rz.0 Sample_132rz1.0 Sample_132rz2.0 Sample_91rz.0
ERCC-00009            49             66           54             51             127             219            94
ERCC-00025             9              7            6              5              14              21             8
ERCC-00031             0              0            0              0               1               1             0
ERCC-00034             1              3            2              0               6               6             4
ERCC-00035             5              7            7              9              32              38            21
ERCC-00042            43             78           56             73             202             199            98
I am using the ERCC spike-ins from subgroup B, which have equal concentrations, so that they are consistent across samples.
Now I want to use these in DESeq.
What is the best way to apply this normalization to my RNA-Seq data: create the count data set object (newCountDataSet), then estimate the size factors, then the dispersion (per-gene variance), and finally get the differentially expressed genes? Has anyone dealt with a scenario like this and can give me some pointers?
I'm assuming that you want to use the spike-ins simply for the size normalization, rather than estimating dispersion, correct? If so, you can actually manually set the size factors.
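For example, here is a minimal sketch in R of what setting the size factors from the spike-ins could look like. It uses only the six ERCC rows shown in head(sp), so the numbers are illustrative; spike_size_factors is a hypothetical helper that computes median-of-ratios size factors in base R (DESeq's estimateSizeFactorsForMatrix(sp) should give the same result). The DESeq calls are left as comments because they need the package and the full count matrix m, and the condition labels in conds are guessed from the column names.

```r
# Spike-in counts copied from head(sp) in the question (first six ERCC rows only).
sp <- matrix(
  c(49, 66, 54, 51, 127, 219, 94,
     9,  7,  6,  5,  14,  21,  8,
     0,  0,  0,  0,   1,   1,  0,
     1,  3,  2,  0,   6,   6,  4,
     5,  7,  7,  9,  32,  38, 21,
    43, 78, 56, 73, 202, 199, 98),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("ERCC-00009", "ERCC-00025", "ERCC-00031",
      "ERCC-00034", "ERCC-00035", "ERCC-00042"),
    c("Sample_118p.0", "Sample_132p2.0", "Sample_91p.0", "Sample_118rz.0",
      "Sample_132rz1.0", "Sample_132rz2.0", "Sample_91rz.0")))

# Hypothetical helper: median-of-ratios size factors from the spike-ins only.
# Rows containing a zero have a log geometric mean of -Inf and are skipped.
spike_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  keep <- is.finite(log_geo_means)
  apply(counts, 2, function(col) {
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

sf <- spike_size_factors(sp)
print(round(sf, 3))

# With DESeq loaded, the size factors can then be set by hand instead of
# calling estimateSizeFactors() on the gene counts:
# library(DESeq)
# conds <- factor(c("p", "p", "p", "rz", "rz", "rz", "rz"))  # assumed labels
# cds <- newCountDataSet(m, conds)
# sizeFactors(cds) <- sf
# cds <- estimateDispersions(cds)
# res <- nbinomTest(cds, "p", "rz")
```

The vector must be in the same sample order as the columns of m, since the size factors are matched to samples by position.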
Thanks. I have been able to normalize my RNA-Seq data with the spike-ins, then used estimateDispersions to calculate the per-gene variation and the negative binomial test to find the DEGs. However, because of the high complexity of my data set, I cannot use the DESeq results: the genes fail multiple testing correction, and I don't want to select genes on the basis of uncorrected p-values alone. There is another problem: for some comparisons the read counts are 0 in one condition, so the mean is also 0, and the fold change cannot be trusted even when the gene is statistically significant. Has anyone faced such a situation? I have already tried RankProd and Cuffdiff (the downstream results were not good). I am now trying DESeq and don't know what to do next.
I'm not sure how highly complex your dataset is; it sounds fairly straightforward. The presence of 0 counts isn't that uncommon, though you're most likely to see them when the counts overall are quite low, so those genes will generally have crappy p-values. Without seeing more of your data or any plots, it's rather difficult to give you advice on how to proceed. In general, DESeq can deal with the 0-count scenario, but as you mentioned, the fold change is not always the best metric to go by.
What have you tried so far? Where are you getting stuck in the analysis?
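To make the zero-count point concrete, here is a tiny R illustration with made-up normalized means (the gene names and values are hypothetical, not taken from your data). It only shows why a fold change stops being informative once one condition has a mean of 0.

```r
# Hypothetical normalized mean counts per condition (illustrative values only).
mean_a <- c(geneA = 0,   geneB = 12.5, geneC = 0)
mean_b <- c(geneA = 8.2, geneB = 25,   geneC = 0)

# log2 fold change of condition b over condition a.
log2fc <- log2(mean_b / mean_a)
print(log2fc)

# geneA: division by a zero mean gives an infinite fold change.
# geneB: an ordinary, finite fold change.
# geneC: 0/0 is undefined, so the fold change is NaN.
```

An infinite or undefined value says nothing about effect size, which is why the counts behind such genes are worth looking at directly.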