Hello,
I want to find all genes that correlate positively or negatively with a particular gene A in an RNA-seq dataset. Can I use the normalized expression values of gene A as a continuous variable and perform expression profiling with DESeq2? Would gene length and GC content be an issue in that case?
Thank you,
Your question is not very clear. Do you mean that gene A has different dosage levels across the samples and conditions in your RNA-seq experiment? Or do you simply want to find, from your differential expression results, the genes that correlate most strongly with your gene of interest?
For the former, include the level of that gene as a covariate in your model matrix, fit the model, and test for differential expression across your conditions of interest. This shows how the dosage of that gene affects your samples across conditions and how the transcriptome responds.
If you simply want to find the genes that are co-expressed with your particular gene of interest, which is itself differentially expressed, the approach is slightly different. Project the normalized values (TPM or FPKM) of all your DE genes in a PCA and use a KNN method to compute the distances between your gene of interest and the other genes. The genes closest to it are the co-expressed ones.
However you can also take a look at WGCNA.
You need to state the question a bit more clearly.
Thank you for your answer. It is the first one. I am interested in gene-to-gene correlation without any case vs control model. For example, in a clinical dataset from a uniform cohort, which genes are high when the gene of interest is high, or low when the gene of interest is low? My question is: can DESeq2 normalization be used for gene-to-gene correlation, as it is for case vs control designs?
Why not just calculate Pearson/Spearman correlations between your gene A and all other genes and get those with a satisfactory R^2 coefficient or p-value?
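A minimal sketch of that suggestion in plain Python, assuming the normalized expression values are held in a dictionary mapping gene name to a list of per-sample values. The helper names (`pearson`, `spearman`, `correlate_with`) and the toy data are made up for illustration; in practice a statistics library would also give you p-values, which this sketch does not compute.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns None if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def correlate_with(target, expr, method=spearman):
    """Correlate every gene with `target`; genes with no variance are skipped."""
    ref = expr[target]
    scores = {}
    for gene, vals in expr.items():
        if gene == target:
            continue
        r = method(ref, vals)
        if r is not None:
            scores[gene] = r
    # Most positively correlated first, most negatively correlated last.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: 5 samples, values are hypothetical normalized expression levels.
toy_expr = {
    "geneA": [10.0, 20.0, 30.0, 40.0, 50.0],
    "geneB": [12.0, 25.0, 29.0, 47.0, 60.0],  # rises with geneA
    "geneC": [90.0, 70.0, 55.0, 30.0, 5.0],   # falls as geneA rises
    "geneD": [5.0, 5.0, 5.0, 5.0, 5.0],       # constant, so skipped
}
ranked = correlate_with("geneA", toy_expr)
print(ranked)
```

With thousands of genes you would also want to correct the p-values for multiple testing before picking a cutoff.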
That's exactly what I want to do. But should gene length and GC content be an issue?
I don't think that's a real issue. You just want to know which other genes go up when gene A goes up. Longer genes will have more reads, but a gene's length is the same in every sample, so it does not change how that gene correlates with gene A.
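To spell that argument out: assume that, once library size has been normalized, gene length contributes roughly a constant per-gene factor $c > 0$ to a gene's values $x_i$ across samples $i$ (the symbols are introduced here only for illustration, and the constant-factor model is a simplification). The Pearson correlation with gene A's values $y_i$ is then unchanged, because $c$ cancels:

```latex
r(cx, y)
  = \frac{\sum_i (c x_i - c\bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (c x_i - c\bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}
  = \frac{c \sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {c \sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}
  = r(x, y)
```

Spearman correlation is unaffected for the same reason: multiplying by a positive constant does not change the ranks.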
Of course, to correct for gene length you could use TPM values. For GC content there is not much you can do.
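For reference, a small sketch of how TPM values are derived from raw counts, assuming counts are stored as gene name to per-sample list and gene lengths are given in base pairs. The function name `counts_to_tpm` and the toy numbers are illustrative only; quantification tools normally report TPM for you, often using effective rather than annotated lengths.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts to TPM.

    counts:     dict gene -> list of read counts, one per sample
    lengths_bp: dict gene -> gene (or transcript) length in base pairs
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Step 1: reads per kilobase, which removes the gene-length effect.
    rpk = {g: [c / (lengths_bp[g] / 1000.0) for c in counts[g]] for g in genes}
    # Step 2: scale each sample so its values sum to one million.
    totals = [sum(rpk[g][s] for g in genes) for s in range(n_samples)]
    return {
        g: [rpk[g][s] / totals[s] * 1e6 for s in range(n_samples)]
        for g in genes
    }

# Toy example: geneB is three times longer and has three times the reads,
# so both genes end up with the same TPM.
toy_counts = {"geneA": [100, 200], "geneB": [300, 600]}
toy_lengths = {"geneA": 1000, "geneB": 3000}
toy_tpm = counts_to_tpm(toy_counts, toy_lengths)
print(toy_tpm)
```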
A simple correlation could be sufficient, but I think a more robust framework for your analysis would be WGCNA, and then find the cluster of genes which (anti)correlate to gene A.
Thank you very much, I will try WGCNA and compare with DESeq2!
To be honest, a simple correlation will work, so why do you want to use DESeq2? What is the purpose of DESeq2 here? I do not see one. If all you want is to find which genes correlate and anti-correlate with your gene A, then just do what @WouterDeCoster said and compute the "Pearson/Spearman correlations between your gene A and all other genes and get those with a satisfactory R^2 coefficient or p-value".
Alternatively, you can use WGCNA to find the clusters of co-expressed genes, where the genes that cluster with your gene A will be the ones that best correlate or anti-correlate with changes in the expression of gene A.
DESeq2 is used for differential expression: you normalize the counts and then fit a model using a design matrix that includes the factors relevant to the comparison. I do not see the need for it here unless you have two conditions whose differences you want to test. If you are worried about gene length, use TPM or FPKM values, which are normalized for gene length. Alternatively, apply the DESeq2 normalization on its own, without running any DE analysis, and use the normalized counts for the correlation that @WouterDeCoster described. DESeq2 provides normalization functions that you can use without running the differential expression test. Your question needs to be stated clearly.
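To illustrate what that normalization step does, here is a rough Python sketch of the median-of-ratios idea behind DESeq2's size factors. It is a simplified re-implementation for illustration, not DESeq2's own code, and the function names and toy counts are made up. In R you would normally call `estimateSizeFactors()` and then `counts(dds, normalized=TRUE)` on the DESeq2 object instead.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: dict gene -> list of raw counts, one per sample.
    Genes with a zero count in any sample are left out of the estimate.
    """
    usable = [g for g, row in counts.items() if all(c > 0 for c in row)]
    n_samples = len(next(iter(counts.values())))
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_ref = {g: sum(math.log(c) for c in counts[g]) / n_samples for g in usable}
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(counts[g][s]) - log_ref[g] for g in usable]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts):
    """Divide every count by the size factor of its sample."""
    sf = size_factors(counts)
    return {g: [c / f for c, f in zip(row, sf)] for g, row in counts.items()}

# Toy data: sample 2 was sequenced twice as deeply as sample 1.
raw = {
    "g1": [10, 20],
    "g2": [100, 200],
    "g3": [0, 5],     # contains a zero, so ignored when estimating factors
    "g4": [50, 100],
}
sf = size_factors(raw)
norm = normalize(raw)
print(sf)
print(norm)
```

The normalized counts (often log-transformed) can then be passed to any correlation routine.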
I want to use DESeq2 for normalization across samples. Plus I have other comparisons that included 2 conditions where I used DESeq2 on the same dataset. So if it is possible to use DESeq2 for gene to gene correlation I want to use DESeq2 normalization here as well.