Hi guys, I have a gene list that I obtained from DEG analysis of 3 different GSE datasets. Now I want to perform a weighted gene co-expression analysis on this DEG list. In which format should I prepare my file so that I can use the list in R? Should I just put the gene list in a column, or should I annotate each gene?
Dear @ghataksoumyakanti,
The input is a normalized gene expression matrix, with the rows representing the genes and the columns representing the samples.
I think you should spend some time reading the WGCNA tutorial if you haven't done so yet.
If you are experiencing difficulties with WGCNA, CEMiTool can be an easier way to do the same analysis.
How do I make the normalised matrix? Is it the same format that we obtain as the final result of DEG analysis in R, or do we have to build the matrix manually?
I don't know your data, and I don't know what normalization you did...
But usually after:
Background correcting
Normalizing
log2-transformation
You build the design matrix for the linear modelling function
f <- factor(targets$Target, levels = unique(targets$Target))
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)
and apply the intensity values to lmFit
fit <- lmFit(data.norm, design)
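For WGCNA, write the normalized expression values (data.norm, not the fit object) to disk:
# assumes data.norm is a plain numeric matrix (genes x samples);
# for an ExpressionSet, use exprs(data.norm) instead
write.table(data.norm, file="data_norm.txt", sep="\t", quote=FALSE)
The file data_norm.txt is what you will use in WGCNA or webCEMiTool.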
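A minimal sketch of how such a file might be loaded back into R for WGCNA, assuming data_norm.txt holds a plain numeric matrix with genes in rows and samples in columns. The toy values, gene names, and sample names are invented for illustration. Note that WGCNA functions such as goodSamplesGenes expect samples in rows and genes in columns, so the matrix is transposed after reading.

```r
# Toy normalized matrix: 5 genes (rows) x 4 samples (columns); values are made up
set.seed(1)
expr <- matrix(rnorm(20, mean = 8, sd = 1), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

# Write it to a temporary file (a stand-in for data_norm.txt)
path <- file.path(tempdir(), "data_norm.txt")
write.table(expr, file = path, sep = "\t", quote = FALSE)

# Read it back. The header line has one field fewer than the data lines,
# so read.table() uses the first column (gene IDs) as row names.
dat <- read.table(path, header = TRUE, sep = "\t", check.names = FALSE)

# WGCNA expects samples in rows and genes in columns, so transpose
datExpr <- t(as.matrix(dat))

# Sanity checks before handing datExpr to WGCNA (e.g. goodSamplesGenes)
stopifnot(is.numeric(datExpr))
print(dim(datExpr))  # samples x genes
```

If the is.numeric() check fails on real data, the file most likely contains a non-numeric column (for example gene symbols) that has to be moved into the row names or dropped first.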
This can help you with the normalization question:
Yes, I used the write.table command to save my DEG analysis result in R. So I can use the same format in which I got my DEG result as input for the WGCNA analysis, right? And do I need any further conversion from gene ID to probe ID or vice versa?
The input to WGCNA should be a matrix of numerical values, with samples as columns, and variables (usually genes) as rows.
WGCNA does not care about your variable names. They can be probe IDs, HGNC symbols, or, simply, numbers going from 1:n. However, you should obviously have the variable names in a format that is understood by you.
Please go through the WGCNA tutorial, first, so that you can understand how to use it.
Sometimes people just don't want to spend their time reading the tutorials.
OK, I will try that and get back in case there is a problem. I actually went through the tutorial but could not make much sense of the input part.
The WGCNA tutorial is not great, and the authors should improve it - I admit that. At a certain point in it, you have to download sample files that you then input to R. Did you get to that stage?
By sample files, do you mean the female mouse liver data that the original paper was based upon? No, sadly I did not get to that stage. I am still struggling with how to enter my DEG analysis data.
This is what R says when I use the goodSamplesGenes function:
gsg = goodSamplesGenes(datExpr0, verbose = 3);
Flagging genes and samples with too many missing values...
..step 1
Error in goodGenes(datExpr, weights, goodSamples, goodGenes, minFraction = minFraction, :
datExpr must contain numeric data.
My first column contains logFC values for all the DEGs.
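The error above is consistent with passing a DEG results table rather than an expression matrix: such a table mixes statistics with a character column of gene symbols, and converting it to a matrix turns every value into text. A small sketch of how one might check this, using an invented toy table that borrows a few of the column names from a limma-style result (all values are made up):

```r
# Toy stand-in for a DEG results table; all values are invented
deg <- data.frame(
  logFC        = c(1.8, -2.1, 0.9),
  AveExpr      = c(7.2, 6.5, 8.1),
  adj.P.Val    = c(0.001, 0.004, 0.03),
  gene.symbols = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Which columns are numeric?
numeric_cols <- sapply(deg, is.numeric)
print(numeric_cols)

# One character column is enough to make the whole matrix character
print(is.numeric(as.matrix(deg)))                # FALSE
print(is.numeric(as.matrix(deg[numeric_cols])))  # TRUE

# Even the numeric part is the wrong input for WGCNA: its columns are
# statistics (logFC, p-values), not per-sample expression values.
```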
@ghataksoumyakanti As far as I know, you shouldn't use the list of DEGs. As Kevin and I already said, you should use the "properly normalized" gene expression matrix.
The WGCNA method receives an input “m x n” gene expression matrix, containing n samples under specific conditions and m genes, where each element in the matrix gives the expression of one gene in a particular sample. The correlation between each pair of genes is then transformed into an m x m adjacency matrix through an adjacency function (reference).
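To make the matrix shapes concrete, here is a base-R sketch on random toy data. It uses the power adjacency function a_ij = |cor(x_i, x_j)|^beta for an unsigned network. The value beta = 6 and the data are assumptions for illustration only; in practice the power would be chosen from the data (for example with WGCNA's pickSoftThreshold).

```r
set.seed(42)
m <- 6    # number of genes
n <- 10   # number of samples

# Toy m x n expression matrix: genes in rows, samples in columns
expr <- matrix(rnorm(m * n), nrow = m,
               dimnames = list(paste0("gene", 1:m), paste0("sample", 1:n)))

# cor() correlates columns, so transpose to get gene-gene correlations
gene_cor <- cor(t(expr))

# Soft-thresholding power (assumed value, not tuned to any data)
beta <- 6

# Unsigned power adjacency: m x m, values between 0 and 1
adjacency <- abs(gene_cor)^beta
print(dim(adjacency))
```

Raising the absolute correlation to a power keeps strong correlations close to 1 and pushes weak ones towards 0, without applying a hard cutoff.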
Yes, the liver data. Okay, let us go back to the beginning: can you list the key objects in your workspace, and show a sample of these (e.g., using the head() function)?
The column names are:
[1] "logFC" "AveExpr" "t"
[4] "P.Value" "adj.P.Val" "B"
[7] "gene.symbols" "X"
These are the column names of the input file I was trying to use.
Dear @ghataksoumyakanti,
I think you did not quite understand what I said:
The input is a normalized gene expression matrix, with the rows representing the genes and the columns representing the samples.
You don't use your list of DEGs but the normalized gene expression matrix.
For converting between gene IDs and probe IDs, see these answers:
Question: Annotate Affymetrix probesets to Gene symbols
Question: Affymetrix Human Genome U133 Plus 2.0 Array - probe annotation with biomaRt
Question: Where To Find Annotation File For Agilent Microarray?
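Since WGCNA does not care about the row identifiers, this conversion is optional. If gene symbols are preferred, one possible approach is sketched below in base R. The annotation table, probe names, and gene symbols are hypothetical; a real mapping would come from the array's annotation package or file. When several probes map to one gene, this sketch keeps the probe with the highest mean expression, which is only one of several reasonable choices.

```r
# Toy probe-level matrix (probes x samples); values are invented
expr <- matrix(c(5.1, 5.3, 5.0,
                 7.9, 8.2, 8.0,
                 6.4, 6.1, 6.6),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("probe_1", "probe_2", "probe_3"),
                               c("s1", "s2", "s3")))

# Hypothetical probe-to-symbol annotation table
annot <- data.frame(probe  = c("probe_1", "probe_2", "probe_3"),
                    symbol = c("GENE_A", "GENE_A", "GENE_B"),
                    stringsAsFactors = FALSE)

# Look up the symbol for each row of the expression matrix
symbols <- annot$symbol[match(rownames(expr), annot$probe)]

# For each gene, keep the row index of the probe with the highest mean
probe_means <- rowMeans(expr)
keep <- sapply(split(seq_along(symbols), symbols),
               function(i) i[which.max(probe_means[i])])

# Gene-level matrix with symbols as row names
expr_gene <- expr[keep, , drop = FALSE]
rownames(expr_gene) <- names(keep)
print(expr_gene)
```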
As Leite mentioned, please follow the WGCNA tutorial, so that you can understand what should be the input format of your data.