Question: How to filter genes to construct a co-expression network?
niutster wrote (21 months ago):

Hi, I would like to filter my data before constructing a co-expression network. Which parameter can I use to filter genes? As far as I know, the WGCNA tutorial advises against using differentially expressed genes (DEGs) to filter genes.

(modified 21 months ago by Kevin Blighe • written 21 months ago by niutster)
Kevin Blighe wrote (21 months ago):

The input can be any normalised dataset that has undergone standard QC filtering and processing, e.g. background correction for microarrays or removal of low-count transcripts for RNA-seq. As WGCNA is fundamentally based on correlation, the data do not necessarily have to be logged or on the Z-scale. Any normalised data is fine, as long as all samples are processed in the same way.
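A small sketch of what the point about correlation means in practice, on a simulated (hypothetical) matrix. It assumes Pearson correlation, which is WGCNA's default: converting each gene to Z-scores leaves every gene-gene correlation unchanged, whereas a log transform generally changes them. So the data need not be standardised, but logging versus not logging is a genuine choice.

```r
# Sketch on simulated data: per-gene Z-scaling does not change Pearson
# correlations between genes, but a log transform does.
set.seed(1)

# Hypothetical expression matrix: 5 genes (rows) x 20 samples (columns)
expr <- matrix(rexp(100, rate = 0.01) + 1, nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:20)))

# Gene-gene correlations on the original (normalised) scale
cor_raw <- cor(t(expr))

# Z-scale each gene; scale() works on columns, so transpose around it
expr_z <- t(scale(t(expr)))
cor_z <- cor(t(expr_z))

# Log2 transform with a pseudocount of 1 (an illustrative choice)
cor_log <- cor(t(log2(expr + 1)))

isTRUE(all.equal(cor_raw, cor_z))   # TRUE: Z-scaling is a linear transform per gene
max(abs(cor_raw - cor_log))         # non-zero: logging changes the correlations
```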

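The WGCNA authors advise against pre-filtering to differentially expressed genes because WGCNA was designed as an unsupervised clustering procedure; selecting genes by their association with the trait beforehand defeats that purpose.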

For other network methods, you'd have to check which data inputs they require.

Kevin

(written 21 months ago by Kevin Blighe)

Hi Kevin, what do you think about filtering out genes with low expression variance, e.g. keeping only the 50% most variable genes?

(written 21 months ago by WouterDeCoster)

That's also a great idea, which I had not thought of.

(modified 7 weeks ago • written 21 months ago by Kevin Blighe)

Thanks. Could you explain more about filtering based on low variance? How can I do it?

(written 21 months ago by niutster)

In R, assuming your gene expression matrix is called data, with genes in rows and samples in columns:

variance <- apply(data, 1, var)  # variance of each gene (row) across samples
data2 <- data[variance >= quantile(variance, 0.50), , drop = FALSE]  # keep the 50% most variable genes

This computes the variance of each gene and keeps the genes whose variance is at or above the median, i.e. the top 50%. Because the variances are kept in a separate vector, the code works whether data is a matrix or a data frame. If your matrix is oriented the other way (samples × genes), use apply(data, 2, var) and subset the columns instead: data[, variance >= quantile(variance, 0.50), drop = FALSE].

(modified 21 months ago • written 21 months ago by WouterDeCoster)
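One detail worth checking before running WGCNA: its functions expect datExpr with samples in rows and genes in columns, i.e. the transpose of a genes × samples matrix. A minimal sketch on a simulated (hypothetical) matrix that filters by variance and then transposes; the name datExpr just follows the WGCNA tutorials' convention.

```r
# Sketch: keep the 50% most variable genes, then transpose to samples x genes.
set.seed(42)

# Hypothetical genes x samples matrix: 1000 genes, 12 samples
data <- matrix(rnorm(1000 * 12, mean = 8), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:12)))

variance <- apply(data, 1, var)                 # one variance per gene (row)
keep <- variance >= quantile(variance, 0.50)    # at or above the median variance
data2 <- data[keep, , drop = FALSE]

# WGCNA expects samples in rows and genes in columns
datExpr <- t(data2)

dim(datExpr)   # 12 500
```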

The OP can also use the varFilter function from the genefilter package in R.

(modified 21 months ago • written 21 months ago by cpad0112)

You can use the following code to keep the 50% most variable genes (by default, varFilter ranks genes by their interquartile range, IQR):

library(genefilter)
genes <- varFilter(exp)

or, for example, this code to keep only the top 20% of genes:

genes <- varFilter(exp, var.func = IQR, var.cutoff = 0.8, filterByQuantile = TRUE)

(modified 21 months ago • written 21 months ago by mannoulag)

Dear WouterDeCoster,

Thanks for your comment. I found another filtering strategy in an article: the authors kept only genes that were present in at least 50% of the samples, and I would like to apply the same filter.

Could you please share your thoughts on that strategy and help me write R code for that filter?

Best Regards,

(modified 10 months ago • written 10 months ago by modarzi)
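A minimal sketch of the "present in at least 50% of samples" filter. What counts as "present" is an assumption you have to set: here a gene is present in a sample when its value exceeds threshold (0 by default, which suits raw counts; normalised or logged data, or microarray detection calls, would need a different cutoff). filter_by_presence is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper: keep genes whose value exceeds `threshold`
# in at least `min_fraction` of samples.
filter_by_presence <- function(expr, threshold = 0, min_fraction = 0.5) {
  # expr: genes in rows, samples in columns
  present <- expr > threshold           # logical matrix: present per gene/sample
  frac_present <- rowMeans(present)     # fraction of samples in which each gene is present
  expr[frac_present >= min_fraction, , drop = FALSE]
}

# Usage with a tiny hypothetical count matrix (4 genes x 4 samples)
counts <- matrix(c(0, 5, 3, 0,
                   0, 0, 2, 0,
                   7, 8, 9, 1,
                   0, 0, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:4)))

filtered <- filter_by_presence(counts)
rownames(filtered)   # "gene1" "gene3"
```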

Data normalization and pre-processing were already performed; I just want to filter the data to reduce its volume.

(written 21 months ago by niutster)

Okay. WGCNA can handle large datasets via its blockwiseModules function.

Alternatively, you could filter your genes based on a specific pathway (e.g. 'DNA repair' or 'Wnt signalling' genes). Obviously, the analysis is then no longer entirely unbiased.
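Filtering by pathway amounts to subsetting the rows of the matrix to a gene set. In this sketch both the matrix and pathway_genes are hypothetical placeholders; in practice the set would come from a pathway database of your choice, and its identifiers must match your row names (e.g. gene symbols versus Ensembl IDs).

```r
# Sketch: restrict a genes x samples matrix to a predefined gene set.
set.seed(7)

# Hypothetical expression matrix: 6 genes x 4 samples
data <- matrix(rnorm(6 * 4), nrow = 6,
               dimnames = list(c("TP53", "BRCA1", "ATM", "GAPDH", "ACTB", "WNT3A"),
                               paste0("s", 1:4)))

# Hypothetical gene set (placeholder, not a curated pathway definition)
pathway_genes <- c("TP53", "BRCA1", "ATM", "RAD51")

# Keep only genes that are in the set and present in the data
data_pathway <- data[rownames(data) %in% pathway_genes, , drop = FALSE]

rownames(data_pathway)                     # "TP53" "BRCA1" "ATM"
setdiff(pathway_genes, rownames(data))     # "RAD51": in the set but missing from the data
```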

(written 21 months ago by Kevin Blighe)

Is it appropriate to use DESeq2 normalized counts, i.e. those obtained with counts(dds, normalized=TRUE), for WGCNA?

(written 14 months ago by Arindam Ghosh)

As mentioned, and according to the WGCNA authors, un-logged or logged data is fine; the most important thing is that all samples are processed in the same way. However, I don't know how they validated this, because results will differ between logged and un-logged normalised counts.

Why not try both the counts from counts(dds, normalized=TRUE) and the values from DESeq2's regularised log (rlog) function?

(modified 14 months ago • written 14 months ago by Kevin Blighe)
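A sketch of preparing both candidate inputs from one DESeqDataSet so the resulting networks can be compared. It uses DESeq2's simulated example data (makeExampleDESeqDataSet) in place of a real dataset, and the cutoff of 10 total reads is an illustrative assumption, not a recommendation from this thread.

```r
# Sketch: size-factor-normalised counts vs rlog values as WGCNA inputs.
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
# Simulated example data stands in for your own dds (1000 genes, 12 samples)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Illustrative pre-filter (assumed cutoff): drop genes with < 10 reads in total
dds <- dds[rowSums(counts(dds)) >= 10, ]
dds <- estimateSizeFactors(dds)

# Input 1: normalised counts, un-logged (genes in rows)
norm_counts <- counts(dds, normalized = TRUE)

# Input 2: regularised log; blind = TRUE ignores the design formula
rlog_mat <- assay(rlog(dds, blind = TRUE))

# WGCNA expects samples in rows and genes in columns
datExpr_norm <- t(norm_counts)
datExpr_rlog <- t(rlog_mat)

# How much do gene-gene correlations differ between the two inputs?
cor_norm <- cor(datExpr_norm[, 1:50])
cor_rlog <- cor(datExpr_rlog[, 1:50])
summary(as.vector(abs(cor_norm - cor_rlog)))
```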