Question

WGCNA data input

7.0 years ago

adriana.gallego.02 ▴ 10

Could someone explain why, in WGCNA, if I have 17,000 genes with p-value < 0.05, people choose only the top k genes, say the top 5,000? How should I filter the data? Could I use the whole list of 17,000 genes as input to WGCNA?

Thanks

Adriana

RNA-Seq • 2.9k views


Thanks, Dr. Warner, for your answer. However, I have a follow-up question: why choose 8000 instead of 2000 genes, for example? How can I justify my selection of top genes?

Here I'm attaching the relevant piece of code:

#====================================================================================================================================
# select samples (16 columns: 4 groups of 4, in column order)
sample.id <- c(6:9, 14:17, 22:25, 30:33)

# select genes: per-gene one-way ANOVA across the 4 groups
expr0 <- dat2log[, sample.id]
group <- factor(rep(1:4, each = 4))
temp.anova <- function(x, fa) {
  fit <- lm(x ~ fa)
  return(anova(fit)$`Pr(>F)`[1])    # p-value of the group effect
}
pvalues <- apply(expr0, 1, temp.anova, fa = group)
cutoff <- 0.05 / length(pvalues)    # Bonferroni threshold (computed but not used below)
length(pvalues)


#=======================================================================================================================================

# select top k genes for WGCNA (smallest ANOVA p-values)
topk <- 8000
gene.id <- which(rank(pvalues) <= topk)
length(gene.id)

## transpose: samples in rows, genes in columns
expr <- t(dat2log[gene.id, sample.id])

Thanks Adriana

ADD REPLY • link updated 7.0 years ago by WouterDeCoster 47k • written 7.0 years ago by adriana.gallego.02 ▴ 10

Please use ADD COMMENT or ADD REPLY to respond to previous posts; that way the thread remains logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should only be used for providing a solution to the question asked.

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLY • link 7.0 years ago by WouterDeCoster 47k

If I understand correctly, this filters genes based on the results of your ANOVA. This isn't really recommended, since the resulting modules will mostly reflect your experimental factors rather than the underlying co-expression structure. Please refer to point 2 of the WGCNA FAQ:

https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html

To answer your question, though: there is no rule for choosing 8000 genes over 2000 genes. The cutoff is somewhat arbitrary.

ADD REPLY • link 7.0 years ago by Jake Warner ▴ 830

score 1 · Answer 1 · 2017-04-29

why in WGCNA if I have 17.000 genes with p-value <0.05, people just choose a topk of genes?

This is sometimes done to remove lowly expressed genes, genes with low variance or any other potential indicator of noise. These genes won't have a strong impact on the network.

how can I filter the information?

This can be done by filtering for mean expression, variance or ranked connectivity.
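As a rough illustration of these three filters, here is a minimal base R sketch. It assumes a genes x samples matrix of log-scale expression values; the toy matrix `expr_mat`, the helper `keep_top`, the cutoffs (top 50, lower quartile) and the soft-thresholding power `beta = 6` are all hypothetical choices made for the example, not recommendations. The connectivity here is computed by hand as the sum of soft-thresholded absolute correlations, which is one simple way to rank genes and not necessarily how any particular package does it.

```r
# Toy data standing in for a genes x samples matrix of log-expression values
# (hypothetical; replace with your own matrix, e.g. dat2log[, sample.id]).
set.seed(1)
expr_mat <- matrix(rnorm(200 * 16, mean = 5), nrow = 200,
                   dimnames = list(paste0("gene", 1:200), paste0("s", 1:16)))

# Hypothetical helper: names of the top_k genes with the highest value of `stat`.
keep_top <- function(stat, top_k) {
  names(sort(stat, decreasing = TRUE))[seq_len(min(top_k, length(stat)))]
}

gene_mean <- rowMeans(expr_mat)        # mean expression per gene
gene_var  <- apply(expr_mat, 1, var)   # variance per gene

# Connectivity: sum of soft-thresholded absolute correlations with all other genes.
beta <- 6                              # assumed power, chosen only for illustration
adj <- abs(cor(t(expr_mat)))^beta      # gene x gene adjacency
gene_conn <- rowSums(adj) - 1          # subtract each gene's self-correlation

# Three alternative gene selections, none of which uses the phenotype
expressed   <- names(gene_mean)[gene_mean > quantile(gene_mean, 0.25)]
top_by_var  <- keep_top(gene_var, 50)
top_by_conn <- keep_top(gene_conn, 50)

# Samples in rows, genes in columns, the layout WGCNA functions expect
dat_expr <- t(expr_mat[top_by_var, ])
```

Unlike ranking by ANOVA p-value, none of these filters looks at the sample groups, so the selection does not steer the modules toward the experimental factors.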

could I use the whole list of 17 thousand genes as input to WGCNA?

Yes. The genes which would be filtered above most likely won't be assigned to a module when you use the full dataset or they will have low membership in multiple modules.