Question: What is the best way to analyze and enrich a list of DE genes from RNAseq?
miky.zo (Italy / Busto Arsizio / University of Insubria) wrote, 10 weeks ago:

Hi everybody. I'm a PhD who mostly does wet-lab work, and I am trying to learn some bioinformatics (just the basics for now). Together with some collaborators, we performed an RNAseq analysis on some samples. Since their group has its own bioinformatician who provides this kind of service, they analyzed the results for us and gave us an Excel file with the lists of DE genes. Now I am interested in finding the biological meaning of the DE genes, and also in finding a way to narrow down the lists to a smaller number of genes of interest that I will eventually validate by PCR or other methods (20 genes are better than 200). Most importantly, I want to try to do this by myself to learn how. I have analyzed the lists with tools like Metascape to find the pathways that are most interesting to me, so that I can continue with only those genes. Now I am wondering what to do next.

So, my question is: do you have any suggestions on how best to proceed, or on which kinds of analyses can be done on such lists? Links to tools or R/Bioconductor packages would be great!

Thanks a lot for your answers and your help!


Thank you both for your answers; I will try both methods. I am definitely planning to validate by PCR too; I just want to reduce the number of genes I have to check, because I have hundreds of significantly DE genes. In your opinion, is there anything more that could be done after getting the enriched pathways, or is that it?

Thank you again!

(Reply written 9 weeks ago by miky.zo)
ATpoint wrote, 9 weeks ago:

I like using gprofiler2 for functional enrichment analysis of gene lists, e.g. the DE genes split into up- and downregulated sets. All you need are the gene identifiers, e.g. Ensembl IDs or MGI/HGNC symbols depending on the organism.

Let's say you have the output of e.g. DESeq2 or edgeR, i.e. a table of all tested genes with their statistics. Select the significant ones, e.g. padj < 0.05, and extract their gene names. Say you stored the upregulated ones as genes.up, and all genes in the results table, regardless of significance, as genes.background.
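The filtering step described above can be sketched in base R as follows. This is a minimal sketch, not ATpoint's code: it assumes a DESeq2-style table with columns `log2FoldChange` and `padj` and gene IDs as row names (edgeR's `topTags()` table names them `logFC` and `FDR` instead). The helper name `split_de_genes` and the toy values are made up for illustration.

```r
# Hypothetical helper: split a DE results table into up, down and background gene lists.
# Assumes DESeq2-style columns `log2FoldChange` and `padj`, with gene IDs as row names.
split_de_genes <- function(res, padj_cutoff = 0.05) {
  res <- as.data.frame(res)  # no-op for a data.frame; also converts DESeq2 results
  # Genes with NA padj (e.g. removed by independent filtering) count as not significant
  sig <- !is.na(res$padj) & res$padj < padj_cutoff
  list(
    up         = rownames(res)[which(sig & res$log2FoldChange > 0)],
    down       = rownames(res)[which(sig & res$log2FoldChange < 0)],
    # Background: every gene in the results table, regardless of significance
    background = rownames(res)
  )
}

# Toy table with made-up values, for illustration only
res <- data.frame(
  log2FoldChange = c(2.1, -1.5, 0.3, 3.0, -0.2),
  padj           = c(0.001, 0.01, 0.60, NA, 0.04),
  row.names      = paste0("gene", LETTERS[1:5])
)

genes <- split_de_genes(res)
genes.up         <- genes$up          # "geneA"
genes.down       <- genes$down        # "geneB" "geneE"
genes.background <- genes$background  # all five genes
```

The same vectors can then be passed to `gost()` as `query` and `custom_bg`.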

In R, this would be an example for mouse genes:


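```r
library(gprofiler2)

# Enrichment of the upregulated genes against all genes tested in the DGE analysis
genes.up.results <- gost(query = genes.up,
                         custom_bg = genes.background,
                         organism = "mmusculus",
                         significant = TRUE,
                         user_threshold = 0.05,
                         correction_method = "gSCS")
```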

genes.up.results$result then gives you a data frame containing only the terms (pathways, GO terms, etc.) for which your upregulated genes are significantly enriched, using all genes that went into the DGE analysis as background. gSCS is the multiple testing correction recommended by the package authors. The analysis is fast and should not take more than a few seconds. You can also do this interactively on the g:Profiler website by pasting your genes into the search field. I prefer doing it in R, though, so that it is reproducible and the results are saved in my environment right away.
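To go from enriched terms back to a shorter gene list, as the question asks, one option is to collect the query genes annotated to the terms you find relevant and count in how many of those terms each gene appears. The sketch below assumes a data frame shaped like `genes.up.results$result` that also has the `intersection` column `gost()` adds when called with `evcodes = TRUE` (the query genes annotated to each term, comma-separated). The helper `genes_from_terms`, its default sources and the toy rows are hypothetical.

```r
# Hypothetical helper: gather query genes behind selected enriched terms and count
# in how many of those terms each gene appears.
# Assumes a gost()-style data frame with `source`, `term_name` and `intersection`
# (comma-separated gene IDs, only present when gost() is run with evcodes = TRUE).
genes_from_terms <- function(gost_res, sources = c("GO:BP", "KEGG", "REAC"),
                             term_pattern = NULL) {
  keep <- gost_res$source %in% sources
  if (!is.null(term_pattern)) {
    # Optional case-insensitive filter on term names, e.g. "inflamm"
    keep <- keep & grepl(term_pattern, gost_res$term_name, ignore.case = TRUE)
  }
  sel <- gost_res[keep, , drop = FALSE]
  if (nrow(sel) == 0) {
    return(data.frame(gene = character(0), n_terms = integer(0),
                      stringsAsFactors = FALSE))
  }
  hits <- trimws(unlist(strsplit(sel$intersection, ",", fixed = TRUE)))
  counts <- table(hits)
  out <- data.frame(gene = names(counts), n_terms = as.integer(counts),
                    stringsAsFactors = FALSE)
  # Genes hit by the most selected terms first; ties broken alphabetically
  out <- out[order(-out$n_terms, out$gene), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy rows with made-up terms and genes, for illustration only
toy <- data.frame(
  source       = c("GO:BP", "GO:BP", "KEGG", "GO:MF"),
  term_name    = c("inflammatory response", "leukocyte migration",
                   "cytokine signaling", "kinase activity"),
  p_value      = c(1e-6, 1e-4, 1e-3, 1e-2),
  intersection = c("Il6,Tnf,Cxcl1", "Cxcl1,Itgb2", "Il6,Tnf", "Mapk1"),
  stringsAsFactors = FALSE
)

genes_from_terms(toy)
genes_from_terms(toy, term_pattern = "inflamm")
```

With real data this would be applied to something like `gost(query = genes.up, custom_bg = genes.background, organism = "mmusculus", evcodes = TRUE, correction_method = "gSCS")$result`. Counting shared terms is only one heuristic for prioritizing candidates; it favours genes that recur across several related terms.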

benformatics (ETH Zurich) wrote, 9 weeks ago:

Regarding choosing candidates to validate: my advice is to accept that “nothing is consistent”. I would suggest validating by PCR the genes with the highest fold change.

If you need to pick a subset from your original list, I would still suggest the same.
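Picking the significant genes with the highest fold change can be sketched as follows, again assuming DESeq2-style columns (`log2FoldChange`, `padj`) with gene IDs as row names. The helper `top_by_fold_change`, the default of 20 genes and the toy values are illustrative assumptions, not part of the answer above. Genes are ranked by absolute log2 fold change so that strongly downregulated genes are considered alongside upregulated ones.

```r
# Hypothetical helper: keep significant genes and rank them by |log2 fold change|.
# Assumes DESeq2-style columns `log2FoldChange` and `padj`, gene IDs as row names.
top_by_fold_change <- function(res, n = 20, padj_cutoff = 0.05) {
  res <- as.data.frame(res)
  # Genes with NA padj are treated as not significant
  sig <- res[!is.na(res$padj) & res$padj < padj_cutoff, , drop = FALSE]
  # Largest absolute fold change first, so up- and downregulated genes compete equally
  sig <- sig[order(abs(sig$log2FoldChange), decreasing = TRUE), , drop = FALSE]
  head(sig, n)
}

# Toy table with made-up values, for illustration only
res <- data.frame(
  log2FoldChange = c(2.1, -1.5, 0.3, 3.0, -0.2, -4.0),
  padj           = c(0.001, 0.01, 0.60, NA, 0.04, 0.02),
  row.names      = paste0("gene", LETTERS[1:6])
)

candidates <- top_by_fold_change(res, n = 2)
rownames(candidates)  # "geneF" "geneA"
```

Raw fold changes of lowly expressed genes can be noisy; with DESeq2, ranking on shrunken estimates from `lfcShrink()` is a common alternative.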

Here is an easy-to-use site for gene ontology:
