Question

What is the best way to analyze and enrich a list of DE genes from RNAseq?

0

4.1 years ago

mario.red8976 ▴ 120

Hi everybody. I'm a PhD who mostly does wet-lab work and is trying to learn bioinformatics (just the basics for now). Together with some collaborators, we performed an RNA-seq experiment on some samples. Since that group has its own bioinformatician who handles this kind of service work, they analyzed the results for us and provided an Excel file with the lists of DE genes. Now I am interested in finding biological meaning in the DE genes, but also in narrowing down (enriching) the lists to a smaller number of genes of interest, which I will eventually validate by PCR or other methods (so it's better to have 20 genes than 200). Most importantly, I want to do this myself in order to learn how. I have analyzed the lists with tools like Metascape to find the pathways that are most interesting to me, so that I can continue with only those genes. Now I am wondering what to do next.

So, my question is: do you have any suggestions on how best to proceed, or on which kinds of analyses can be done on such lists? If you could point me to some tools or R/Bioconductor packages, that would be great!

Thank you a lot for your answers and for your help!!

R pathways enrichment RNA-Seq • 1.7k views

updated 10 months ago by Ram 43k • written 4.1 years ago by mario.red8976 ▴ 120

0

Thank you both for your answers; I will try both methods. I am certainly planning to do PCR for validation. I just want to reduce the number of genes to check, because I have hundreds of genes that are significantly DE. In your opinion, after getting the enriched pathways, is there anything more that could be done, or is that it?

Thank you again!

4.1 years ago by mario.red8976 ▴ 120

score 2 · Answer 1 · 2020-03-30

I like using gprofiler2 for functional enrichment of genes, e.g. the DE genes split into up- and downregulated ones. For this you only need the gene identifiers, e.g. Ensembl IDs or MGI/HGNC symbols, depending on the organism.

For this, let's say you have the output of e.g. DESeq2 or edgeR, i.e. a table with all genes including the statistics. Keep only the genes that are significant, e.g. padj < 0.05, and extract their names. Say you extracted the upregulated ones as genes.up, and all genes from the results table regardless of significance as genes.background.
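The filtering step described above could be sketched roughly as follows. This is only an illustration on a made-up table: the data frame `res`, its `gene` column and all values are assumptions, and the column names follow DESeq2 conventions (`log2FoldChange`, `padj`). With edgeR output the corresponding columns would typically be `logFC` and `FDR`, and in real DESeq2 output the gene identifiers are usually the row names rather than a column.

```r
# Toy stand-in for a DESeq2-style results table (all values are made up).
res <- data.frame(
  gene           = c("Actb", "Fos", "Jun", "Myc", "Egr1", "Gapdh"),
  log2FoldChange = c(0.1, 2.3, 1.8, -1.5, NA, -0.2),
  padj           = c(0.90, 0.001, 0.03, 0.01, NA, 0.60),
  stringsAsFactors = FALSE
)

# padj can be NA for some genes, so guard against NA before comparing.
sig <- !is.na(res$padj) & res$padj < 0.05

# Split the significant genes by direction of change.
genes.up   <- res$gene[sig & res$log2FoldChange > 0]
genes.down <- res$gene[sig & res$log2FoldChange < 0]

# Background: every gene in the results table, regardless of significance.
genes.background <- res$gene
```

The same pattern with `genes.down` as the query gives the enrichment for the downregulated genes.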

In R, the enrichment call would look like this for mouse genes:

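# Install once from CRAN, then load the package
install.packages("gprofiler2")
library(gprofiler2)

# Test the upregulated genes against all genes that went into the DE analysis
genes.up.results <- gost(query = genes.up,
                         custom_bg = genes.background,
                         organism = "mmusculus",
                         significant = TRUE,      # return only significant terms
                         user_threshold = 0.05,   # significance cutoff
                         correction_method = "gSCS")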

genes.up.results$result will then give you a data frame that contains only the pathways for which your upregulated genes are significantly enriched, using all genes that went into the DE analysis as background. gSCS is the multiple-testing correction that the authors of the package recommend. The analysis is pretty fast and should not take more than a few seconds. You can also do this interactively at https://biit.cs.ut.ee/gprofiler/gost by pasting your genes into the search field. I prefer doing it in R, though, so that it is reproducible and the results are saved in my environment right away.
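One way to get from enriched terms back to a shorter gene list is to collect the query genes that fall into the terms you care about. As far as I know, calling gost() with evcodes = TRUE adds an intersection column holding those genes as a comma-separated string per term. The sketch below works on a made-up data frame shaped like such a result, so it runs without the web service. The term names, p-values, genes and the keyword pattern are all hypothetical.

```r
# Toy stand-in for genes.up.results$result as returned with evcodes = TRUE
# (term names, p-values and genes are made up for illustration).
enrich <- data.frame(
  source       = c("GO:BP", "KEGG", "GO:BP"),
  term_name    = c("response to stress", "MAPK signaling pathway", "cell cycle"),
  p_value      = c(0.004, 0.0002, 0.03),
  intersection = c("Fos,Jun,Egr1", "Fos,Jun", "Myc,Ccnd1"),
  stringsAsFactors = FALSE
)

# Most significant terms first.
enrich <- enrich[order(enrich$p_value), ]

# Keep only the terms relevant to the biological question (hypothetical keywords).
terms.of.interest <- enrich[grepl("MAPK|stress", enrich$term_name), ]

# Split the comma-separated gene strings and de-duplicate across terms.
candidate.genes <- unique(unlist(strsplit(terms.of.interest$intersection, ",")))
```

candidate.genes is then a shortlist of DE genes that belong to the selected terms, which can be cut down further before validation.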

score 0 · Answer 2 · 2020-03-30

Regarding choosing candidates to validate: my suggestion is to accept that “nothing is consistent”, and to run PCR on the genes with the highest fold change.

If you have to subset from your original list, I would still suggest the same.
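Picking the genes with the highest fold change could look roughly like this in R. It is a minimal sketch on a made-up table: the data frame `de`, its column names (DESeq2-style), the padj < 0.05 cutoff and the number of genes to keep (`n.validate`) are all assumptions for illustration.

```r
# Toy stand-in for a table of DE results (all values are made up).
de <- data.frame(
  gene           = c("Fos", "Jun", "Myc", "Egr1", "Ccnd1"),
  log2FoldChange = c(2.3, 1.8, -3.1, 0.7, -1.2),
  padj           = c(0.001, 0.03, 0.01, 0.04, 0.20),
  stringsAsFactors = FALSE
)

# Restrict to significant genes first.
sig <- de[!is.na(de$padj) & de$padj < 0.05, ]

# Rank by absolute log2 fold change so strong down-regulation counts as well.
ranked <- sig[order(abs(sig$log2FoldChange), decreasing = TRUE), ]

# Take the top few genes as PCR candidates (hypothetical cutoff).
n.validate <- 3
pcr.candidates <- head(ranked$gene, n.validate)
```

Ranking on the absolute value is a design choice; if only upregulated genes are of interest, sort on log2FoldChange directly.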

Here is an easy site to use for gene ontology: http://cbl-gorilla.cs.technion.ac.il/