Question: Accepted Workflows for Strongly Ontology/Pathway-Driven Analyses of RNA-seq Data
JMallory wrote (8 months ago):

Are there generally accepted workflows for strongly ontology/pathway-driven analyses of RNA-seq data?

Specifically, I have a dataset of ~159,000 unique transcripts mapping to ENST identifiers that I've performed differential expression analysis on (healthy tissue vs. diseased). The PI I am currently working with is only interested in a specific class of glycoproteins and known pathways related to his disease process. I had a member of his group generate a list of relevant GO terms for both the glycoproteins and known disease pathways. I parsed this to the level of a unique gene list derived from all the GO terms (via biomaRt) and then used this list to filter my DE transcripts.

Now I have a list of transcripts related to the PI's molecules/disease of interest, sorted by p-value as given by the DE analysis (using edgeR). Simply stated, I have no idea what to do with this.

Intuition tells me I should attempt to integrate the log2 fold change data somehow. The PI has suggested just dumping the top 1,000 ontology-filtered DE genes (by p-value) into the Cytoscape ReactomeFI plugin, running gene set analysis, and calling it a day. At best, this seems uninformative and, at worst, a tautology, since we've already heavily preselected the genes to be used as input.

Has anyone else encountered a similar situation? Are there better ways of analyzing RNA-seq data when there are strong prior assumptions about what genes/transcripts/pathways will be considered?

Lluís R. (Barcelona, Spain) wrote (8 months ago):

Gene set enrichment methods are designed precisely for that purpose. With your list of genes (transcripts) of interest in hand, you can apply these methods against the whole list of genes to see whether this group behaves differently from the rest, for instance whether it is more highly expressed in disease than in controls.
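To make the idea concrete, here is a minimal sketch of a rank-based "competitive" test: it asks whether the genes of interest rank higher or lower than all the other genes on a signed score that combines the p-value with the direction of the log2 fold change. This is not the algorithm used by any of the packages mentioned in this thread; it is a plain Mann-Whitney rank-sum test with a normal approximation. The function names and the toy data are hypothetical.

```python
import math

def signed_score(p_value, log2_fc):
    """-log10(p) carrying the sign of the log2 fold change (requires p > 0)."""
    return math.copysign(-math.log10(p_value), log2_fc)

def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(scores, gene_set):
    """Compare scores of genes in gene_set against all remaining genes.

    scores: dict mapping gene id -> score for EVERY tested gene.
    Returns (z, two_sided_p) from the normal approximation.
    Assumption: no tie correction is applied to the variance.
    """
    genes = list(scores)
    ranks = average_ranks([scores[g] for g in genes])
    in_set = [g in gene_set for g in genes]
    n1 = sum(in_set)
    n2 = len(genes) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need genes both inside and outside the set")
    rank_sum = sum(r for r, flag in zip(ranks, in_set) if flag)
    u1 = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical toy data: ten genes, the three highest-scoring ones form the set.
toy_scores = {f"g{i}": float(i) for i in range(10)}
toy_set = {"g7", "g8", "g9"}
z, p = rank_sum_test(toy_scores, toy_set)
```

The point of the design is that the scores cover every tested gene, not just the ontology-filtered ones, so the genes of interest are judged against the rest of the transcriptome.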

The most common gene set enrichment methods are implemented in the following Bioconductor packages: fgsea (a method similar to the Broad Institute's GSEA), limma, and GSVA. To test which GO terms are most enriched, you could use goseq, topGO, or GOstats. Here I am assuming you already know what these transcripts do.

There are other types of analysis besides differential expression analysis, but choosing among them would require knowing what question you are trying to answer.

sysbiocoder wrote (8 months ago):

Use the differentially expressed genes to determine the biologically significant pathways with GeneSCF.

Then check whether the pathway of interest is enriched. You cannot use only the selected genes for the enrichment analysis.
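A rough sketch of why the background matters, using a plain one-sided hypergeometric over-representation test. This is not GeneSCF's implementation, and every count below is made up for illustration. The same overlap of 20 genes looks strongly enriched against the full set of tested genes, but unremarkable once the background is restricted to the preselected genes.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_pathway, n_de, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    X is the number of pathway genes among n_de genes drawn without
    replacement from a background of n_background genes, of which
    n_in_pathway belong to the pathway.
    """
    upper = min(n_in_pathway, n_de)
    total = comb(n_background, n_de)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts, full background: 20,000 tested genes, a 100-gene
# pathway, 1,000 DE genes, 20 of them in the pathway (about 5 expected).
p_full = hypergeom_enrichment_p(20000, 100, 1000, 20)

# Hypothetical counts, restricted background: only 200 preselected genes are
# considered, half of them in the pathway, 40 DE genes, 20 of them in the
# pathway (20 expected, so there is nothing to detect).
p_restricted = hypergeom_enrichment_p(200, 100, 40, 20)

print(f"full background:       p = {p_full:.3g}")
print(f"restricted background: p = {p_restricted:.3g}")
```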
