Question

I have gene lists with fold changes/FDR from RNA-seq, what's the possible downstream analysis I can perform?

0

4.5 years ago

htchd7ji • 0

I have gene lists with fold changes and FDR values from RNA-seq. What downstream analyses can I perform, and what are the common packages for them?

RNA-Seq analysis gene list • 1.3k views

ADD COMMENT • link updated 4.5 years ago by ATpoint 82k • written 4.5 years ago by htchd7ji • 0

1

Please see this from our perspective: you have provided minimal information but are expecting a 'catch-all' solution. So, I'll put these questions right back to you: what is your experiment about, and what hypotheses do you want to test?

ADD REPLY • link 4.5 years ago by Kevin Blighe 87k

0

I have 6 cell lines, each treated with either a control or the test compound, and RNA from all samples was sequenced. Now I have the data and want to know what conclusions and figures can be drawn from it. I assume there's a difference between control and treatment. I also assume there are differences among the cell lines.

ADD REPLY • link 4.5 years ago by htchd7ji • 0

0

OK, but why did you perform the experiment in the first place? You can do a lot of different analyses, but you will get much better advice if you have a research question. There must be a reason this experiment was done. What is it?

ADD REPLY • link 4.5 years ago by ATpoint 82k

0

I did the experiment because I want to find out what the test compound does to the cells. This is my research question.

ADD REPLY • link 4.5 years ago by htchd7ji • 0

2

What were your hypotheses? How does the compound act? Do you have any prior information that provides context to your question to make it more directed?

Doing things just to "see what shakes out" can be a frustrating way to perform science.

Regardless, it sounds like GSEA or pathway/ontology enrichment analyses are probably what you're looking for. There are a whole host of packages/web servers to do those - they all function more or less the same way and should yield similar results. See clusterProfiler, g:Profiler, enrichR, DAVID, and GSEA for some options.
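To make the idea behind GSEA a bit more concrete, here is a minimal sketch of the classic unweighted running-sum enrichment score, assuming you already have a list of genes ranked from most up- to most downregulated. The function name and toy gene names are made up for illustration, and real tools add weighting by the ranking statistic, permutation-based p-values and multiple-testing correction, so treat this as an illustration of the principle only.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score for one gene set.

    Walk down the ranked list, stepping up on genes that are in the set and
    down on genes that are not. The score is the largest deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, most upregulated first
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
print(enrichment_score(ranked, {"G1", "G2"}))  # set sits at the top: positive score
print(enrichment_score(ranked, {"G5", "G6"}))  # set sits at the bottom: negative score
```

A positive score means the set is concentrated among the upregulated genes, a negative one among the downregulated genes.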

ADD REPLY • link 4.5 years ago by jared.andrews07 ★ 16k

0

I agree with you that it's frustrating when there's not much insight beforehand. But I also think this is a more unbiased way to look at research questions. Many of today's publications actually use RNA-seq as a confirmation tool instead of a discovery tool, although they pretend the opposite. That's why we don't see anything new out of those experiments.

Thank you for your input.

ADD REPLY • link 4.5 years ago by htchd7ji • 0

1

Deciding what to do downstream of the standard differential expression analysis is where the really creative bioinformatics can come into play. I have to admit that it's not something that can be encapsulated into a single 'package', like what you appear to be seeking. Knowing what to do will be what can make you stand out from others.

You really do have to have some hypothesis, though, and then test it using whatever techniques you know.

You may get some insight from my answer here: A: What is the best way to combine machine learning algorithms for feature selectio

Insight can also come from looking through published studies and browsing forums.

Finally, if you are merely the analyst for some clinicians / biologists, then ask them to come up with some questions / hypotheses. I have already grown tired in my career of being given data with no accompanying questions.

ADD REPLY • link 4.5 years ago by Kevin Blighe 87k

0

After some searching, I found the following tools and methods: DESeq2, volcano plots, heatmaps, GO, GSEA, and clustering. However, these usually handle only one-to-one comparisons, while I have 6 pairs of control and treatment. Clustering and PCA may give some insight, but not much. I want to know whether there are major drivers of the difference between control and treatment and, if so, what they are.

ADD REPLY • link 4.5 years ago by htchd7ji • 0

2

Please don't post comments as answers - it makes conversations difficult to follow.

That is true, most tools will only compare one to one. But that just means you may need to change your comparison to answer your question of interest. If you're interested in the compound's effect, group all of the treated samples together, do the same for the untreated samples, and see which genes are differentially expressed between the two groups.

And then feed that list into the subsequent tools/visualizations. Or just look for common differentially expressed genes between all of your pairs. This is where your expertise and critical thinking skills become important, as does having explicit, answerable questions well-defined to help guide you.
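As a rough sketch of the "common differentially expressed genes" idea, assuming you have one result table per cell line that maps each gene to a log2 fold change and an FDR: the cutoffs, function names and toy data below are made up for illustration and are not recommendations.

```python
from collections import Counter


def significant_genes(results, fdr_cutoff=0.05, min_abs_log2fc=1.0):
    """Return {gene: direction} for genes passing both (illustrative) cutoffs.

    `results` maps gene -> (log2 fold change, FDR).
    """
    return {
        gene: ("up" if lfc > 0 else "down")
        for gene, (lfc, fdr) in results.items()
        if fdr < fdr_cutoff and abs(lfc) >= min_abs_log2fc
    }


def shared_genes(per_line_results, min_lines):
    """Genes significant in the same direction in at least `min_lines` cell lines."""
    counts = Counter()
    for results in per_line_results.values():
        for gene, direction in significant_genes(results).items():
            counts[(gene, direction)] += 1
    return sorted(key for key, n in counts.items() if n >= min_lines)


# Made-up toy data: three hypothetical cell lines, gene -> (log2FC, FDR)
toy = {
    "lineA": {"GENE1": (2.1, 0.001), "GENE2": (-1.5, 0.01), "GENE3": (0.2, 0.8)},
    "lineB": {"GENE1": (1.8, 0.004), "GENE2": (1.2, 0.03), "GENE3": (1.4, 0.02)},
    "lineC": {"GENE1": (1.1, 0.04), "GENE2": (-2.0, 0.002), "GENE3": (0.1, 0.9)},
}
print(shared_genes(toy, min_lines=3))  # [('GENE1', 'up')]
print(shared_genes(toy, min_lines=2))  # [('GENE1', 'up'), ('GENE2', 'down')]
```

Requiring the same direction of change avoids counting a gene that goes up in one line and down in another as a shared response. Relaxing `min_lines` is a judgment call, since strict intersections shrink quickly as the number of cell lines grows.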

ADD REPLY • link 4.5 years ago by jared.andrews07 ★ 16k

1

moved it back to comment...

ADD REPLY • link 4.5 years ago by Kevin Blighe 87k

score 3 · Answer 1 · 2019-11-02

Ok, some basic thoughts:

Differential analysis will tell you which genes have significantly different counts between conditions. You can feed these results into enrichment analyses such as GO enrichment and GSEA. GO enrichment checks whether your genes are enriched for curated "categories" (GO terms) covering biological processes, molecular functions and cellular components. This is very crude and often not too informative, but it is a good starting point. GSEA (gene set enrichment analysis) checks whether your up- or downregulated genes are enriched for published gene sets, e.g. as available from MSigDB. Here the direction of the fold changes matters, whereas GO enrichment simply accepts gene names (which you could prefilter for the direction of change). For GO I typically use the web applications from the Gene Ontology Consortium; GSEA could be done with fgsea in R.
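For intuition, the over-representation test behind this kind of GO enrichment can be sketched as a one-sided hypergeometric test. This is a minimal illustration with made-up numbers and a hypothetical function name, not the exact procedure of any particular tool. Real analyses also need a sensible background (e.g. all genes that were tested) and multiple-testing correction across categories.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_set, n_selected, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_universe: number of background genes
    n_in_set:   background genes annotated to the category
    n_selected: number of differential genes submitted
    n_overlap:  submitted genes that are annotated to the category
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 20 background genes, 5 in the category,
# 4 differential genes of which 3 fall in the category.
print(hypergeom_enrichment_p(20, 5, 4, 3))
```

Because the test only sees set membership, it ignores the size and sign of the fold changes, which is why prefiltering by direction can matter.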

With differential genes you can create heatmaps based on hierarchical clustering, e.g. using Z-transformed log2-normalized counts of these genes. Check ComplexHeatmap in R. This will group samples together (column clustering) with similar overall expression (or in this case fold-changes) and also group genes together (row clustering) that show similar directions / patterns of change across the samples. This might give you an idea of 1) which cell lines are responding similarly and 2) whether certain genes might act in similar pathways. It is a good starting point to reduce the complexity of having many (maybe hundreds or thousands of) genes. Different clusters of genes might have similar functions, and this might give a hint on how your compound influences the cells and whether there are some common molecular targets it acts on.
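A small sketch of why the Z-transformation is applied per gene before clustering, assuming a table of log2-normalized expression values per gene across samples. The gene names and values are made up, and the clustering and plotting would still be done by a dedicated package.

```python
from math import dist
from statistics import mean, stdev


def zscore_rows(matrix):
    """Scale each gene (row) to mean 0 and standard deviation 1.

    `matrix` maps gene -> list of log2-normalized values, one per sample.
    Genes with no variation are dropped because they carry no pattern.
    """
    scaled = {}
    for gene, values in matrix.items():
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue
        scaled[gene] = [(v - mu) / sd for v in values]
    return scaled


# Made-up values: GENE_A and GENE_B share a pattern at different magnitudes,
# GENE_C shows the opposite pattern, GENE_D is flat.
toy = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [5.0, 5.0, 5.0, 5.0],
}
z = zscore_rows(toy)

# Euclidean distance before and after scaling
print(dist(toy["GENE_A"], toy["GENE_B"]), dist(z["GENE_A"], z["GENE_B"]))
print(dist(z["GENE_A"], z["GENE_C"]))
```

After scaling, genes with the same pattern sit close together regardless of their absolute expression level, which is what you want the row clustering of the heatmap to pick up.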

Eventually, you might take several clusters of genes and check again with GO/GSEA for enrichments, as this might then be more detailed than comparing the bulk of DEGs. In the end you should see whether there is a prominent cluster of genes / pattern / molecular function / pathway that appears to be enriched upon compound exposure in your cells, which you would then need to follow up. You could compare to published datasets. Say you find that the compound seems to act on GTPase signalling: you could then compare your data with RNA-seq experiments that used GTPase inhibitors, overexpression or knockouts. This could help confirm or falsify your working hypothesis. If confirmed (or at least supported), do additional analyses and experiments to work out the molecular mechanisms that sufficiently explain the observed phenotype. Try to focus on a few promising (and potentially impactful) mechanisms rather than following up too many paths while never going into detail. This of course depends on what exactly your research question is. A genome-wide paper typically provides global insights but explores mechanisms in less detail than a paper that investigates the basis of a single well-defined phenotype in great depth.
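One simple way to start such a comparison with a published dataset is to check how often genes change in the same direction in both experiments. The sketch below assumes both tables use the same gene identifiers and have already been restricted to genes you trust (e.g. significant ones). The names and values are made up, and a proper comparison would also need a statistical test.

```python
def direction_concordance(own_lfc, published_lfc):
    """Fraction of shared genes whose log2 fold change has the same sign.

    Both arguments map gene -> log2 fold change.
    Returns (fraction concordant, number of shared genes).
    """
    shared = sorted(set(own_lfc) & set(published_lfc))
    if not shared:
        raise ValueError("no genes in common between the two datasets")
    same = sum(
        1 for gene in shared
        if (own_lfc[gene] > 0) == (published_lfc[gene] > 0)
    )
    return same / len(shared), len(shared)


# Hypothetical log2 fold changes
own = {"G1": 2.0, "G2": -1.5, "G3": 1.2, "G4": -0.8}
published = {"G1": 1.1, "G2": -2.2, "G3": -0.9, "G5": 3.0}
fraction, n_shared = direction_concordance(own, published)
print(fraction, n_shared)
```

A fraction well above one half would support the idea that the compound resembles the published perturbation, while a value near one half suggests little relationship. Differences in cell type, dose and time point can blur this, so treat it as a first look.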

There is typically no straightforward recipe for exploratory analysis. Use your biological knowledge to decide whether the above-mentioned strategies give meaningful results, as this is all based on statistics and correlations, and correlation != causation.