Question: What analyses can be done after de novo assembly of RNA-seq data from a new species?
3.3 years ago, Amk (120) wrote:

Dear All,

Which analyses can we run after de novo assembly of transcriptome sequencing data from a new species? No transcriptome has been published for this species before. Some that I can think of:

1. Do a Pathway analysis using KEGG.
2. Do a GO Annotation.
3. Try OrthoVenn to plot graphs for protein orthologs.
4. Try a BGI-WEGO plot.

This might be a broad question, but your input could help many researchers who are new to genomics and give them some ideas. Even a small contribution would be helpful.
Thank you.



transcriptome rna-seq analysis • 1.3k views
Modified 3.3 years ago by Chris Cole (680) • written 3.3 years ago by Amk (120)

Any analyses that seek answers to the research questions that were set prior to sequencing. Fishing expeditions are rather pointless IMO.

Reply written 3.3 years ago by 5heikki (8.3k)


Sorry, I have edited the question. This is the first draft transcriptome assembly for this species. Thank you.


Reply written 3.3 years ago by Amk (120)

Perhaps you're misusing the term "fishing expedition", which usually refers to the multiple testing problem. This is where you perform many different statistical tests and then report only the best one: "If you torture the data long enough, it will confess to anything" (R. Coase). I'm sure we can all agree this is a big no-no!
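To put a number on why picking the best of many tests is a problem, here is a minimal sketch. It assumes independent tests at a fixed significance level, and it adds the Benjamini-Hochberg adjustment as one common correction. Neither the function names nor the example p-values come from this thread; they are only illustrative.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    for n in (1, 20, 100):
        print(n, "tests:", round(familywise_error_rate(n), 3))
    # Made-up p-values, purely for illustration.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

With 20 uncorrected tests at alpha = 0.05, the chance of at least one spurious hit is already well above one half, which is the "torturing the data" scenario in numbers.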

However, I have seen the term "fishing expedition" used to describe something quite different from multiple testing: finding hidden internal structure in high-dimensional data, generating hypotheses for future studies, data visualization, data mining (combining genomic data with publicly available datasets such as patient medical records), etc. I can't really understand why you would be against this. The viewpoint seems quite antiquated. For example, here is an excerpt from the book Philosophy of Complex Systems (which I won't be reading because I'm not willing to spend 198 euros on it)


IMO "fishing expeditions" in the above sense HAVE made way for unexpected, ground-breaking discoveries in biomedical research. Much of what we know about how cells function has come from unbiased screening approaches. For example, untargeted metabolomics studies are hypothesis-generating by design and are the most exciting in terms of discovering biomarkers or elucidating metabolic profiles. Along a similar line of thinking, RNA-seq is superior to qPCR, where you only look at a small panel of genes of interest and may miss genes you weren't even thinking about a priori. Also, in single-cell RNA-seq of a highly heterogeneous cell population in a tissue or tumor, we are still finding new cell types (and subtypes) through unsupervised machine learning techniques (e.g. by clustering data after dimensionality reduction with PCA, t-SNE, etc.) or tracking how cells proceed through development using imputation techniques (e.g. MAGIC). There's my two cents, FWIW.

Reply modified 18 months ago • written 18 months ago by moldach (120)

Reply written 18 months ago by 5heikki (8.3k)

I fail to see how posting a link counts as a constructive conversation. Could you please elaborate on how 1) doing a pathway analysis using KEGG and 2) doing a GO annotation qualify as having NO "clearly defined plan or purpose in the hope of discovering useful information"?

Reply written 18 months ago by moldach (120)

3.3 years ago, Mehmet (460) wrote:

You might be interested in searching for orthologous genes and building a phylogenetic tree from the orthologs. If possible, you could also search for miRNAs in your transcriptome, detect SNPs, CNVs, etc., but all of these depend on your experimental conditions.
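One common way to shortlist candidate orthologs against a related species is the reciprocal best hit (RBH) heuristic: keep a pair only if each sequence is the other's top-scoring hit. A minimal sketch follows, assuming you already have similarity search results reduced to (query, subject, bit score) tuples. The identifiers and scores below are made up, and RBH is a swapped-in illustration, not a method anyone in this thread prescribed.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical hits: new transcripts (tA*) vs. a related species (gB*).
    a_vs_b = [("tA1", "gB1", 200.0), ("tA1", "gB2", 150.0),
              ("tA2", "gB2", 90.0), ("tA3", "gB1", 80.0)]
    b_vs_a = [("gB1", "tA1", 195.0), ("gB2", "tA1", 140.0),
              ("gB2", "tA2", 160.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

RBH pairs are only candidates; dedicated orthology tools handle paralogs and gene families far better, so treat this as a first pass before tree building.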

Written 3.3 years ago by Mehmet (460)

3.3 years ago, Chris Cole (680) wrote:

First, check that the assembly is of good quality. Use TransRate to see whether the assembly is consistent with the read data and isn't contaminated with misassemblies. If it's a eukaryote, you can check with BUSCO to see whether it has a reasonable number of core genes.
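Before running those tools, a quick look at basic contig statistics can catch obvious problems such as a flood of very short contigs. The sketch below computes contig count, total length, and N50 from FASTA text using only the Python standard library. It is an illustration with made-up sequences, not a substitute for TransRate's read-based scoring or BUSCO's gene-content check.

```python
def read_fasta_lengths(lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    # Tiny made-up assembly; in practice pass an open file handle instead.
    fasta = ">c1\nACGTAC\n>c2\nACG\n>c3\nACGTACGTAC\n".splitlines()
    lengths = read_fasta_lengths(fasta)
    print(len(lengths), "contigs,", sum(lengths), "bases, N50 =", n50(lengths))
```

Keep in mind that for transcriptomes a high N50 is not a goal in itself, since real transcripts vary widely in length; it is just a cheap sanity check.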

Any pathway, KEGG or GO analysis is going to be severely hampered by the fact that this is a new species. I doubt it'll be terribly useful. You're better off looking for homologues in a related species.

InterProScan is a good way of assigning some sort of functionality to your transcripts.

Written 3.3 years ago by Chris Cole (680)

Can you (or anyone else) please suggest specific tools or a workflow for pathway and gene-set analysis (GO, KEGG, KOG) of RNA-Seq data (transcriptome)?

I'm really stuck here and I hope someone can take the time to help a frustrated, but motivated, colleague out!

Over the past couple of days, my internet sleuthing (on Biostars, SEQanswers, blogs, etc.) has been more or less fruitless. I keep being told to do a KEGG or GO analysis at a very high level, but I haven't found any good resources that actually teach someone how to go about it.

For example, Matt MacManes has a VERY good protocol/tutorial on how to go about "assembling" and "annotating" a transcriptome. That is a technically complicated process involving several discrete steps. I assume that doing a "KEGG" or "GO" analysis must be similarly involved, yet I haven't found good resources for actually doing this on my data.

I appreciate any suggestions or a link to any relevant previous post on this blog, or others.
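For readers with the same question: once transcripts carry GO or KEGG annotations, the statistical core of a basic over-representation analysis is a one-sided hypergeometric test per term. The sketch below assumes you already have a background gene set, the subset annotated with one term, and a list of genes of interest. The function name and the counts are illustrative only, and this is not a description of what Blast2GO or any other tool does internally.

```python
from math import comb


def enrichment_pvalue(n_background, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric p-value for over-representation of a term.

    n_background: genes in the background set
    n_annotated:  background genes annotated with the term
    n_selected:   genes in the list of interest (drawn from the background)
    n_overlap:    selected genes annotated with the term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    # Sum P(X = k) for every k at least as extreme as the observed overlap.
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up counts: 40 of 1000 background genes carry the term,
    # and 8 of our 50 genes of interest do.
    print(enrichment_pvalue(1000, 40, 50, 8))
```

In a real analysis this test is repeated for every term, so the resulting p-values need a multiple testing correction before any term is called enriched.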


Reply modified 21 months ago • written 21 months ago by moldach (120)

Have you tried to use Blast2GO for GO, KEGG and KOG and gene set enrichment analysis? More generally, Blast2GO does functional annotation of a genome or transcriptome data set.

Reply written 20 months ago by Mehmet (460)

Thank you for your answer, Mehmet. I sincerely appreciate someone responding to this (not only for my sake but for the rest of the NGS/Biostars community); however, Blast2GO is proprietary software. Although there are some benefits to pay-to-play software (see "The Open Source Software Debate in NGS Bioinformatics"), I vehemently dislike anything that isn't open-source and free to use. Why should I pay for something if I can get it for free?

I started off programming in a laboratory with proprietary software (MATLAB) and learned just how expensive and useless such skills can be when you move to another lab that doesn't have it and is not willing to fork out $$$ for it. I moved to R and Python and never had a problem doing the things I did in MATLAB (in most cases I was able to find open-source tools, like Bioconductor), and I've never looked back.

If I had moved into a lab where the PI thought the "hidden" cost of time spent looking for an open-source tool (or building one themselves) outweighed the cost of Blast2GO, then this would be a great suggestion. However, both my PI and I thought Blast2GO was prohibitively expensive for the needs of our small lab (we won't be using it many times, year after year).

Anyway, I'll quit beating the dead horse, because I know there are lab managers out there who are worried about the constant churn of students, post-docs, etc. and the inability to reproduce results, and who will feel it's a prudent use of resources to buy software off the shelf, break the plastic, install it, and with little customization (and no programming experience) have their staff run it over and over to get their "money's worth".

I did figure out how to do this using open-source software (from an early 2017 preprint) and will edit my post when I have a chance. In the meantime, can anyone suggest alternatives? There's more than one way to skin a cat, and I feel it would be useful to the community for future benchmarking.

Happy coding :)

Reply modified 18 months ago • written 18 months ago by moldach (120)