Question: What to do after a genome assembly?
4
3.0 years ago by
ol_ucla40
Sweden
ol_ucla40 wrote:

Hi,

What can I do with an assembled genome? That is my main question.

A little more detail:
I have previous experience with RNA-seq alignment, and now I'm learning genome assembly. Currently I am learning SOAPdenovo for assembly, with Velvet and MetaVelvet to follow.
The question is: suppose I have an assembled genome, and I am satisfied with the gap-closing process and the final assembly. What's next?
I read that you can do genome annotation and gene prediction. Anything else?

A little side-track: What would be a good tool to use for annotation and gene prediction?

Thanks in advance!

 

rna-seq next-gen assembly • 1.9k views
5

Celebrate :)

Reply written 3.0 years ago by Asaf4.6k
2
3.0 years ago by
Bergen, Norway
Michael Dondrup43k wrote:

A good tool chain for annotation is MAKER.

2
3.0 years ago by
Spain. Universidad de Córdoba
Antonio R. Franco3.4k wrote:

I can assure you that the N50 value alone is not enough to tell whether your assembly is good or not.

You can run Velvet with different k-mer values on a simple genome (E. coli), then compare the resulting assemblies against trusted genomes using Mauve or ACT, and you will be greatly surprised. In my hands, some assemblies with a higher N50 turned out to be worse than other assemblies with a lower N50.

You don't mention which organism you used for the assembly. If trusted genomes are available for evaluation, use them before going further.
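
The k-mer sweep described above could be scripted roughly as follows. This is a minimal sketch, not the actual pipeline used for the E. coli tests: the file name `reads.fastq`, the k-mer values, the `velvet_k` directory prefix, and the helper names are all hypothetical. It assumes interleaved paired-end FASTQ input and that `velveth` and `velvetg` are on the PATH. The script itself only builds and prints the command lines; `run_all` is there for when you actually want to execute them.

```python
import subprocess


def velvet_commands(reads, kmers, out_prefix="velvet_k"):
    """Build one velveth/velvetg command pair per k-mer value."""
    commands = []
    for k in kmers:
        out_dir = f"{out_prefix}{k}"  # hypothetical naming scheme
        # velveth hashes the reads for the given k-mer (hash) length.
        commands.append(
            ["velveth", out_dir, str(k), "-fastq", "-shortPaired", reads]
        )
        # velvetg builds the graph and writes the contigs into out_dir.
        commands.append(
            ["velvetg", out_dir, "-exp_cov", "auto", "-cov_cutoff", "auto"]
        )
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumed input file and k-mer values; adjust to your data.
    for cmd in velvet_commands("reads.fastq", [21, 31, 41]):
        print(" ".join(cmd))
```

Each output directory should then contain a `contigs.fa` that can be aligned against a trusted genome in Mauve or ACT.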


Thanks for your reply.

The fact is, I don't even know what organism I'm working on, other than that it's a bacterium. The genome is very small, because it only takes about 10 minutes each time to assemble the paired-end sample.

Right now, the other factors I take into consideration are the number of contigs and the maximum contig length.

The person training me said that the number of contigs could be a good indication, as fewer contigs usually means a better assembly. But of course, this does not say anything about whether it was assembled correctly. In your opinion, can the number of contigs or the maximum contig length be used as an evaluation criterion?

I've used Mauve, but I didn't think of it as a tool for genome validation. Now that you mention it, it makes perfect sense to use it!
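
For reference, the number of contigs and the maximum contig length can be read straight from the assembly FASTA. The following is a minimal sketch with a made-up toy assembly; the function names and the sequences are illustrative only, and in practice the handle would be an opened contigs file. As noted above, these numbers describe contiguity, not correctness.

```python
import io


def contig_lengths(handle):
    """Return the length of every sequence in a FASTA file handle."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths):
    """Contig count, longest contig, and total assembled bases."""
    return {
        "contigs": len(lengths),
        "longest": max(lengths, default=0),
        "total_bases": sum(lengths),
    }


# Toy assembly (made-up sequences) standing in for a real contigs file.
toy_fasta = io.StringIO(">c1\nACGTACGT\nACGT\n>c2\nACG\n>c3\nACGTA\n")
stats = summarize(contig_lengths(toy_fasta))
print(stats)  # {'contigs': 3, 'longest': 12, 'total_bases': 20}
```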

 

Reply modified 3.0 years ago • written 3.0 years ago by ol_ucla40

N50 takes into account both the number of contigs and their lengths. It is conceptually very easy once you get it, but it is hard to explain. In my hands, I got better E. coli assemblies with lower N50 values using Velvet.
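
Since N50 is easier to show than to explain, here is a minimal sketch using the usual definition: sort the contigs from longest to shortest and walk down the list until the running total reaches half of the assembly size; the length of the contig you stop on is the N50. The contig lengths below are made up for illustration.

```python
def n50(lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Two made-up assemblies of the same 300 bases.
fragmented = [50, 50, 50, 50, 50, 50]
joined = [150, 50, 50, 50]  # three contigs merged into one, rightly or wrongly

print(n50(fragmented))  # 50
print(n50(joined))      # 150
```

The second assembly gets a three times higher N50 simply because contigs were joined. The number cannot tell whether that join is real or a misassembly, which is why a comparison against a trusted genome is still needed.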

I would be somewhat worried about a first draft of my genome if I could not compare it with real trusted genomes. If that were the case, I would certainly design a strong pipeline with my reads, including mate-pair and long Pacific Biosciences sequences.

A colleague of mine needed 7 years to close a Pseudomonas genome ...

Reply written 3.0 years ago by Antonio R. Franco3.4k
1
3.0 years ago by
iraun3.2k
Norway
iraun3.2k wrote:

For genome annotation, I would suggest giving Blast2GO a try: https://www.blast2go.com/.
On the other hand, you can do gene prediction by running BLAST against a protein database or something like that...

But, in the first place, as has been said, you should celebrate ^^
Hope it helps.
 

0
3.0 years ago by
sentausa610
France
sentausa610 wrote:

Compare it to something(s) else.

0
3.0 years ago by
5heikki6.9k
Finland
5heikki6.9k wrote:

Check your original plans: why did you decide to sequence in the first place? What were the research questions?


Maybe I should have made this clearer in the first place.

The data set I am using has already been analyzed and has served its original purpose. I am just re-using the data to learn about genome assembly: the ideas and the tools that can be used. So I guess the research question has already been answered along the way?

I am posting to find out what could be done next, or maybe I should rephrase the question as "What can you do with an assembled genome?"

Thanks.

Reply written 3.0 years ago by ol_ucla40
0
3.0 years ago by
ol_ucla40
Sweden
ol_ucla40 wrote:

Also, I read a bit more about genome assembly after the original post.

I found that, even though N50 is widely used as a standard for choosing the k-mer size, it doesn't indicate whether the genome is assembled correctly. So assembly validation becomes an issue.

Following that thought, I looked for validation tools and found only one: REAPR ( http://genomebiology.com/2013/14/5/R47/ )

Does anyone have any experience with it? The manual is a bit hard to follow.

Are there any other tools recommended for assembly validation?

Thanks.

0
3.0 years ago by
dago2.4k
Germany
dago2.4k wrote:

Are you dealing with a prokaryote or a eukaryote?

Gene annotation for prokaryotes is done well by Prokka, which uses Prodigal to predict CDS.
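
A basic Prokka run on the assembled contigs might look like the sketch below. The file name `contigs.fa`, the output directory, the prefix, and the helper name are assumptions for illustration; only the `--outdir`, `--prefix`, and `--kingdom` options are taken from Prokka's command line. The script prints the command rather than running it.

```python
import shlex


def prokka_command(contigs, outdir="prokka_out", prefix="assembly", kingdom="Bacteria"):
    """Build a basic Prokka command line for a set of assembled contigs."""
    return [
        "prokka",
        "--outdir", outdir,    # directory for the annotation output
        "--prefix", prefix,    # base name of the output files
        "--kingdom", kingdom,  # annotation mode
        contigs,
    ]


cmd = prokka_command("contigs.fa")  # hypothetical input file
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```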

0
3.0 years ago by
Genomic Island
Bioinformatics_NewComer210 wrote:

If your organism is a prokaryote then, as mentioned above, Prodigal can be used for gene prediction. Apart from that, GeneMarkS would also be a good tool.
