Question: Investigating a genome assembly
4.2 years ago
thjnant wrote:


I have to work with a genome assembly which consists of about 32000 scaffolds (obviously, the scaffolds are not annotated) as a reference for SNP calling. However, before proceeding further, I would like to:

1. Read about the process of making a genome assembly.

2. Get basic statistics of my genome assembly.

I have been searching for a good review, but I thought I would ask whether there is any particular review that you find useful.

It would also be great if you could mention the basic statistics that one should calculate to learn about the quality and properties of an assembly. I have this list; is there anything else that should be added to it?

Coverage - Assembly Size - Total Contig Length - Scaffolds - Scaffold N50 - Contigs - Contig N50 - %Q40


Thank you in advance, Homa

assembly
4.2 years ago
iraun wrote:

Genome assembly is quite a complicated process. Getting a general idea of it is not hard, but understanding how the algorithm behind each tool works is. I think you will find quite a lot of information about the process by searching Google and Wikipedia.

On the other hand, the statistics you have mentioned are pretty good for assessing quality. You can also calculate N90, the length of the longest contig or scaffold, and the % of CEGs (conserved core eukaryotic genes) mapped. I would suggest giving QUAST (QUality ASsessment Tool for Genome Assemblies) a try; it can be used to assess the quality of genome assemblies.
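If you want to see what these numbers mean before running a dedicated tool, here is a minimal Python sketch (not QUAST's implementation) that computes the sequence count, total length, longest sequence, N50 and N90 from a FASTA file. The function names `read_fasta_lengths`, `nx` and `assembly_stats` are made up for this example, and the three tiny sequences are toy data. N50 is taken here as the length L such that sequences of length at least L cover at least half of the total assembly length.

```python
import io


def read_fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA stream (hypothetical helper)."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def nx(lengths, fraction):
    """Return L such that sequences of length >= L cover `fraction` of the total."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return 0


def assembly_stats(lengths):
    """Collect a few basic assembly statistics into a dict."""
    lengths = list(lengths)
    return {
        "sequences": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "N50": nx(lengths, 0.5),
        "N90": nx(lengths, 0.9),
    }


# Toy input: three scaffolds of length 10, 6 and 4.
example = io.StringIO(">s1\nACGTACGTAC\n>s2\nACGTAC\n>s3\nACGT\n")
stats = assembly_stats(read_fasta_lengths(example))
print(stats)
```

For a real assembly you would pass an open file instead of the `io.StringIO` object, for example `open("scaffolds.fasta")` (the file name is a placeholder). Running it once on the scaffolds and once on the contigs gives both scaffold and contig N50.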

4.2 years ago
Prakki Rama wrote:

Apart from basic stats, researchers also map their reads (genomic or transcriptomic) to check the percentage of reads mapping to the genome scaffolds. In addition, if sequences obtained by other techniques are available (such as BACs, ESTs, or transcriptome sequences of the same organism or its nearest neighbor), checking their alignment against the genome also gives deeper insight into the quality of the assembly.

Also, you can check the paper "A beginner's guide to eukaryotic genome annotation" for reference.
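As a rough illustration of the read-mapping check, the sketch below counts the percentage of mapped reads from SAM-format alignment lines using only the FLAG column. It assumes you already have a SAM file from whichever aligner you use; the function name `mapping_rate` and the example records are invented for illustration. Secondary (0x100) and supplementary (0x800) records are skipped so that each read is counted once, and for paired-end data each mate is counted separately.

```python
def mapping_rate(sam_lines):
    """Return the percentage of primary SAM records that are mapped."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (those starting with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Ignore secondary and supplementary alignments.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        # Bit 0x4 set means the read is unmapped.
        if not flag & 0x4:
            mapped += 1
    return 100.0 * mapped / total if total else 0.0


# Toy records: two mapped reads, one unmapped read, one secondary alignment.
example_sam = [
    "@SQ\tSN:scaffold_1\tLN:1000",
    "r1\t0\tscaffold_1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tscaffold_1\t50\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tscaffold_1\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
rate = mapping_rate(example_sam)
print(f"{rate:.1f}% of reads mapped")
```

With a real file you could pass the open file handle directly, since the function only iterates over lines. A low mapping percentage may point to missing or misassembled regions, although contamination or poor read quality can produce the same effect.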


