Investigating a genome assembly
9.3 years ago
thjnant ▴ 160

Hello,

I have to work with a genome assembly which consists of about 32000 scaffolds (obviously, the scaffolds are not annotated) as a reference for SNP calling. However, before proceeding further, I would like to:

  1. Read about the process of making a genome assembly.
  2. Get basic statistics of my genome assembly.

I have been searching for a good review, but I thought I would ask whether there is a particular one that you have found useful.

It would also be great if you could mention the basic statistics that one should calculate to learn about the quality and properties of an assembly. Here is my list so far; is there anything else that should be added to it?

Coverage - Assembly Size - Total Contig Length - Scaffolds - Scaffold N50 - Contigs - Contig N50 - %Q40

Thank you in advance, Homa

Assembly • 4.7k views
9.3 years ago
iraun 6.2k

Genome assembly is quite a complicated process: not so much to get a general idea of it, but to understand how the algorithm behind each tool works. I think you will find quite a lot of information about the process by searching on Google and Wikipedia.


On the other hand, the statistics you have mentioned are pretty good for assessing quality. You can also calculate the N90, the length of the longest contig or scaffold, and the % of CEGs (conserved core eukaryotic genes) mapped. I would suggest you give QUAST (QUality ASsessment Tool for Genome Assemblies) a try. It can be used to assess the quality of genome assemblies.
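To make the length-based statistics concrete, here is a minimal Python sketch that computes them from a FASTA file. It is only an illustration, not how QUAST does it: the function names (`parse_fasta`, `nx`, `split_contigs`, `assembly_stats`) are made up for this example, and it assumes that contigs can be recovered by splitting each scaffold at runs of `N`, which may not match how a given assembler defines gaps.

```python
import re


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def nx(lengths, x=50):
    """Return Nx: the largest length L such that sequences of length >= L
    together contain at least x% of all bases (N50 when x=50)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length
    return 0  # no sequences at all


def split_contigs(seq, min_gap=1):
    """Split a scaffold into contigs at runs of at least min_gap N characters.

    Assumption: gaps are encoded as N/n; min_gap is an illustrative parameter.
    """
    return [c for c in re.split(r"[Nn]{%d,}" % min_gap, seq) if c]


def assembly_stats(lines):
    """Collect basic scaffold- and contig-level statistics."""
    scaffolds = [seq for _, seq in parse_fasta(lines)]
    s_len = [len(s) for s in scaffolds]
    c_len = [len(c) for s in scaffolds for c in split_contigs(s)]
    return {
        "scaffolds": len(s_len),
        "assembly_size": sum(s_len),
        "longest_scaffold": max(s_len, default=0),
        "scaffold_N50": nx(s_len, 50),
        "scaffold_N90": nx(s_len, 90),
        "contigs": len(c_len),
        "total_contig_length": sum(c_len),
        "contig_N50": nx(c_len, 50),
    }
```

To use it, pass an open file handle, for example `assembly_stats(open("assembly.fasta"))`, where `assembly.fasta` is a placeholder name. For an assembly with about 32000 scaffolds, holding all sequences in memory as done here is usually fine, but a dedicated tool will report many more metrics.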

9.3 years ago
Prakki Rama ★ 2.7k

Apart from basic stats, researchers also map their reads (genomic or transcriptomic) back to the assembly and check what percentage of the reads map to the scaffolds. In addition, if sequences from the same organism or a close relative are available from other sources (such as BACs, ESTs, or a transcriptome), checking how well they align to the genome gives deeper insight into the quality of the assembly.
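As a rough illustration of the read-mapping check, the sketch below computes the percentage of mapped reads from SAM-format alignment lines. The function name `mapping_rate` is made up for this example. It assumes uncompressed SAM text as input, counts each primary record once (so the two mates of a pair count separately), and skips secondary and supplementary alignments. In practice a tool such as `samtools flagstat` reports this figure directly.

```python
def mapping_rate(sam_lines):
    """Return (mapped, total, percent_mapped) over primary SAM records."""
    mapped = total = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # second column is the bitwise FLAG
        # 0x100 = secondary alignment, 0x800 = supplementary alignment.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        # 0x4 = read unmapped; count the record only if that bit is clear.
        if not flag & 0x4:
            mapped += 1
    percent = 100.0 * mapped / total if total else 0.0
    return mapped, total, percent
```

A low mapping percentage for reads that came from the same organism suggests that parts of the genome are missing from, or misassembled in, the scaffolds.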

Also, you can check this paper, "A beginner's guide to eukaryotic genome annotation", for reference.
