Question: Investigating a genome assembly
thjnant (Germany) wrote, 4.7 years ago:

Hello,

I have to use a genome assembly of about 32,000 scaffolds (none of them annotated) as a reference for SNP calling. Before proceeding further, I would like to:

1. Read about the process of making a genome assembly.

2. Get basic statistics of my genome assembly.

I have been searching for a good review, but I thought I would ask whether there is a particular one that you have found useful.

It would also be great if you could mention the basic statistics one should calculate to judge the quality and properties of an assembly. I have the list below; is there anything else that should be added to it?

Coverage - Assembly Size - Total Contig Length - Scaffolds - Scaffold N50 - Contigs - Contig N50 - %Q40

                 

Thank you in advance, Homa

iraun (Norway) wrote, 4.7 years ago:

Genome assembly is a fairly complicated process. Getting a general idea of it is not hard; the difficult part is understanding how the algorithm behind each tool works. I think you will find plenty of information about the process by searching Google and Wikipedia.


On the other hand, the statistics you mention are a good basis for assessing quality. You can also calculate N90, the length of the longest contig or scaffold, and the percentage of CEGs (core eukaryotic genes) mapped. I would suggest trying QUAST (QUality ASsessment Tool for Genome Assemblies), which is designed to assess the quality of genome assemblies.
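If you want to see how these numbers are derived before running a dedicated tool, here is a minimal Python sketch. It is only an illustration, not how QUAST computes them: the function names (`read_fasta`, `nx`, `assembly_stats`), the toy FASTA text, and the rule that any run of one or more `N` separates two contigs are all assumptions of mine. Tools differ in the minimum gap length they use, so adjust `min_gap` accordingly.

```python
import re

# Toy assembly used only for illustration (hypothetical data).
EXAMPLE_FASTA = """>scaffold_1
ACGTACGTACNNNNNACGTA
>scaffold_2
ACGTACGT
>scaffold_3
ACGT
"""


def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def nx(lengths, x=50):
    """Return Nx: the length L such that sequences of length >= L
    together cover at least x percent of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length
    return 0  # empty input


def assembly_stats(seqs, min_gap=1):
    """Compute basic scaffold and contig statistics.

    Assumption: a run of at least `min_gap` N characters is a gap
    that splits a scaffold into contigs.
    """
    scaffold_lengths = [len(s) for s in seqs.values()]
    gap = re.compile("N{%d,}" % min_gap)
    contig_lengths = [
        len(piece)
        for s in seqs.values()
        for piece in gap.split(s)
        if piece
    ]
    return {
        "scaffolds": len(scaffold_lengths),
        "assembly_size": sum(scaffold_lengths),
        "longest_scaffold": max(scaffold_lengths, default=0),
        "scaffold_n50": nx(scaffold_lengths, 50),
        "scaffold_n90": nx(scaffold_lengths, 90),
        "contigs": len(contig_lengths),
        "total_contig_length": sum(contig_lengths),
        "contig_n50": nx(contig_lengths, 50),
    }


stats = assembly_stats(read_fasta(EXAMPLE_FASTA))
for key, value in stats.items():
    print(key, value)
```

For a real assembly you would read the FASTA file from disk instead of a string; with 32,000 scaffolds this simple approach should still be fine, but a dedicated tool will report many more metrics.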

Prakki Rama (Singapore) wrote, 4.7 years ago:

Apart from the basic stats, researchers also map their reads (genomic or transcriptomic) back to the assembly to check what percentage of the reads map to the scaffolds. If, in addition, sequences obtained by other techniques are available (BACs, ESTs, or a transcriptome from the same organism or a close relative), checking how they align against the genome gives deeper insight into the quality of the assembly.
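To make the read-mapping check concrete, here is a rough Python sketch that computes the percentage of mapped reads from SAM-format alignment records. The function name `mapping_rate` and the example records are hypothetical. It relies only on the FLAG column (second field) defined by the SAM format: bit 0x4 marks an unmapped read, and bits 0x100 and 0x800 mark secondary and supplementary alignments, which I skip so that each read is counted once. In practice a tool such as `samtools flagstat` reports this kind of summary for you.

```python
def mapping_rate(sam_lines):
    """Return (mapped, total, percent_mapped) over primary SAM records."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Ignore secondary (0x100) and supplementary (0x800) alignments
        # so that a read with several alignments is not counted twice.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        # Bit 0x4 set means the read is unmapped.
        if not flag & 0x4:
            mapped += 1
    percent = 100.0 * mapped / total if total else 0.0
    return mapped, total, percent


# Hypothetical records: only the first few SAM columns matter here.
example_sam = [
    "@SQ\tSN:scaffold_1\tLN:1000",
    "read1\t0\tscaffold_1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t16\tscaffold_1\t50\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t2048\tscaffold_1\t90\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read4\t256\tscaffold_1\t70\t0\t4M\t*\t0\t0\tACGT\tIIII",
]

mapped, total, percent = mapping_rate(example_sam)
print(f"{mapped} of {total} primary records mapped ({percent:.1f}%)")
```

Note that for paired-end data each mate is a separate record, so the count is per read, not per pair.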

Also, you can check the paper "A beginner's guide to eukaryotic genome annotation" for reference.

 
