Question: [Experimental Design] Is it possible to determine completeness of a genome if you were given raw reads?
3.4 years ago by
Tom20
United States
Tom20 wrote:

So here's the dilemma. I have raw Illumina reads from a new, previously undescribed species of bacteria, and I'd like to assemble them into a draft genome. However, I don't know whether my sequencing runs covered 100% of the genome. I suspect the result may be only 98% complete, with gaps and artifacts in regions that my runs missed and could not sequence. I want an exact number, because this 98% is only a rough guess. However, all I have is a bunch of raw reads. I think they cover the genome at an average of 25x, which is good. But is it even computationally possible to determine the quality/completeness of an assembly based on just raw reads? How should I change my approach to this problem?

sequencing coverage theory genome • 1.1k views
3.4 years ago by
iraun3.5k
Norway
iraun3.5k wrote:

The 'completeness' of a genome is an abstract concept that is not easy to check.

On the one hand, you can try to assemble your reads using a de novo approach and extract general statistics just to get a general idea about the assembly (number of scaffolds, mean length, N50...). On the other hand, you can compare the size of your de novo assembled genome to the size of a phylogenetically close bacterial species with a well-assembled genome and see if they are similar. Also, maybe I'd try to map the assembled scaffolds against the genome of that close relative and calculate the percentage of the genome covered.
This is what I'd do in your case... but for sure there are other things to do, and as I said, the completeness of an assembly is not something easy to know. It is also important to consider the 'genome mappability', which depends on each genome and affects the assembly.
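
As a rough illustration of the general statistics mentioned above, here is a minimal sketch that computes the number of scaffolds, mean length, N50, and the size ratio against a related reference genome. The function names, the example scaffold lengths, and the reference size are all hypothetical; in practice the lengths would come from the assembler's output. A size ratio close to 1 is only suggestive, since a related species can genuinely differ in genome size.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def assembly_summary(lengths, reference_size=None):
    """Summarize an assembly from its scaffold lengths (in bp).

    reference_size is the genome size of a related species (an assumption
    supplied by the user); when given, the size ratio is reported too."""
    summary = {
        "n_scaffolds": len(lengths),
        "total_bp": sum(lengths),
        "mean_bp": sum(lengths) / len(lengths),
        "n50_bp": n50(lengths),
    }
    if reference_size:
        summary["size_ratio_vs_reference"] = sum(lengths) / reference_size
    return summary


# Hypothetical scaffold lengths and reference genome size, for illustration only.
scaffold_lengths = [1_200_000, 800_000, 500_000, 300_000, 200_000]
summary = assembly_summary(scaffold_lengths, reference_size=3_100_000)
for key, value in summary.items():
    print(key, value)
```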


What would you recommend as software to visually see the contigs/reads lined up with the assembled genome scaffold? I was thinking that if I had a visual like this: http://www.dartergenomics.org/tallapoosa-darter-genome, or this http://gcat.davidson.edu/phast/img/coverage.png, that it would help me see which regions I can guess are missing.

Reply written 3.4 years ago by Tom20
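
Besides a visual viewer, one way to locate suspicious regions is to map the reads back to the assembled scaffolds and list the stretches with little or no read support. The sketch below assumes a tab-separated per-base depth table with three columns (scaffold name, 1-based position, depth) that lists every position, including zero-depth ones; `samtools depth -a` is one tool that should produce this layout, but treat that as an assumption to check against its manual. The function names and the example rows are hypothetical. Note that this only flags weakly supported parts of the assembly; sequence that never made it into the assembly at all will not show up here.

```python
def parse_depth_lines(lines):
    """Yield (scaffold, position, depth) from tab-separated depth lines."""
    for line in lines:
        scaffold, pos, depth = line.rstrip("\n").split("\t")
        yield scaffold, int(pos), int(depth)


def low_coverage_regions(depth_records, min_depth=1):
    """Collapse per-base records into (scaffold, start, end) intervals
    where depth < min_depth. Coordinates are 1-based and inclusive.
    Records are assumed sorted by scaffold, then by position."""
    regions = []
    current = None  # [scaffold, start, end] of the interval being extended
    for scaffold, pos, depth in depth_records:
        if depth < min_depth:
            if current and current[0] == scaffold and current[2] == pos - 1:
                current[2] = pos  # extend the open interval by one base
            else:
                if current:
                    regions.append(tuple(current))
                current = [scaffold, pos, pos]
        elif current:
            regions.append(tuple(current))
            current = None
    if current:
        regions.append(tuple(current))
    return regions


# Hypothetical depth rows, for illustration only.
example_lines = [
    "scaffold_1\t1\t30",
    "scaffold_1\t2\t0",
    "scaffold_1\t3\t0",
    "scaffold_1\t4\t25",
    "scaffold_2\t1\t0",
]
for region in low_coverage_regions(parse_depth_lines(example_lines)):
    print(region)
```

Raising `min_depth` (for example to a small fraction of the 25x average) would also flag thinly covered regions rather than only completely uncovered ones.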