Question: still not clear about the differences between de novo genome sequencing and genome resequencing
Hi all, I am not clear about the differences between de novo assembly and genome resequencing.

I know that if there is no reference genome for the species I am interested in, I need to do a de novo assembly and then annotate it. And if the genome is already assembled and annotated, I just need to do genome resequencing to analyse structural variation (or even sequence differences in genes?).

That is the difference in terms of their aims. But what about feasibility? I can't see any difference at the read-generation stage (though BAC clones, etc. are used to evaluate the data during de novo assembly). I saw that one published de novo genome was assembled from about 300 Gb of 2 x 100 bp reads (it was published in 2017, so not old at all; the sequencing depth is about 100x). If a "resequenced" genome was derived from 1 Tb of 2 x 100 bp reads (good base quality), can I reuse its data to do a de novo assembly instead? For example, the Burmese cat and the Ragdoll are both cats, but they are different breeds. The genome that has already been assembled de novo and serves as the reference is from a Burmese cat (from 300 Gb of reads), while "the resequenced genome" is from a Ragdoll (from 1 Tb of reads). Can I assemble the Ragdoll genome de novo and annotate it, so that other Ragdolls have a better reference?

Please tell me if this is absurd or if I have made any factual errors. Thank you.


Your basic premise in the second paragraph is correct.

That said, biology is rarely a linear accounting of nucleotides (just as knowing the DNA sequence is not enough to understand how it encodes the information that ultimately forms proteins and makes a cell function). A lot depends on the organization of the genome you are interested in. If you know nothing about it, then finding out is your first task: the number of chromosomes, the ploidy of the genome (diploid genomes, with 2 copies, are hard enough, but others may be polyploid and become impossible to assemble) and the number of repeats (which can make assembly impossible with short reads alone; you need long reads such as PacBio/Nanopore).

You can use a related genome as a guide for the assembly of a new one, but if the two species are distinct then you can't really use the related genome as a reference for the new one. It depends on how far apart they are in evolutionary terms.

You must have heard the saying: $1K genome, $100,000 annotation. You could easily collect a terabase of sequence with today's technology. Assembling the raw nucleotide information into a usable genome could easily take 100x that in time/money.
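
To put the terabase figure in perspective, here is a rough sketch of the depth arithmetic. It assumes a genome size of about 3 Gb, which is only inferred from the question's "300 Gb at about 100x" and is not a measured value; the function name is made up for illustration.

```python
def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average depth = total sequenced bases / haploid genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


GB = 1e9  # one gigabase, in bases

# Assumption: genome size inferred from "300 Gb of reads is about 100x".
ASSUMED_GENOME_SIZE = 3 * GB

reference_depth = sequencing_depth(300 * GB, ASSUMED_GENOME_SIZE)
reseq_depth = sequencing_depth(1000 * GB, ASSUMED_GENOME_SIZE)

print(f"300 Gb of reads: about {reference_depth:.0f}x")
print(f"1 Tb of reads:   about {reseq_depth:.0f}x")
```

Under that assumption the 1 Tb dataset is roughly three times deeper than the one used for the published assembly. Extra depth of the same 2 x 100 bp reads does not make the reads any longer, so it does not by itself resolve repeats.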

There is a highly related genome available, so I should try to use it. I am only allowed to use data from the Internet, so there are no long reads for now (and I don't think there will be any in the near future). Would the absence of long reads be fatal? Am I too confident in my 1 Tb of short reads?!

Generally you need to make long read libraries yourself if you have a particular interest in finishing a genome. It is good to be confident but cautious. You can only use data you have at hand so make the best of what you have.

Many thanks. I no longer feel any self-doubt, only full of strength!

