When I view sequencing data in applications like Tablet, I see that the reads often come in clumps with gaps between them, e.g. here is a ~150 bp island of reads with gaps on both sides.
As I understand it, 50 bp reads are assembled into a genome by using the regions where they overlap. If gaps remain in the resulting assembly, then a reference genome must be used to align these reads, right?
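To make concrete what I mean by using the overlaps, here is a minimal sketch, assuming error-free reads and a naive greedy merge. The names `overlap` and `greedy_assemble` and the `min_len` threshold are my own for illustration, not taken from any real assembler.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such match is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap.

    Stops when no pair overlaps by at least min_len, so the result can be
    several separate contigs (i.e. islands with gaps between them).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs
        # Join the two sequences, writing the shared region only once.
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs.append(merged)
```

When every read overlaps a neighbor, this collapses to a single contig. A read with no overlapping neighbor stays as its own contig, which is the gap situation I am asking about.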
When sequencing patients with cancer-related genetic problems, or with something like Wolf-Hirschhorn syndrome, the genome may differ significantly from the reference genome (e.g. an arm of a chromosome is missing). In those cases, how can aligning islands of short reads in the differing regions be reliable? For researchers, wouldn't these differing regions be among those where accuracy matters most?
Thanks for confirming my suspicion. I was curious and just wanted to make sure I wasn't misunderstanding something obvious.