7.8 years ago by
Washington University in St. Louis, MO
The "random" contigs contain DNA that we know is in the genome but that we're having trouble placing accurately into context. For alignment, at least, it's important to include these contigs in your reference. Here's why:
If you have reads that originated from a 'random' contig, but the 'random' contigs aren't in your reference sequence, the aligner will quite likely place those reads at their closest match elsewhere in the genome (for example, a similar duplicated or repetitive region), albeit with a worse alignment. Some of these misplaced reads will incorrectly pass your quality filters, and if enough of them do, they can skew your SNP calling, copy-number assessment, etc.
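To make the mechanism concrete, here is a toy sketch, not how any real aligner works: a brute-force ungapped "aligner" that places a read wherever it has the fewest mismatches. The sequence names (`chrA`, `chrA_random`) and sequences are hypothetical, and `chrA` is assumed to carry a slightly diverged copy of the random contig. The model tracks only mismatch count, not a real aligner's mapping quality.

```python
def best_hit(read, reference):
    """Return (seq_name, position, mismatches) of the best ungapped placement.

    Toy model: scans every position of every sequence and keeps the first
    placement with the fewest mismatches (ties go to the earlier hit).
    """
    best = None
    for name, seq in reference.items():
        for pos in range(len(seq) - len(read) + 1):
            window = seq[pos:pos + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if best is None or mismatches < best[2]:
                best = (name, pos, mismatches)
    return best


# Hypothetical sequences: chrA contains a copy of the random contig
# that differs from it at a single base (T -> A).
random_contig = "ACGTTGCAAGGCTTACGATC"
chrA = "TTTTT" + "ACGTTGCAAGGCATACGATC" + "GGGGG"

# A read that truly originated from the random contig.
read = random_contig[2:18]

full_ref = {"chrA": chrA, "chrA_random": random_contig}
partial_ref = {"chrA": chrA}  # random contig left out of the reference

print(best_hit(read, full_ref))     # ('chrA_random', 2, 0): correct origin
print(best_hit(read, partial_ref))  # ('chrA', 7, 1): misplaced on chrA
```

With the random contig present, the read lands where it came from with no mismatches. Without it, the read is still placed, just on `chrA` with one mismatch, and a filter that tolerates a mismatch or two would happily keep it.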
So yeah, alignment should pretty much always be done against all the sequences in the assembly, random contigs included.
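Before aligning, a quick sanity check is to list the sequence names in your reference FASTA and confirm that the random contigs are there. A minimal sketch, assuming UCSC-style naming where these contigs end in `_random` (e.g. `chr1_random`); other assemblies may name them differently, so adjust the suffix test accordingly.

```python
import io


def contig_names(fasta_handle):
    """Yield sequence names from FASTA header lines (text after '>' up to the first whitespace)."""
    for line in fasta_handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # skip malformed empty headers
                yield parts[0]


def random_contigs(names):
    """Return names ending in '_random' (assumed UCSC-style naming convention)."""
    return [name for name in names if name.endswith("_random")]


# Toy in-memory FASTA; with a real file you would use open("your_ref.fa").
toy_fasta = io.StringIO(">chr1\nACGT\n>chr1_random\nGGCC\n>chr2 some description\nTTAA\n")
names = list(contig_names(toy_fasta))
print(names)                  # ['chr1', 'chr1_random', 'chr2']
print(random_contigs(names))  # ['chr1_random']
```

If `random_contigs` comes back empty for a reference that should contain them, the index was probably built from a filtered FASTA.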