Question

The --contig option for SPAdes

Asked 6.0 years ago by 934963534

Hello everyone,

I am using SPAdes with the --trusted-contigs and --untrusted-contigs options.

I would like to know exactly how these contigs help with graph construction and path extension. Is there any document that describes the related algorithms?

Tags: assembly, spades, contig

Updated 6.0 years ago by dark.lord

Taken from the SPAdes manual:

Additional contigs

In case you have contigs of the same genome generated by other assembler(s) and you wish to merge them into SPAdes assembly, you can specify additional contigs using --trusted-contigs or --untrusted-contigs. First option is used when high quality contigs are available. These contigs will be used for graph construction, gap closure and repeat resolution. Second option is used for less reliable contigs that may have more errors or contigs of unknown quality. These contigs will be used only for gap closure and repeat resolution. The number of additional contigs is unlimited.

Note, that SPAdes does not perform assembly using genomes of closely-related species. Only contigs of the same genome should be specified.

Reply 6.0 years ago by Sej Modha
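For reference, a typical run with one paired-end library plus extra contigs could be put together programmatically as below. This is a minimal sketch: the file names and the helper `build_spades_command` are hypothetical, it assumes `spades.py` is on your PATH, and it only uses the `-1`, `-2`, `-o`, `--trusted-contigs` and `--untrusted-contigs` options (check `spades.py --help` for your version).

```python
import shlex
from typing import List, Optional


def build_spades_command(
    reads_1: str,
    reads_2: str,
    out_dir: str,
    trusted: Optional[str] = None,
    untrusted: Optional[str] = None,
) -> List[str]:
    """Build a spades.py argument list for one paired-end library plus
    optional additional contigs (FASTA files from another assembler)."""
    cmd = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]
    if trusted is not None:
        # High-quality contigs: per the manual, used for graph construction,
        # gap closure and repeat resolution.
        cmd += ["--trusted-contigs", trusted]
    if untrusted is not None:
        # Contigs of unknown quality: used only for gap closure and repeat resolution.
        cmd += ["--untrusted-contigs", untrusted]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; replace with your own data.
    cmd = build_spades_command(
        "reads_R1.fastq.gz",
        "reads_R2.fastq.gz",
        "spades_out",
        trusted="other_assembler_contigs.fasta",
    )
    print(shlex.join(cmd))
    # To actually run it (requires SPAdes installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if paths contain spaces.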

I had read that before, and I wonder how the contigs actually help with graph construction, gap closure and repeat resolution, since they should not be treated simply as single reads.

Reply 6.0 years ago by 934963534
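To make the "graph construction" part of the question concrete, here is a toy de Bruijn graph. It assumes, for illustration only, that a trusted contig contributes its k-mers to the graph the same way a read does; SPAdes' real construction (multiple k values, error correction, coverage handling, a condensed graph) is far more involved, and the names `build_dbg` and `has_path` are hypothetical. Two reads sharing fewer than k-1 bases leave the graph disconnected, and a contig spanning the junction reconnects it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_dbg(seqs: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Toy de Bruijn graph: nodes are (k-1)-mers, each k-mer adds an
    edge from its prefix (k-1)-mer to its suffix (k-1)-mer."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for seq in seqs:
        for km in kmers(seq, k):
            graph[km[:-1]].add(km[1:])
    return graph


def has_path(graph: Dict[str, Set[str]], start: str, end: str) -> bool:
    """Depth-first search: is `end` reachable from `start`?"""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == end:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


if __name__ == "__main__":
    genome = "ATGGCGTACGTTAGCCTAAGT"       # hypothetical 21 bp genome
    reads = [genome[0:12], genome[10:21]]  # reads overlap by only 2 bases
    contig = genome[5:17]                  # "trusted contig" spanning the junction
    k = 5
    print(has_path(build_dbg(reads, k), genome[:4], genome[-4:]))
    print(has_path(build_dbg(reads + [contig], k), genome[:4], genome[-4:]))
```

Run as a script, the first check prints False (reads alone leave a coverage gap) and the second prints True once the contig's k-mers are added.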

Answer (score 1) · 2018-04-19

My understanding is that the contigs are used as a backbone for the assembly. With these scaffolds/contigs as a backbone, the assembly becomes more like a mapping: the raw (trimmed) reads extend the contigs, and that is how they help with gap closure. The same thing could be achieved by mapping followed by re-assembly, but that would require many more steps, because mapping does not construct a graph, whereas SPAdes does.

To put it in numbers (this also helped me understand it better): say we have a 1000 bp contig, and a 250 bp read overlaps its terminal 100 bp. The read then sticks out 150 bp past the tip of the contig, and other reads can overlap this 150 bp stretch, which is something that doesn't happen when you map.
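The numeric example can be reproduced with a toy greedy extension: find the longest exact suffix/prefix overlap between the contig and a read, then append the overhang. This is a minimal sketch of the idea, assuming error-free reads and exact matches; it is not how SPAdes implements gap closure (SPAdes works on a de Bruijn assembly graph), and the function names and the seeded random "genome" are hypothetical.

```python
import random


def suffix_prefix_overlap(contig: str, read: str, min_overlap: int) -> int:
    """Return the length of the longest exact overlap between a suffix of
    `contig` and a prefix of `read`, or 0 if none is >= min_overlap."""
    longest = min(len(contig), len(read))
    for length in range(longest, max(min_overlap, 1) - 1, -1):
        if contig.endswith(read[:length]):
            return length
    return 0


def extend_contig(contig: str, read: str, min_overlap: int = 20) -> str:
    """Append the part of `read` that hangs past the end of `contig`."""
    overlap = suffix_prefix_overlap(contig, read, min_overlap)
    if overlap == 0:
        return contig  # read does not overlap the contig tip; nothing to add
    return contig + read[overlap:]


if __name__ == "__main__":
    # Hypothetical random "genome" (seeded), just to reproduce the numbers above.
    rng = random.Random(0)
    genome = "".join(rng.choices("ACGT", k=1150))
    contig = genome[:1000]    # the 1000 bp contig
    read = genome[900:1150]   # a 250 bp read covering the contig's last 100 bp

    extended = extend_contig(contig, read)
    print(suffix_prefix_overlap(contig, read, 20))  # overlap with the tip
    print(len(extended) - len(contig))              # overhang added past the tip
    print(extended == genome)
```

Run as a script, it should print 100, 150 and True: the read overlaps the last 100 bp of the contig and adds a 150 bp overhang, onto which further reads could then be overlapped.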

Hope it helps.

Cheers

Stefano