Question: The --contig option for SPAdes
5 months ago, 934963534 wrote:

Hello everyone,

I am using SPAdes with --trusted-contigs and --untrusted-contigs.

I want to know exactly how these contigs help with graph construction and path extension. Is there any documentation that describes the related algorithms?

spades contig assembly • 385 views

Taken from the SPAdes manual:

Additional contigs

In case you have contigs of the same genome generated by other assembler(s) and you wish to merge them into SPAdes assembly, you can specify additional contigs using --trusted-contigs or --untrusted-contigs. First option is used when high quality contigs are available. These contigs will be used for graph construction, gap closure and repeat resolution. Second option is used for less reliable contigs that may have more errors or contigs of unknown quality. These contigs will be used only for gap closure and repeat resolution. The number of additional contigs is unlimited.

Note, that SPAdes does not perform assembly using genomes of closely-related species. Only contigs of the same genome should be specified.
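For reference, here is a minimal sketch of how the two options from the manual excerpt might be passed on the command line, wrapped in Python so the command can be inspected before running it. The flags `-1`, `-2`, `-o`, `--trusted-contigs` and `--untrusted-contigs` are SPAdes options; the helper name `build_spades_command` and all file names are hypothetical placeholders, not anything taken from the thread.

```python
import shlex


def build_spades_command(reads_1, reads_2, output_dir,
                         trusted_contigs=None, untrusted_contigs=None):
    """Assemble a spades.py argument list (hypothetical helper, not part of SPAdes)."""
    cmd = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", output_dir]
    if trusted_contigs:
        # High-quality contigs of the same genome from another assembler.
        cmd += ["--trusted-contigs", trusted_contigs]
    if untrusted_contigs:
        # Less reliable contigs, or contigs of unknown quality.
        cmd += ["--untrusted-contigs", untrusted_contigs]
    return cmd


# File names below are placeholders for illustration only.
cmd = build_spades_command(
    "reads_1.fastq",
    "reads_2.fastq",
    "spades_out",
    trusted_contigs="other_assembler_contigs.fasta",
)
print(shlex.join(cmd))
```

To actually launch the assembly, the list could be handed to `subprocess.run(cmd, check=True)`, assuming `spades.py` is on the PATH.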

written 5 months ago by Sej Modha

I have read that before, and I still wonder how the contigs help with graph construction, gap closure and repeat resolution, since they presumably are not treated as just single reads.

written 5 months ago by 934963534
5 months ago, dark.lord wrote:

My understanding is that the contigs are used as a backbone for the assembly. With these scaffolds/contigs as a backbone, the assembly becomes more like mapping: the raw (trimmed) reads extend the contigs, and that is how they help with gap closure. You could get the same result with mapping followed by re-assembly, but that would take many more steps because mapping builds no graph, whereas SPAdes does.
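One way to picture a contig contributing to the graph is a toy de Bruijn graph. This is only an illustration of the general idea and is not SPAdes's actual algorithm; the sequences, the choice of k = 4, and the function names are all made up for the example. The reads alone leave a gap, and adding the contig's k-mers supplies the missing edges.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(sequences, k):
    """Toy de Bruijn graph: each k-mer adds an edge prefix(k-1) -> suffix(k-1)."""
    graph = defaultdict(set)
    for seq in sequences:
        for kmer in kmers(seq, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def reachable(graph, start, goal):
    """Depth-first search: is there a path from start to goal?"""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


K = 4
genome = "ACGTTGCAAGGCTTACCGATCA"  # made-up 22 bp sequence
# Made-up reads: none of them covers positions 12-13 of the genome.
reads = [genome[0:10], genome[4:12], genome[14:22]]
# Made-up contig from "another assembler" that spans the uncovered region.
contig = genome[6:18]

start, goal = genome[:K - 1], genome[-(K - 1):]
reads_only = reachable(build_graph(reads, K), start, goal)
with_contig = reachable(build_graph(reads + [contig], K), start, goal)
print(reads_only)   # False: the graph is split at the coverage gap
print(with_contig)  # True: the contig's k-mers bridge the gap
```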

In numbers (this also helped me understand it better): say we have a 1000 bp contig, and a 250 bp read overlaps its terminal 100 bp. The read then extends 150 bp beyond the end of the contig, and other reads can overlap that 150 bp stretch, which doesn't happen when you simply map.
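The arithmetic in that example can be written out as a small sketch. The function name `read_overhang` is hypothetical, and the numbers are just the ones from the example above.

```python
def read_overhang(read_length, overlap_length):
    """Bases of a read that stick out past the contig end."""
    if not 0 <= overlap_length <= read_length:
        raise ValueError("overlap must be between 0 and the read length")
    return read_length - overlap_length


contig_length = 1000  # bp
read_length = 250     # bp
overlap = 100         # bp of the read that sit on the contig's terminal end

overhang = read_overhang(read_length, overlap)
extended_length = contig_length + overhang

print(overhang)         # 150
print(extended_length)  # 1150
```

The 150 bp overhang is new sequence beyond the contig tip, so further reads overlapping it can push the extension along in the same way.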

Hope it helps.

Cheers

Stefano
