Question: The --contig option for SPAdes
9 days ago, 934963534 wrote:

Hello everyone,

I am using SPAdes with the --trusted-contigs and --untrusted-contigs options.

I would like to know exactly how these contigs help with graph construction and path extension. Is there any document that describes the related algorithms?

spades contig assembly • 123 views
modified 5 days ago by dark.lord • written 9 days ago by 934963534

Taken from the SPAdes manual:

Additional contigs

In case you have contigs of the same genome generated by other assembler(s) and you wish to merge them into SPAdes assembly, you can specify additional contigs using --trusted-contigs or --untrusted-contigs. First option is used when high quality contigs are available. These contigs will be used for graph construction, gap closure and repeat resolution. Second option is used for less reliable contigs that may have more errors or contigs of unknown quality. These contigs will be used only for gap closure and repeat resolution. The number of additional contigs is unlimited.

Note, that SPAdes does not perform assembly using genomes of closely-related species. Only contigs of the same genome should be specified.
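For illustration, here is a minimal sketch of how the two options from the manual excerpt might be passed on the command line. The flags `-1`, `-2`, `-o`, `--trusted-contigs` and `--untrusted-contigs` are SPAdes options; the helper `build_spades_command` and all file names are hypothetical placeholders, and the script only prints the command rather than running it.

```python
import shlex


def build_spades_command(reads_1, reads_2, out_dir, trusted=None, untrusted=None):
    """Assemble a spades.py argument list; nothing is executed here."""
    cmd = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]
    if trusted:
        # High-quality contigs from another assembler (same genome).
        cmd += ["--trusted-contigs", trusted]
    if untrusted:
        # Less reliable contigs, or contigs of unknown quality.
        cmd += ["--untrusted-contigs", untrusted]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_spades_command(
    "reads_1.fastq",
    "reads_2.fastq",
    "spades_out",
    trusted="other_assembler_contigs.fasta",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list handed to `subprocess.run`, once the paths point at real files.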

modified 7 days ago • written 7 days ago by Sej Modha

I have read that before, but I still wonder how the contigs help with graph construction, gap closure and repeat resolution, since they presumably are not treated simply as single reads.

written 7 days ago by 934963534
5 days ago, dark.lord wrote:

My understanding is that the contigs are used as a backbone for the assembly. With these scaffolds/contigs as a backbone, the assembly becomes more like a mapping: the raw (trimmed) reads extend the contigs, and that is how they help with gap closure. The same could be achieved with mapping followed by re-assembly, but that would require many more steps because no graph is constructed, whereas in SPAdes one is.
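To make the "contigs feed the graph" idea concrete, here is a toy sketch, not SPAdes's actual implementation. It assumes a plain de Bruijn graph with a single small k, ignores reverse complements and sequencing errors, and uses made-up sequences. Two reads leave a coverage gap, and adding the k-mers of a contig that spans the gap connects the two halves of the graph.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(sequences, k):
    """Toy de Bruijn graph: each k-mer is an edge from its (k-1)-mer
    prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for seq in sequences:
        for kmer in kmers(seq, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def reachable(graph, start, goal):
    """Depth-first search: is there a path from start to goal?"""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Made-up 20 bp "genome": ACGTTGCATGGCCTAAGTCA
reads = ["ACGTTGCAT", "CCTAAGTCA"]   # the two ends; the middle is uncovered
contig = "TGCATGGCCTAA"              # hypothetical contig spanning the gap
k = 4

print(reachable(build_graph(reads, k), "ACG", "TCA"))             # False
print(reachable(build_graph(reads + [contig], k), "ACG", "TCA"))  # True
```

In this toy the contig is treated exactly like one more sequence contributing k-mers; how SPAdes weights or handles contig-derived k-mers internally is not something this sketch claims to reproduce.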

In numbers (this also helped me understand it better): say we have a 1000 bp contig, and a 250 bp read overlaps its terminal 100 bp. The read then extends 150 bp beyond the tip of the contig, and other reads can overlap that stretch, something that doesn't happen when you only map.
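The arithmetic in that example can be written out as a short sketch. The function name `overhang` is just an illustrative helper, and the calculation assumes a clean end overlap with no indels or mismatches.

```python
def overhang(read_len, overlap):
    """Bases of a read that extend past the contig end, given how many
    of its bases overlap the contig's terminal region."""
    if not 0 <= overlap <= read_len:
        raise ValueError("overlap must be between 0 and read_len")
    return read_len - overlap


contig_len = 1000   # bp
read_len = 250      # bp
overlap = 100       # bp of the read aligned to the contig's end

extension = overhang(read_len, overlap)
print(extension)               # 150
print(contig_len + extension)  # 1150
```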

Hope it helps.

Cheers

Stefano
