Question: Scaffolds and contigs in the assembled genomes
p4alindromic wrote (8 months ago):

I have several questions regarding the scaffolds and contigs.

Are they pieces of DNA that have been assembled into contigs and scaffolds but cannot be confidently placed in the chromosome assembly?

Do they already have corresponding sequences in the assembled chromosomes, or is their sequence composition completely different from that of the chromosomes already assembled?

And finally, should we remove them from the reference genome before aligning?

Thanks

scaffold alignment • 254 views
Mensur Dlakic (USA) wrote (8 months ago):

In the same order you asked:

Any continuous piece of DNA obtained by reliably overlapping shorter reads can be considered a contig. That means a 200 bp stretch of continuous DNA would be a contig, but it would take a much longer piece to call it a scaffold. I don't know if there is a formal cutoff at which a contig becomes a scaffold, but let's just say that a scaffold is definitely a contig, while the reverse is not necessarily true.

Not sure I understand this question. If you already have a fully assembled chromosome and scaffold/contigs are not in it, that would mean they are potentially parts of a different chromosome. If you are talking about a chromosome from a reference assembly, then most scaffolds/contigs should be in it assuming an assembly without contamination.

It won't harm the alignment to the reference genome whether you remove them or not, though the final aligned fraction will be smaller if these scaffolds/contigs have no matches in the reference.
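If you do want to try aligning with and without these sequences, the reference can be split by record name first. What follows is only a minimal sketch under stated assumptions: the toy FASTA text, the `PRIMARY` name set (1-22, X, Y, MT) and the helper names `read_fasta` and `split_reference` are all hypothetical and not taken from any particular pipeline.

```python
from typing import Dict, Iterator, Set, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_reference(text: str, keep: Set[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate records whose name is in `keep` from all other records."""
    kept, dropped = {}, {}
    for name, seq in read_fasta(text):
        (kept if name in keep else dropped)[name] = seq
    return kept, dropped


# Hypothetical toy reference; real files are far larger and should be streamed.
TOY_REFERENCE = """>1 toy chromosome
ACGTACGTAC
GGTTAACC
>MT toy mitochondrion
ACGTAC
>KI270728.1 toy unplaced scaffold
ACGTNNNNAC
"""

# Assumption: primary sequences are named 1-22, X, Y and MT.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}

kept, dropped = split_reference(TOY_REFERENCE, PRIMARY)
kept_bases = sum(len(seq) for seq in kept.values())
total_bases = kept_bases + sum(len(seq) for seq in dropped.values())
retained_fraction = kept_bases / total_bases
print(sorted(kept), sorted(dropped), round(retained_fraction, 3))
```

Reporting the retained fraction of bases makes it easy to see how much of the reference a filter like this would actually discard.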


Thanks for the reply. My second question is actually about the fasta entries in the "nonchromosomal" file in the following link: ftp://ftp.ensembl.org/pub/release-98/fasta/homo_sapiens/dna/

I am assuming these are scaffolds/contigs that are not part of the assembled chromosomes. Is that correct? If so, what could these sequences be? Are they actually part of the chromosomes but not confidently assembled? Can parts of these sequences be the same as the ones in assembled chromosomes?

written 8 months ago by p4alindromic

The directory you referenced contains a README file, which says:

Non-chromosomal assembly sequences: e.g. mitochondrial genome, sequence contigs not yet mapped on chromosomes
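To see what kinds of records such a file holds, one can tally the sequence type given in each FASTA header. This is a rough sketch that assumes the headers follow the layout `>NAME dna:TYPE TYPE:ASSEMBLY:NAME:START:END:STRAND REF`; the example header lines below are illustrative, not copied from the file, and `coord_system` is a hypothetical helper.

```python
from collections import Counter


def coord_system(header: str) -> str:
    """Return the sequence type (e.g. 'chromosome', 'scaffold') from a header.

    Assumes the second field looks like 'dna:TYPE'; returns 'unknown' otherwise.
    """
    fields = header.lstrip(">").split()
    for field in fields[1:]:
        prefix, sep, value = field.partition(":")
        if sep and prefix.startswith("dna") and value:
            return value
    return "unknown"


# Illustrative header lines written in the assumed layout.
EXAMPLE_HEADERS = [
    ">MT dna:chromosome chromosome:GRCh38:MT:1:16569:1 REF",
    ">KI270728.1 dna:scaffold scaffold:GRCh38:KI270728.1:1:1872759:1 REF",
    ">GL000195.1 dna:scaffold scaffold:GRCh38:GL000195.1:1:182896:1 REF",
]

counts = Counter(coord_system(header) for header in EXAMPLE_HEADERS)
print(dict(counts))
```

On a real file, the same function could be applied to every line that starts with `>`.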

written 8 months ago by Mensur Dlakic

I read that, but what does "not yet mapped" mean? That still leaves my questions unanswered:

Are they actually part of the chromosomes (are there gaps in the current chromosome assembly into which these sequences could later be confidently placed)? Can parts of these sequences be the same as the ones in assembled chromosomes?
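One way to probe the last question empirically is to measure how many k-mers of an unplaced sequence also occur in a chromosome, on either strand. The sketch below is a toy illustration under stated assumptions: the two sequences are made up, `k = 5` is arbitrary, and the helper names are hypothetical. For real genomes a proper aligner would be the more appropriate tool.

```python
from typing import Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an A/C/G/T sequence (other characters are left as-is)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers in seq, skipping any window that contains an N."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    }


def shared_fraction(query: str, target: str, k: int) -> float:
    """Fraction of the query's distinct k-mers found in target on either strand."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    target_kmers = kmers(target, k) | kmers(reverse_complement(target), k)
    return len(query_kmers & target_kmers) / len(query_kmers)


# Made-up sequences: the scaffold shares its first 10 bases with the chromosome.
toy_chromosome = "ACGTTGCAAGGCTTAACCGG"
toy_scaffold = "TTGCAAGGCTCCCCCCCC"

fraction = shared_fraction(toy_scaffold, toy_chromosome, k=5)
print(f"{fraction:.2f} of scaffold 5-mers also occur in the chromosome")
```

A fraction near 1 would suggest the sequence largely duplicates chromosomal sequence, while a fraction near 0 would suggest it is mostly distinct at that k-mer size.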

written 8 months ago by p4alindromic