Scaffolds and contigs in the assembled genomes
4.5 years ago
p4alindromic ▴ 10

I have several questions regarding the scaffolds and contigs.

Are they pieces of DNA that have been assembled into contigs and scaffolds but cannot be confidently placed in the chromosome assembly?

Do they already have corresponding sequences in the assembled chromosomes, or is their sequence composition completely different from that of the chromosomes already assembled?

And finally, should we remove them from the reference genome before aligning?

Thanks

alignment scaffold • 1.7k views
4.5 years ago
Mensur Dlakic ★ 27k

In the same order you asked:

Any continuous piece of DNA that is obtained by reliably overlapping shorter reads can be considered a contig. That means that 200 bp of continuous DNA would be a contig, but it takes more than that to call something a scaffold: a scaffold is usually built by ordering and orienting several contigs, often with gaps of unknown sequence (runs of N) between them. I don't know of a formal cutoff at which a contig becomes a scaffold, but let's just say that a scaffold is made up of one or more contigs, while a single contig is not necessarily a scaffold.
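
To make the distinction concrete, here is a minimal sketch that assumes the common convention in which a scaffold is written as contigs joined by runs of `N` marking gaps of unknown sequence. The function name `split_scaffold` and the `min_gap` parameter are made up for this illustration, and real assemblers differ in how they encode gaps.

```python
import re
from typing import List


def split_scaffold(scaffold: str, min_gap: int = 1) -> List[str]:
    """Split a scaffold into its contigs at runs of N (assumed gap marker).

    min_gap is a hypothetical knob: only runs of at least this many N's
    are treated as gaps between contigs.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    # Drop empty pieces produced by leading or trailing gaps.
    return [piece for piece in gap.split(scaffold) if piece]


# Toy scaffold: three contigs separated by two gaps of unknown sequence.
scaffold = "ACGTACGT" + "N" * 10 + "GGCCTTAA" + "N" * 5 + "TTGA"
contigs = split_scaffold(scaffold)
print(contigs)  # ['ACGTACGT', 'GGCCTTAA', 'TTGA']
```

A sequence with no `N` runs comes back as a single piece, which matches the idea that a lone contig has nothing to be split into.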

I'm not sure I understand this question. If you already have a fully assembled chromosome and the scaffolds/contigs are not in it, that would mean they are potentially parts of a different chromosome. If you are talking about a chromosome from a reference assembly, then most scaffolds/contigs should be in it, assuming an assembly without contamination.

It won't harm the alignment to the reference genome whether you remove them or not, though the final aligned fraction will be smaller if these scaffolds/contigs have no matches in the reference.
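
If you do decide to drop the extra scaffolds/contigs before aligning, a minimal sketch of the filtering step might look like the following. It assumes plain-text FASTA input and that the primary chromosomes are named `1` to `22`, `X`, `Y` and `MT`, as in Ensembl human files. `read_fasta`, `keep_named` and `PRIMARY` are hypothetical names made up for this illustration, and the example records are toy data.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_named(records: Iterable[Tuple[str, str]],
               wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep only records whose name (first header token) is in `wanted`."""
    for header, seq in records:
        name = (header.split() or [""])[0]
        if name in wanted:
            yield header, seq


# Assumed naming of primary human chromosomes (Ensembl style).
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}

# Toy input; the scaffold name is invented for the example.
example = """>1 dna:chromosome
ACGTACGT
ACGT
>scaffold_A dna:scaffold
GGGCCC
>MT dna:chromosome
TTAA
""".splitlines()

kept = list(keep_named(read_fasta(example), PRIMARY))
print([header.split()[0] for header, _ in kept])  # ['1', 'MT']
```

For a real genome you would stream the file line by line and write the kept records out rather than collecting them in a list.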


Thanks for the reply. My second question is actually about the fasta entries in the "nonchromosomal" file in the following link: ftp://ftp.ensembl.org/pub/release-98/fasta/homo_sapiens/dna/

I am assuming these are scaffolds/contigs that are not part of the assembled chromosomes. Is that correct? If so, what could these sequences be? Are they actually part of the chromosomes but not confidently assembled? Can parts of these sequences be the same as the ones in assembled chromosomes?


The directory you referenced contains a README file, which says:

Non-chromosomal assembly sequences: e.g. mitochondrial genome, sequence contigs not yet mapped on chromosomes
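
To see what kinds of entries a file like this actually contains, one rough approach is to tally the FASTA header lines. The sketch below assumes Ensembl-style headers of the form `>NAME dna:TYPE ...`. That layout, the function name `tally_sequence_types` and the example names are assumptions for illustration, so check them against the real file.

```python
from collections import Counter
from typing import Iterable


def tally_sequence_types(lines: Iterable[str]) -> Counter:
    """Count FASTA records by the 'dna:TYPE' token in each header.

    Assumes headers look like '>NAME dna:TYPE ...'; records without
    such a token are counted as 'unknown'.
    """
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        kind = "unknown"
        for field in line[1:].split()[1:]:
            if field.startswith("dna:"):
                kind = field[len("dna:"):]
                break
        counts[kind] += 1
    return counts


# Toy headers; names are invented for the example.
example = [
    ">MT dna:chromosome",
    "ACGT",
    ">scaffold_A dna:scaffold",
    "GGCC",
    ">scaffold_B dna:scaffold",
    "TTAA",
]
counts = tally_sequence_types(example)
print(dict(counts))  # {'chromosome': 1, 'scaffold': 2}
```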


I read that, but what does "not yet mapped" mean? That still leaves my questions unanswered:

Are they actually part of the chromosomes (are there gaps in the current chromosome assembly into which these sequences could later be placed confidently)? Can parts of these sequences be the same as the ones in assembled chromosomes?
