How to curate the genome assemblies after polishing and scaffolding?
2.8 years ago
mzzzzzzzzzz ▴ 40

Hi all, I'm new to genome assembly. I have long-read sequencing data, and I have already finished assembly, polishing, and scaffolding. I guess the next step is to manually curate the assembly? How do I know where the gaps in the assembly are?

For example, I have a QUAST figure like this for part of one chromosome aligned against the reference genome. Is the blank part around 13 Mb a huge gap? I find quite a few blank regions like this in my scaffolds, and I assume they are not gaps. If that is true, how can I see where the gaps are and which genes fall in the gap regions?

[Figure: QUAST plot for part of chromosome 1]

assembly genome • 1.3k views

Without any more information, it is impossible to tell whether this gap in your alignment is due to a deletion or a scaffold gap. That depends heavily on how you performed the scaffolding.
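One way to check the scaffold side of that question is to list the runs of N in the scaffolded FASTA, since scaffolders typically mark joins between contigs with N padding. The following is a minimal sketch, assuming the scaffolder wrote Ns at joins; the function names and the file name in the comment are hypothetical, not part of any tool.

```python
import re


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Keep only the first word of the header as the sequence name.
            name, chunks = (header[0] if header else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs(handle, min_len=1):
    """Yield (name, start, end) for each run of N/n of at least min_len bp.

    Coordinates are 0-based and half-open, as in BED.
    """
    pattern = re.compile(r"[Nn]+")
    for name, seq in read_fasta(handle):
        for match in pattern.finditer(seq):
            if match.end() - match.start() >= min_len:
                yield name, match.start(), match.end()


# Hypothetical usage; "scaffolds.fasta" is a placeholder file name:
# with open("scaffolds.fasta") as fh:
#     for name, start, end in n_runs(fh, min_len=10):
#         print(name, start, end, sep="\t")
```

If this reports no N runs near the blank region, the blank is not a scaffold gap in your assembly, and a deletion or a region that simply fails to align becomes the more likely explanation.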

One initial step may be to visualise your genome against the reference using a dot plot prior to scaffolding. Maybe look at D-GENIES, a very easy-to-use tool. This will make it clear whether you are looking at gaps between contigs or within contigs.

In terms of manual curation, you can look at closing gaps between contigs with gap-closing tools, orienting the contigs to your reference, removing redundant contigs if the genome is heterozygous, etc.
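The same between-contigs versus within-contigs distinction can be read from the alignment itself. Here is a rough sketch that works on PAF records (the default output of minimap2, with the assembly as query and the reference as target) and lists stretches of the reference that no alignment covers. The function names and thresholds are hypothetical and only illustrative; a "within contig" break means the same contig aligns on both sides, which could be a deletion or just divergent sequence.

```python
from collections import defaultdict


def parse_paf(lines, min_aln_len=10_000):
    """Yield (contig, ref, ref_start, ref_end) for sufficiently long alignments."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        ref_start, ref_end = int(fields[7]), int(fields[8])
        if ref_end - ref_start >= min_aln_len:
            yield fields[0], fields[5], ref_start, ref_end


def reference_breaks(lines, min_gap=50_000, min_aln_len=10_000):
    """Return (ref, gap_start, gap_end, kind) for uncovered reference stretches.

    kind is "within contig" when the same contig flanks the stretch,
    otherwise "between contigs".
    """
    by_ref = defaultdict(list)
    for contig, ref, start, end in parse_paf(lines, min_aln_len):
        by_ref[ref].append((start, end, contig))

    breaks = []
    for ref, alns in by_ref.items():
        alns.sort()
        cur_end, cur_contig = alns[0][1], alns[0][2]
        for start, end, contig in alns[1:]:
            if start - cur_end >= min_gap:
                kind = "within contig" if contig == cur_contig else "between contigs"
                breaks.append((ref, cur_end, start, kind))
            if end > cur_end:
                # Track the alignment that reaches furthest along the reference.
                cur_end, cur_contig = end, contig
    return breaks
```

Stretches before the first alignment or after the last one on a chromosome are not reported by this sketch.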


Thanks a lot for your reply! As you suggested, I used D-GENIES to generate a dot plot by aligning my assembly against the reference genome. Below is the alignment for chr1-chr4, from left to right. The x axis is the reference and the y axis shows the contigs from my assembly.

[Figure: dot plot of assembly contigs (y axis) against reference chr1-chr4 (x axis)]

I have some questions about this dot plot.

(1) How do I know whether chromosomes 1 and 2 have gaps? (The QUAST assessment shows no gaps in any of my contigs. Does this imply that there is no gap in chromosomes 1 and 2, as each of them is formed by a single contig?)

(2) How should I interpret the yellow lines from contig 1 (the bottom contig) that appear in chr2 and chr4? I think the yellow lines indicate gaps in the reference genome?

(3) There is a contig (in black) in chr3 that is completely vertical. Does this mean it is highly identical to the reference but cannot be aligned to chr3 in the reference? I have difficulty understanding this point...

Also, how can I manually orient my contigs after aligning them to the reference? I tried minimap2 for the alignment, but I can't open the alignment file in IGV. Is there a better way to do it?

Thanks a lot in advance!


Very interesting.

So if these are your raw contigs (i.e. no scaffolds/Ns present), then as you said, it implies there is no gap in chr1 or chr2, and perhaps even chr4 (assuming the chromosomes are in order from left to right, the reference is on the bottom axis, and your de novo assembly is on the side axis).

You should look into what the dot plot is showing, but what I think the yellow lines suggest is that the gaps in your reference genome are mainly within regions containing a large number of tandem repeats. It appears that with your assembly, thanks to the long reads, you have been able to assemble through several of these repeat regions. This is why this region extends vertically in your assembly. These repeat regions also appear to have homologous sequence in each chromosome, hence the yellow lines. They are probably something like centromeres, given that each chromosome contains one and that they are similar across chromosomes.

Personally, to orient contigs I use scaffolding tools such as Ragout or RaGOO.
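For a quick manual check before (or instead of) running a scaffolder, the orientation of each contig can be estimated from the minimap2 PAF output. The sketch below is not how Ragout or RaGOO work internally; it is a simple majority vote under the assumption that the strand with the most aligned bases on the best-matching reference chromosome is the right one. All function names and the MAPQ cutoff are hypothetical.

```python
from collections import defaultdict


def contig_placements(paf_lines, min_mapq=20):
    """Return {contig: (best_ref, strand)} by majority of aligned query bases."""
    # bases[contig][(ref, strand)] = number of aligned query bases
    bases = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12 or int(fields[11]) < min_mapq:
            continue  # skip incomplete records and low-confidence alignments
        contig, strand, ref = fields[0], fields[4], fields[5]
        bases[contig][(ref, strand)] += int(fields[3]) - int(fields[2])

    placements = {}
    for contig, counts in bases.items():
        per_ref = defaultdict(int)
        for (ref, _strand), n in counts.items():
            per_ref[ref] += n
        best_ref = max(per_ref, key=per_ref.get)
        plus = counts.get((best_ref, "+"), 0)
        minus = counts.get((best_ref, "-"), 0)
        placements[contig] = (best_ref, "-" if minus > plus else "+")
    return placements


# Only A/C/G/T/N are complemented; other IUPAC codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a sequence; apply to contigs placed on the '-' strand."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Contigs reported on the "-" strand can then be reverse-complemented so that they run in the same direction as the reference. Contigs with real inversions or mixed-strand alignments deserve a look in the dot plot rather than a blind flip.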


Thank you, Samuel!
