Longer scaffolds from multiple eukaryote genome assemblies
3
0
3.6 years ago

I have two fly genomes from a species for which there are no other genomes available. One genome has been assembled from PacBio reads (N50=~400,000bp) and one from 10X (N50=~250,000bp). The genome is about 250-300Gb long.
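For readers comparing the two N50 figures above, here is a minimal sketch of how N50 is usually computed from a list of scaffold lengths. The function name and the toy lengths are illustrative assumptions, not values taken from either assembly.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Toy scaffold lengths (hypothetical, for illustration only)
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

In practice the lengths would come from the FASTA index or from any assembly statistics tool; the sketch only shows the arithmetic.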

I would like to use the scaffolds from both these genomes to create an assembly with longer scaffolds.

I have tried metassembler (https://sourceforge.net/projects/metassembler/) but it requires mate pairs to find the correspondences between the assemblies and I do not have such paired-end reads.

What tools would you recommend to produce longer scaffolds from multiple assemblies?

EDIT:

Here is a list of software I am presently considering:

genome scaffolding • 1.6k views
1

0

Is FALCON supposed to be able to merge different assemblies produced by different technologies?

1

It can assemble long reads from PacBio or MinION. I don't think you will find software that does exactly what you are looking for: longer scaffolds from scaffolds? Even if you do find software for that, I think you will need a lot of further karyotype validation before you can use the final sequences.

1

0

Yes, I am looking at GARM. See my edit above.

1

The genome is about 250-300Gb long.

Please keep that in mind when recommending software. What kind of organism is that? Is ploidy a contributor?
Software list from Omicstools.

0

The Omicstools list is where I found GARM and Camsa. I sifted through the list and kept a few that looked promising. These two are my best bet for now.

The fly is diploid. The genomes were not assembled from a double haploid individual.

1

There are a couple others mentioned in this past thread.

0

Thanks. PBJelly has already been run on the PacBio assembly using the 10X reads, but I had never heard of OPERA-LG. I'll check it out.

1
3.6 years ago

I would imagine that you need to look outside the "classic" field of high-throughput sequencing. You most likely need a long-read assembler that works off end overlaps rather than a de Bruijn graph assembler.

For example, this one (I found it through a search, so I can't comment on its applicability):

https://github.com/isovic/racon

0

So basically treat contigs and scaffolds as long reads? That would mean VERY low coverage, on the order of 1 to 2. I'll explore this avenue but something tells me the assemblers are going to struggle with such a low coverage.
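The coverage estimate above follows from simple arithmetic: coverage is the total number of input bases divided by the genome size. A minimal sketch, assuming each assembly spans roughly the whole genome once (the lengths and genome size below are toy values, not the real assemblies):

```python
def coverage(sequence_lengths, genome_size):
    """Average depth of coverage: total input bases divided by genome size."""
    return sum(sequence_lengths) / genome_size


# Toy numbers (hypothetical): two assemblies of a 1 Mb genome,
# each of which adds up to about one genome's worth of sequence.
genome_size = 1_000_000
pacbio_scaffolds = [400_000, 350_000, 250_000]
tenx_scaffolds = [250_000, 250_000, 250_000, 250_000]

print(coverage(pacbio_scaffolds + tenx_scaffolds, genome_size))
```

With two assemblies as the only "reads", the depth cannot exceed about 2, which is far below what overlap-based assemblers are normally given.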

1
3.6 years ago

Hi Eric!

Maybe you have already done this, but I'd first align the two genomes to get a synteny map and then plan the assembly depending on what you see. For the alignment, MUMmer (http://mummer.sourceforge.net/) may help, or another tool.

Sergey
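One way to read such an alignment when planning a merge is to look for scaffolds in one assembly that align solidly to two or more scaffolds in the other, since those are candidate bridges. The sketch below is only an illustration of that idea: the `(a_scaffold, b_scaffold, aligned_bp)` tuple format, the function name, and the 10 kb threshold are all assumptions, and the tuples would have to be parsed from whatever aligner output you actually have.

```python
from collections import defaultdict


def bridging_scaffolds(alignments, min_aligned=10_000):
    """Find scaffolds of assembly B that align to >= 2 scaffolds of assembly A.

    alignments: iterable of (a_scaffold, b_scaffold, aligned_bp) tuples
                (hypothetical format, one tuple per alignment block).
    min_aligned: minimum summed aligned bases for a pairing to count
                 (arbitrary illustrative threshold).
    """
    # Sum aligned bases for every (B scaffold, A scaffold) pair.
    partners = defaultdict(lambda: defaultdict(int))
    for a_id, b_id, aligned_bp in alignments:
        partners[b_id][a_id] += aligned_bp

    bridges = {}
    for b_id, hits in partners.items():
        solid = sorted(a for a, bp in hits.items() if bp >= min_aligned)
        if len(solid) >= 2:
            bridges[b_id] = solid
    return bridges


# Toy alignment blocks (hypothetical)
toy = [
    ("A1", "B1", 50_000),
    ("A2", "B1", 40_000),
    ("A3", "B2", 60_000),
    ("A1", "B2", 2_000),  # too short to count as a solid pairing
]
print(bridging_scaffolds(toy))
```

This ignores orientation and position along each scaffold, so any candidate it reports would still need to be checked on the synteny plot before joining anything.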

0

I've used Synima (https://github.com/rhysf/Synima) to generate a synteny map (relatively painlessly for individual eukaryotic chromosomes). It might be worth a try. For annotation input, I've used MAKER2 output: CDS=transcripts from MAKER2, PEP=proteins from MAKER2, gff3=gff files from MAKER2 (http://www.yandell-lab.org/software/maker.html).

0
3.3 years ago
harishk0201 ▴ 110

Hey Eric,

Try Quickmerge: https://github.com/mahulchak/quickmerge

But are you sure that these genomes are measured in Gb rather than Mb? That seems a tad too much.

You can try HaploMerger2 as well.