Longer scaffolds from multiple eukaryote genome assemblies
3.6 years ago

I have two fly genomes from a species for which there are no other genomes available. One genome has been assembled from PacBio reads (N50=~400,000bp) and one from 10X (N50=~250,000bp). The genome is about 250-300Gb long.

I would like to use the scaffolds from both these genomes to create an assembly with longer scaffolds.

I have tried metassembler (https://sourceforge.net/projects/metassembler/) but it requires mate pairs to find the correspondences between the assemblies and I do not have such paired-end reads.

What tools would you recommend to produce longer scaffolds from multiple assemblies?
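For readers comparing the two assemblies, the N50 values quoted above can be recomputed from the scaffold lengths. The following is a minimal sketch, not the method used by any particular assembler's report; the function name `n50` and the toy lengths are illustrative assumptions, not data from the fly assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up scaffold lengths (hypothetical values).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # prints 400
```

Running the same function on the merged assembly and on each input assembly gives a quick check of whether the merge actually produced longer scaffolds.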

EDIT:

Here is a list of software I am presently considering:

genome scaffolding • 1.6k views

what about FALCON?


Is FALCON supposed to be able to merge different assemblies produced by different technologies?


It is able to assemble long sequences from PacBio or MinION reads. I don't think you can find software that does exactly what you are looking for: longer scaffolds from scaffolds? And even if you do find software for that, I think you will need a lot of further karyotype validation before you can use your final sequences.


How about GARM?


Yes, I am looking at GARM. See my edit above.


The genome is about 250-300Gb long.

Please keep that in mind when recommending software. What kind of organism is that? Is ploidy a contributor?
Software list from Omicstools.


The Omicstools list is where I found GARM and Camsa. I sifted through the list and kept a few that looked promising. These two are my best bet for now.

The fly is diploid. The genomes were not assembled from a double haploid individual.


There are a couple others mentioned in this past thread.


Thanks. PBJelly has already been run on the PacBio assembly using the 10X reads, but I had never heard of OPERA-LG. I'll check it out.

3.6 years ago

I would imagine that you need to look outside the "classic" field of high-throughput sequencing. You most likely need a long-read assembler that works off end overlaps rather than a de Bruijn graph type of assembler.

For example, this one (I found it through a search, so I can't comment on its applicability):

https://github.com/isovic/racon


So basically treat contigs and scaffolds as long reads? That would mean VERY low coverage, on the order of 1x to 2x. I'll explore this avenue, but something tells me the assemblers are going to struggle with such low coverage.
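The coverage estimate above is simple arithmetic: if the scaffolds of both assemblies are fed in as "reads", the depth is roughly the total assembled bases divided by the genome size. A minimal sketch follows; the function name and the numbers are hypothetical and unit-free, chosen only to show the calculation.

```python
def approx_coverage(assembly_sizes, genome_size):
    """Approximate depth when whole assemblies are used as input reads.

    assembly_sizes: total length of each assembly, in the same unit
                    as genome_size (e.g. Mb).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(assembly_sizes) / genome_size


# Hypothetical values: two assemblies that each cover most of a genome
# of size 300 (arbitrary units).
depth = approx_coverage([280, 290], 300)
print(f"approximate coverage: {depth:.1f}x")
```

Two assemblies that each span most of the genome can never give much more than 2x, which is far below the depth long-read assemblers are normally run at.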

3.6 years ago

Hi Eric!

Maybe you have already done this, but I'd align the two genomes first to get a synteny map and then plan the assembly depending on what you see. For the alignment, MUMmer (http://mummer.sourceforge.net/) or a similar tool may help.

Sergey
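A whole-genome alignment of the two assemblies with MUMmer, as suggested above, usually comes down to a handful of commands. The sketch below only builds the command lines as strings and does not run them. The file names and the `prefix` value are hypothetical, and the exact options worth using depend on the MUMmer version installed, so treat this as a starting point rather than a recipe.

```python
import shlex


def mummer_commands(reference_fasta, query_fasta, prefix="pacbio_vs_10x"):
    """Build shell command lines for a basic nucmer comparison.

    The file names and prefix are placeholders supplied by the caller.
    """
    ref = shlex.quote(reference_fasta)
    qry = shlex.quote(query_fasta)
    pre = shlex.quote(prefix)
    return [
        # Align the query assembly against the reference assembly.
        f"nucmer --prefix={pre} {ref} {qry}",
        # Keep only the 1-to-1 alignments (delta-filter writes to stdout).
        f"delta-filter -1 {pre}.delta > {pre}.1to1.delta",
        # Tabular coordinates, sorted by reference, with coverage and lengths.
        f"show-coords -rcl {pre}.1to1.delta > {pre}.coords",
        # Dot plot of the filtered alignments (needs gnuplot installed).
        f"mummerplot --png -p {pre} {pre}.1to1.delta",
    ]


# Hypothetical input files.
for command in mummer_commands("pacbio_assembly.fa", "tenx_assembly.fa"):
    print(command)
```

The coordinates table and the dot plot show which scaffolds of one assembly bridge gaps between scaffolds of the other, which is the information a merging tool would exploit.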


I've used Synima (https://github.com/rhysf/Synima) to generate a synteny map (relatively painlessly for individual eukaryotic chromosomes). It might be worth a try. For annotation input, I've used MAKER2 output: CDS=transcripts from MAKER2, PEP=proteins from MAKER2, gff3=gff files from MAKER2 (http://www.yandell-lab.org/software/maker.html).

3.3 years ago
harishk0201 ▴ 110

Hey Eric,

Try quickmerge: https://github.com/mahulchak/quickmerge

But are you sure that these genomes are in Gb rather than Mb? That seems a tad too much.

You can try HaploMerger2 as well.
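The Gb versus Mb question can be settled directly from the assembly FASTA files by summing the sequence lengths. A minimal sketch, assuming plain (uncompressed) FASTA input; the function names and the example records are illustrative only.

```python
def assembly_size(fasta_lines):
    """Return (number_of_sequences, total_bases) for FASTA-formatted lines."""
    n_seqs = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_seqs += 1          # header line starts a new sequence
        else:
            total += len(line)   # sequence line contributes its bases
    return n_seqs, total


def human_readable(bp):
    """Format a base count as bp, kb, Mb, or Gb."""
    for unit, scale in (("Gb", 1e9), ("Mb", 1e6), ("kb", 1e3)):
        if bp >= scale:
            return f"{bp / scale:.2f} {unit}"
    return f"{bp} bp"


# Tiny made-up FASTA records; for a real file, pass an open file handle,
# e.g. `with open("assembly.fa") as handle: assembly_size(handle)`.
example = [">scaffold_1", "ACGTAC", "GT", ">scaffold_2", "GGCC"]
count, bases = assembly_size(example)
print(count, human_readable(bases))
```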
