Question: Longer scaffolds from multiple eukaryote genome assemblies
Eric Normandeau (Quebec, Canada) wrote 12 months ago:

I have two fly genomes from a species for which there are no other genomes available. One genome has been assembled from PacBio reads (N50=~400,000bp) and one from 10X (N50=~250,000bp). The genome is about 250-300Gb long.
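For readers comparing the two N50 values: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The snippet below is a minimal sketch of that definition. The function name `n50` and the toy lengths are illustrative assumptions, not values from either assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length


# Toy example with made-up scaffold lengths (bp).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

In practice the lengths would come from the scaffold FASTA of each assembly, so both N50 values can be recomputed the same way before and after merging.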

I would like to use the scaffolds from both these genomes to create an assembly with longer scaffolds.

I have tried metassembler (https://sourceforge.net/projects/metassembler/) but it requires mate pairs to find the correspondences between the assemblies and I do not have such paired-end reads.

What tools would you recommend to produce longer scaffolds from multiple assemblies?

EDIT:

Here is a list of software I am presently considering:

Tags: scaffolding, genome

What about FALCON?

written 12 months ago by Buffo

Is FALCON supposed to be able to merge different assemblies produced by different technologies?

written 12 months ago by Eric Normandeau

It can assemble long sequences from PacBio or MinION reads. I don't think you will find software that does exactly what you are looking for: longer scaffolds from scaffolds? Even if you do find software for that, I think you will need a lot of further karyotype validation before you can use your final sequences.

modified 12 months ago • written 12 months ago by Buffo

How about GARM?

modified 12 months ago • written 12 months ago by Sej Modha

Yes, I am looking at GARM. See my edit above.

written 12 months ago by Eric Normandeau

"The genome is about 250-300Gb long."

Please keep that in mind when recommending software. What kind of organism is that? Is ploidy a contributor?
Software list from Omicstools.

modified 12 months ago • written 12 months ago by genomax

The Omicstools list is where I found GARM and CAMSA. I sifted through the list and kept a few that looked promising. These two are my best bets for now.

The fly is diploid. The genomes were not assembled from a doubled haploid individual.

written 12 months ago by Eric Normandeau

There are a couple of others mentioned in this past thread.

modified 12 months ago by Eric Normandeau • written 12 months ago by genomax

Thanks. PBJelly has already been run on the PacBio assembly using the 10X reads, but I had never heard of OPERA-LG. I'll check it out.

written 12 months ago by Eric Normandeau
Istvan Albert (University Park, USA) wrote 12 months ago:

I would imagine that you need to look outside the "classic" field of high-throughput sequencing. You most likely need a long-read assembler that works off end overlaps rather than a de Bruijn graph type of assembler.

For example, this one (I found it through a search, so I can't comment on its applicability):

https://github.com/isovic/racon


So basically treat contigs and scaffolds as long reads? That would mean VERY low coverage, on the order of 1x to 2x. I'll explore this avenue, but something tells me the assemblers are going to struggle with such low coverage.
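To make the coverage estimate concrete, here is a minimal sketch of the arithmetic: if the scaffolds of both assemblies are fed in as "reads", the expected depth is simply the total assembled length divided by the genome size. The function name and the sizes below are hypothetical placeholders (any unit works as long as it is the same for all values), not measurements from these assemblies.

```python
def expected_coverage(assembly_sizes, genome_size):
    """Approximate depth when whole assemblies are treated as reads."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    # Depth = total bases supplied / size of the target genome.
    return sum(assembly_sizes) / genome_size


# Hypothetical sizes, all in the same unit: two assemblies that each
# span slightly less than the estimated genome size.
assembly_sizes = [260, 270]
genome_size = 275

coverage = expected_coverage(assembly_sizes, genome_size)
print(f"expected coverage: {coverage:.2f}x")
```

With two near-complete assemblies the result can never exceed roughly 2x, which is far below the depth that overlap-based long-read assemblers are normally run at.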

written 12 months ago by Eric Normandeau
Sergey Naumenko wrote 12 months ago:

Hi Eric!

Maybe you have already done this, but I'd first align the two genomes to look at a synteny map, and then plan the assembly depending on what you see. For the alignment, MUMmer (http://mummer.sourceforge.net/) may help, or another tool.

Sergey
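A minimal sketch of how such a whole-genome alignment could be scripted with MUMmer's `nucmer`, `delta-filter` and `show-coords` programs. The FASTA file names and the output prefix are hypothetical, the helper names are illustrative, and the flags reflect my understanding of MUMmer 3/4, so check them against the installed version. The script only prints the commands unless `run_commands` is called explicitly.

```python
import shutil
import subprocess


def mummer_commands(reference, query, prefix="pacbio_vs_10x"):
    """Build (argv, stdout_file) pairs for a basic nucmer alignment."""
    delta = f"{prefix}.delta"            # written by nucmer itself
    filtered = f"{prefix}.1to1.delta"    # one-to-one alignments only
    coords = f"{prefix}.coords.tsv"      # tab-delimited coordinates
    return [
        (["nucmer", f"--prefix={prefix}", reference, query], None),
        (["delta-filter", "-1", delta], filtered),
        (["show-coords", "-r", "-c", "-l", "-T", filtered], coords),
    ]


def run_commands(commands):
    """Run each command, redirecting stdout to a file where requested."""
    for argv, stdout_path in commands:
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} not found on PATH")
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)


if __name__ == "__main__":
    # Hypothetical file names for the two assemblies.
    for argv, out in mummer_commands("pacbio.fasta", "tenx.fasta"):
        print(" ".join(argv) + (f" > {out}" if out else ""))
```

The resulting coordinates table shows which scaffolds of one assembly span several scaffolds of the other, which is the information needed to judge whether merging can actually produce longer scaffolds.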


I've used Synima (https://github.com/rhysf/Synima) to generate a synteny map (relatively painlessly for individual eukaryotic chromosomes). It might be worth a try. For annotation input, I've used MAKER2 output: CDS=transcripts from MAKER2, PEP=proteins from MAKER2, gff3=gff files from MAKER2 (http://www.yandell-lab.org/software/maker.html).

written 10 months ago by jean.elbers
harishk0201 wrote 10 months ago:

Hey Eric,

Try quickmerge: https://github.com/mahulchak/quickmerge

But are you sure that these genomes are in Gb rather than Mb? That seems a tad too much.

You can try HaploMerger2 as well.
