Question: Improve Illumina short-read assembly using PacBio long reads
shachigahoimbi wrote (12 months ago):

I am trying to assemble a goat genome (genome size 2.9 Gb). I have two kinds of sequencing data for it:
1) short-read data from Illumina (~37x genome coverage)
2) long-read data from PacBio (~1.5x genome coverage)

I have assembled the Illumina short reads with ABySS and SOAPdenovo; the best N50 I got was 1,884 bp, at a k-mer size of 41. I would like to improve this short-read assembly using the PacBio long reads, but because of their low coverage (1.5x), I cannot decide which software would be best for improving the N50.
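For reference, N50 is the contig length at which contigs of that length or longer together contain at least half of the assembled bases. A minimal Python sketch, assuming you already have a list of contig lengths (the lengths in the example are made up, not taken from this assembly):

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical contig lengths, for illustration only.
print(n50([5000, 3000, 2000, 1000, 500, 300, 200]))  # 3000
```

Because N50 only depends on the length distribution, scaffolding that joins existing contigs (without adding new sequence) can raise it substantially.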

I tried hybridSPAdes for a hybrid assembly of my short- and long-read data, but it runs out of memory.

How could I improve the short-read assembly using low-coverage (~1.5x) long reads?


lieven.sterck replied (12 months ago):

What was the read length of your Illumina data?

An optimal k-mer of 41 seems pretty low; what range did you evaluate?
colindaven (Hannover Medical School) wrote (12 months ago):

Maybe you can't: at 1.5x, a good proportion of the genome will actually have 0x coverage.

Generally, you want 20x or more PacBio coverage to make a good assembly.
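To put rough numbers on this: under the Lander-Waterman model, if reads land uniformly at random, the fraction of the genome covered by zero reads is about e^(-c) at coverage c. A minimal Python sketch under that idealized assumption (real PacBio coverage is less uniform, so treat the results as rough estimates):

```python
import math

GENOME_SIZE = 2.9e9  # goat genome size from the question, in bases


def uncovered_fraction(coverage):
    """Poisson (Lander-Waterman) estimate of the fraction of bases with zero reads."""
    return math.exp(-coverage)


for c in (1.5, 20):
    frac = uncovered_fraction(c)
    print(f"{c:>4}x: {frac:.2%} uncovered, ~{frac * GENOME_SIZE / 1e6:,.0f} Mb")
```

At 1.5x this predicts that roughly a fifth of the genome (several hundred Mb) gets no long-read coverage at all, whereas at 20x essentially every base is covered at least once.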

It might pay to use a better existing assembly (I think one is available for goat) to orient your short scaffolds.

jean.elbers wrote (12 months ago):

You could use your best Illumina assembly as input for a whole-genome alignment with Cactus (https://github.com/ComparativeGenomicsToolkit/cactus) against NCBI accession GCA_004361675.1 as the reference. You could then use Ragout (https://github.com/fenderglass/Ragout) to generate a reference-guided assembly of your individual, based on the best available goat genome assembly, GCA_004361675.1.
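As a starting point for the Cactus step: Cactus takes a seqFile, a text file with a Newick tree on the first line followed by one `name path` line per genome. A minimal Python sketch that writes such a file for the two genomes; the genome names, FASTA paths, and branch lengths are hypothetical placeholders, so check the Cactus README for the exact invocation in your version:

```python
from pathlib import Path


def write_seqfile(path, genomes, branch_length=0.01):
    """Write a Cactus seqFile: a Newick tree line, then 'name fasta_path' lines.

    genomes maps a genome name (no spaces) to its FASTA path.
    branch_length is a hypothetical placeholder for two closely related genomes.
    """
    leaves = ",".join(f"{name}:{branch_length}" for name in genomes)
    lines = [f"({leaves});"]
    lines += [f"{name} {fasta}" for name, fasta in genomes.items()]
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text)
    return text


# Hypothetical names and paths: the NCBI reference and your own best assembly.
print(write_seqfile("goat_seqfile.txt", {
    "goat_ref": "GCA_004361675.1.fa",
    "goat_draft": "abyss_best_contigs.fa",
}))
# The alignment would then be run along the lines of:
#   cactus ./jobstore goat_seqfile.txt goat.hal
```

The resulting HAL alignment is what you would then hand to Ragout; see the Ragout documentation for its recipe format.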

lieven.sterck (VIB, Ghent, Belgium) wrote (12 months ago):

Since you're already on the ABySS route, you could give LINKS a try: it is a long-read scaffolder from the same group that develops ABySS.

But, as others here have mentioned, 1.5x coverage will likely not get you very far.
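To illustrate why a k-mer-based scaffolder like LINKS can still extract some signal from sparse long reads: it takes pairs of k-mers a set distance apart on each long read and looks for contigs containing them; a pair whose two k-mers fall in different contigs suggests those contigs are neighbours. The toy Python sketch below shows only that idea (exact lookup, forward strand only, no orientation or gap estimate). It is not the LINKS implementation; the `k`, `distance`, and `step` parameters only loosely mirror its options, and all sequences are made up:

```python
from collections import Counter


def kmer_index(contigs, k):
    """Map each k-mer to the set of contig names containing it (forward strand only)."""
    index = {}
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def contig_links(long_reads, contigs, k, distance, step=1):
    """Count unordered contig pairs supported by k-mer pairs `distance` bases apart."""
    index = kmer_index(contigs, k)
    links = Counter()
    for read in long_reads:
        for i in range(0, len(read) - distance - k + 1, step):
            left = index.get(read[i:i + k], set())
            right = index.get(read[i + distance:i + distance + k], set())
            for a in left:
                for b in right:
                    if a != b:
                        links[tuple(sorted((a, b)))] += 1
    return links


contigs = {"c1": "ACGTTGCA", "c2": "GGATCCTA"}
read = contigs["c1"] + "AAAA" + contigs["c2"]  # hypothetical read spanning a gap
print(contig_links([read], contigs, k=4, distance=12))  # Counter({('c1', 'c2'): 5})
```

Every link needs a long read that actually spans the gap between two contigs, which is why the amount of long-read coverage limits how many joins are possible.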

Vitis (New York) wrote (12 months ago):

Filtlong may help you filter your long reads, using your short reads as a reference for judging read quality:

https://github.com/rrwick/Filtlong

The filtered long reads may then help you scaffold some contigs. But I agree with the other answers: 1.5x of long reads won't get you very far.
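When given Illumina reads, Filtlong judges long-read quality by k-mer matches to them. The toy Python sketch below illustrates that general idea with a simple "fraction of k-mers supported by short reads" score; it is not Filtlong's actual scoring (which also weighs read length and low-quality windows), it ignores reverse complements, and the reads and threshold are hypothetical:

```python
def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_support(read, short_kmers, k):
    """Fraction of a long read's k-mers that also occur in the short reads."""
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    if not read_kmers:
        return 0.0
    hits = sum(1 for km in read_kmers if km in short_kmers)
    return hits / len(read_kmers)


def filter_reads(long_reads, short_reads, k=5, min_support=0.8):
    """Keep long reads whose k-mer support meets a (hypothetical) threshold."""
    short_kmers = set()
    for r in short_reads:
        short_kmers |= kmers(r, k)
    return [r for r in long_reads if kmer_support(r, short_kmers, k) >= min_support]


short_reads = ["ACGTACGTTG", "CGTTGCAAGT"]
long_reads = ["ACGTACGTTGCAAGT", "ACGTTTTTTTTTTTT"]  # the second is error-laden
print(filter_reads(long_reads, short_reads))  # ['ACGTACGTTGCAAGT']
```

Filtering like this discards reads rather than repairing them, so it cannot raise the effective long-read coverage.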
