Question: Improve Illumina short-read assembly using PacBio long reads
Asked by shachigahoimbi, 8 weeks ago:

I am trying to assemble a goat genome (genome size 2.9 Gb) and have two kinds of sequencing data:
1) Illumina short reads (~37x genome coverage)
2) PacBio long reads (~1.5x genome coverage)
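For scale, fold coverage is total sequenced bases divided by genome size, so the two coverages above correspond to very different amounts of sequence. A minimal sketch of that arithmetic, using only the 2.9 Gb genome size and the coverages stated in the question (the function name is illustrative):

```python
GENOME_SIZE_BP = 2.9e9  # goat genome size given in the question


def bases_for_coverage(coverage: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Total sequenced bases implied by a fold coverage (coverage = bases / genome size)."""
    return coverage * genome_size_bp


for label, cov in [("Illumina", 37), ("PacBio", 1.5)]:
    print(f"{label}: {cov}x is about {bases_for_coverage(cov) / 1e9:.2f} Gb of sequence")
```

This prints roughly 107.3 Gb for the Illumina data and 4.35 Gb for the PacBio data.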

I assembled the Illumina reads with ABySS and SOAPdenovo; the best N50 I got was 1,884 bp, at a k-mer size of 41. I would like to improve this short-read assembly using the PacBio long reads, but because their coverage is so low (1.5x), I can't decide which software would be best for improving the N50.
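N50 is the length L such that contigs of length L or longer together contain at least half of the assembly's bases. ABySS's `abyss-fac` utility already reports it; the following is only a minimal Python sketch of the calculation, assuming a plain FASTA file of contigs (possibly with sequences wrapped over several lines). The function names are illustrative.

```python
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Length of each record in FASTA text, joining wrapped sequence lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Smallest contig length in the set of longest contigs covering >= 50% of all bases."""
    if not lengths:
        raise ValueError("no contigs")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running total always reaches half")


if __name__ == "__main__":
    toy = [">c1", "ACGTACGTAC", ">c2", "ACGTA", ">c3", "ACG", "TACG"]
    print(n50(contig_lengths(toy)))  # → 7 (lengths 10, 5, 7; 10 + 7 >= 22 / 2)
```

For a real assembly, an open file handle can be passed instead of the toy list, e.g. `n50(contig_lengths(open("contigs.fa")))`, where `contigs.fa` is a hypothetical file name.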

I tried hybridSPAdes for a hybrid assembly of my short- and long-read data, but it fails with an out-of-memory error.

How can I improve the short-read assembly using low-coverage (~1.5x) long reads?


What was the read length of your Illumina data?

An optimal k-mer size of 41 seems pretty low; what range did you evaluate?

Comment by lieven.sterck, 8 weeks ago

Answer by colindaven (Hannover Medical School), 8 weeks ago:

Maybe you can't: at 1.5X, a good proportion of the genome will have no long-read coverage at all (0X).

Generally, you want 20X+ PacBio coverage to make a good assembly.
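To put a number on "a good proportion of the genome", the idealized Lander-Waterman model treats per-base read depth as Poisson-distributed with mean equal to the coverage c, so the expected fraction of bases with no read at all is e^(-c). A minimal sketch; real long-read coverage is rarely this uniform, so treat these figures as a best case:

```python
import math


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero read depth under a Poisson
    (Lander-Waterman) model with mean depth `coverage`."""
    return math.exp(-coverage)


for cov in (1.5, 5, 20):
    print(f"{cov:>4}x: {fraction_uncovered(cov):.2e} of the genome uncovered")
```

At 1.5x this gives about 22% of the genome (roughly 650 Mb of 2.9 Gb) with no PacBio read at all, versus about 2e-9 at 20x.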

It might pay to use a better existing assembly (I think one is available for goat) to orient your short scaffolds.

Answer by jean.elbers, 8 weeks ago:

You could align your best Illumina assembly to the best available goat genome assembly (NCBI accession GCA_004361675.1) using the whole-genome aligner Cactus (https://github.com/ComparativeGenomicsToolkit/cactus). You could then use Ragout (https://github.com/fenderglass/Ragout) to generate a reference-guided assembly of your individual based on GCA_004361675.1.

Answer by lieven.sterck (VIB, Ghent, Belgium), 8 weeks ago:

Since you're already using ABySS, you could give LINKS a try: it's a long-read scaffolder from the same group that develops ABySS.

But as others have mentioned, 1.5x coverage will likely not get you very far.
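LINKS scaffolds by extracting pairs of k-mers a fixed distance apart from each long read and recording which contigs the two k-mers of a pair fall in. The toy sketch below illustrates only that idea in plain Python; it is not LINKS's implementation (which uses Bloom filters, estimates gap sizes and builds actual scaffolds), and all function names and parameter values are illustrative. It also shows why coverage matters: two contigs can only be linked if some long read spans the gap between them.

```python
from __future__ import annotations

import random
from collections import Counter, defaultdict

FLIP = {"+": "-", "-": "+"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are left as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def index_contig_kmers(contigs: dict[str, str], k: int) -> dict[str, tuple[str, str]]:
    """Map every k-mer occurring exactly once in the assembly (counting both
    strands) to (contig name, contig strand it matches)."""
    hits: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            hits[kmer].append((name, "+"))
            hits[revcomp(kmer)].append((name, "-"))
    # Repeated k-mers are ambiguous, so they are dropped from the index.
    return {kmer: locs[0] for kmer, locs in hits.items() if len(locs) == 1}


def link_contigs(contigs, long_reads, k, d, min_links):
    """Count contig-to-contig links supported by k-mer pairs whose start
    positions are d bases apart on a long read; keep links with >= min_links."""
    index = index_contig_kmers(contigs, k)
    links: Counter = Counter()
    for read in long_reads:
        for i in range(len(read) - d - k + 1):
            a, b = read[i:i + k], read[i + d:i + d + k]
            if a in index and b in index:
                (ca, sa), (cb, sb) = index[a], index[b]
                if ca != cb:
                    # The same junction read from the other strand appears as the
                    # mirrored, strand-flipped link; store one canonical form.
                    key = (ca, sa, cb, sb)
                    mirror = (cb, FLIP[sb], ca, FLIP[sa])
                    links[min(key, mirror)] += 1
    return {link: n for link, n in links.items() if n >= min_links}


if __name__ == "__main__":
    rng = random.Random(42)

    def rand_seq(n):
        return "".join(rng.choice("ACGT") for _ in range(n))

    contigs = {"c1": rand_seq(300), "c2": rand_seq(300)}
    # A hypothetical long read spanning the end of c1, a 50 bp gap, and the start of c2.
    read = contigs["c1"][100:] + rand_seq(50) + contigs["c2"][:200]
    print(link_contigs(contigs, [read], k=15, d=100, min_links=5))
```

On this toy input the output should contain a single link, `('c1', '+', 'c2', '+')`, with its supporting pair count; a read covering only one contig contributes nothing.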

Answer by Vitis (New York), 7 weeks ago:

Filtlong may help you filter your long reads by quality, using your short reads as a reference.

https://github.com/rrwick/Filtlong

The filtered long reads may then help you scaffold some contigs. But I agree with the other answers: 1.5X of long reads won't get you very far.
