Question

Why can't I perform a large genome assembly with Velvet and ABySS?


Asked 4.7 years ago by putita.jira

Hi everyone! I'm new to the bioinformatics field and I'm having some problems with de novo assembly, so I would like to ask for suggestions.

Some general information about my work:

Illumina paired-end reads, read length 150 bp
Whole-genome sequencing (estimated genome size: 224 million bp)
100x coverage
Around 70 million reads per file (forward and reverse)
I used an Amazon Web Services EC2 instance of type m4.xlarge (4 vCPUs, 16 GiB RAM) for all of the following steps.
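The figures above are consistent with each other: about 70 million 150 bp reads in each of two files over a 224 Mbp genome works out to roughly 94x raw depth, close to the quoted 100x. A minimal shell sketch of that arithmetic, assuming every read is the full 150 bp (i.e. counted before trimming); the function name `raw_coverage` is hypothetical:

```bash
#!/usr/bin/env bash
# Raw sequencing depth = (reads per file x number of files x read length) / genome size.
# Assumes every read is full length (150 bp), i.e. the count before trimming.
raw_coverage() {
  local reads_per_file=$1 files=$2 read_len=$3 genome_bp=$4
  awk -v r="$reads_per_file" -v f="$files" -v l="$read_len" -v g="$genome_bp" \
    'BEGIN { printf "%.2f\n", r * f * l / g }'
}

# Figures from the post: ~70 M reads per file, 2 files, 150 bp reads, 224 Mbp genome.
raw_coverage 70000000 2 150 224000000
```

That is about 21 Gbp of sequence in total, which is also why the assemblers need so much memory to hold the reads and their k-mers.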

After trimming the reads, I tried to assemble them with two programs, Velvet and ABySS, but neither worked.

For Velvet, I ran velveth with this command:

velveth /home/ubuntu/velvet21 21 -shortPaired -separate -fastq.gz /home/ubuntu/149-6_1_val_1.fq.gz /home/ubuntu/149-6_2_val_2.fq.gz

and got this output:

[0.000001] Reading FastQ file /home/ubuntu/149-6_1_val_1.fq.gz;
[0.002344] Reading FastQ file /home/ubuntu/149-6_2_val_2.fq.gz;
[924.933978] 139366234 sequences found in total in the paired sequence files
[924.933995] Done
[924.983130] Reading read set file /home/ubuntu/velvet21/Sequences;
[1228.465533] 139366234 sequences found
Killed

However, when I tried a much smaller genome (4.8 million bp, 1.4 million reads per file), it worked!
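To see why the same command works on the 4.8 Mbp genome but not on the 224 Mbp one, it can help to plug both runs into the empirical velvetg memory rule of thumb that circulates among Velvet users (commonly attributed to Simon Gladman's Velvet memory estimator). Treat the coefficients as an approximation and the result as KiB, and note that the formula estimates velvetg, whereas the run above was killed earlier, in velveth; it still shows the order of magnitude. The helper `velvet_ram_gib` is hypothetical, and the 150 bp read length for the small run is an assumption because the post does not state it.

```bash
#!/usr/bin/env bash
# Empirical velvetg RAM rule of thumb (approximate; result treated as KiB):
#   RAM = -109635 + 18977*ReadSize + 86326*GenomeSizeMb + 233353*NumReadsMillions - 51092*K
# Arguments: read length (bp), genome size (Mbp), total reads (millions), hash length K.
velvet_ram_gib() {
  awk -v rs="$1" -v gs="$2" -v nr="$3" -v k="$4" 'BEGIN {
    kib = -109635 + 18977*rs + 86326*gs + 233353*nr - 51092*k
    printf "%.1f\n", kib / (1024 * 1024)
  }'
}

# 224 Mbp genome, 139,366,234 reads in total (from the velveth log), k = 21
velvet_ram_gib 150 224 139.366234 21
# 4.8 Mbp genome, 1.4 M reads per file (2.8 M total), k = 21; 150 bp read length assumed
velvet_ram_gib 150 4.8 2.8 21
```

Under these assumptions the large run needs roughly 51 GiB, about three times the 16 GiB of an m4.xlarge, while the small run needs only around 2.6 GiB.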

For ABySS, I ran this command:

abyss-pe k=21 name=abyss21 in='149-6_1_val_1.fq.gz 149-6_2_val_2.fq.gz'

The output was:

 ABYSS -k21 -q3    --coverage-hist=coverage.hist -s output21-bubbles.fa  -o output21-1.fa 149-6_1_val_1.fq.gz 149-6_2_val_2.fq.gz
ABySS 2.0.2
ABYSS -k21 -q3 --coverage-hist=coverage.hist -s output21-bubbles.fa -o output21-1.fa 149-6_1_val_1.fq.gz 149-6_2_val_2.fq.gz
Reading `149-6_1_val_1.fq.gz'...
sparsehash FATAL ERROR: failed to allocate 10 groups
/usr/bin/abyss-pe:506: recipe for target 'output21-1.fa' failed
make: *** [output21-1.fa] Error 1

However, again, it ran successfully with a small synthetic data set from this page (ftp://ccb.jhu.edu/pub/dpuiu/Docs/ABYSS.html).

Does this have anything to do with RAM? How can I resolve this problem?

Thank you

Putita


I am not experienced with genome assembly, so the more experienced folks will be able to tell you for sure, but 16 GB is very little for many bioinformatics tasks. From what I have read, you need hundreds of GB for de novo assemblies. I would start by checking whether, and where, you can get access to a cluster, service, or node with that amount of memory.

Reply by ATpoint, 4.7 years ago


I agree. Increase it to at least 64 GB of RAM.
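Before relaunching, a quick way to confirm how much memory a Linux machine actually has is to read MemTotal from /proc/meminfo, which the kernel reports in kB. A minimal sketch; `mem_total_gib` is a hypothetical helper, and the 5% slack is an arbitrary allowance because MemTotal usually sits a little below the nominal instance size:

```bash
#!/usr/bin/env bash
# Print total RAM in GiB from a meminfo-format file (Linux; MemTotal is in kB).
mem_total_gib() {
  local meminfo=${1:-/proc/meminfo}
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / (1024 * 1024) }' "$meminfo"
}

# 64 is the figure suggested in this thread; allow ~5% slack below the nominal size.
target_gib=64
if [ -r /proc/meminfo ]; then
  total=$(mem_total_gib)
  echo "MemTotal: ${total} GiB"
  if awk -v t="$total" -v g="$target_gib" 'BEGIN { exit !(t < g * 0.95) }'; then
    echo "Warning: less than ~${target_gib} GiB of RAM installed."
  fi
fi
```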

Reply by Kevin Blighe, 4.7 years ago


I increased the RAM and it works!

Thank you, ATpoint and Kevin Blighe, for your suggestions :)

Reply by putita.jira, 4.7 years ago


How much memory did your job end up using?
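One way to measure this for your own run is GNU time, whose verbose mode (`/usr/bin/time -v`) reports "Maximum resident set size (kbytes)". A minimal sketch, assuming GNU time is installed at /usr/bin/time (the shell's built-in `time` keyword has no -v option); `peak_rss_kb` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Run a command under GNU time and print its peak resident memory in kB.
peak_rss_kb() {
  local log
  log=$(mktemp)
  # -v: verbose resource report; -o: write that report to a file instead of stderr.
  /usr/bin/time -v -o "$log" "$@" > /dev/null
  awk -F': ' '/Maximum resident set size/ { print $2 }' "$log"
  rm -f "$log"
}

# Example (replace with the real job): peak_rss_kb velvetg /home/ubuntu/velvet21
if /usr/bin/time -v true > /dev/null 2>&1; then
  echo "peak RSS of 'sleep 1': $(peak_rss_kb sleep 1) kB"
fi
```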

Reply by tobytaogf, 4.6 years ago

score 4 · Accepted Answer · 2019-08-05


Answered 4.7 years ago by User 59

[1228.465533] 139366234 sequences found
Killed

You have run out of memory. The process is being killed by the Linux OOM (out-of-memory) killer, and you can probably see this in your syslog. So yes, you need more RAM.
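To confirm that the OOM killer was responsible, you can search the kernel messages for its "Out of memory" lines. A minimal sketch; `oom_kills` is a hypothetical helper, the log paths are the usual Ubuntu locations rather than guaranteed ones, and the regular expression assumes the kernel's "Kill process PID (name)" or "Killed process PID (name)" wording, which varies between kernel versions:

```bash
#!/usr/bin/env bash
# List processes terminated by the Linux OOM killer (reads files, or stdin if none given).
# Matches both older ("Kill process") and newer ("Killed process") kernel wording.
oom_kills() {
  grep -oE 'Out of memory: Kill(ed)? process [0-9]+ \([^)]+\)' "$@" \
    | sed -E 's/.*process ([0-9]+) \(([^)]+)\)/\2 (PID \1)/'
}

# Usual Ubuntu locations; the same event may appear in both files.
# Alternatively: dmesg | oom_kills   (may require sudo)
for f in /var/log/kern.log /var/log/syslog; do
  if [ -r "$f" ]; then oom_kills "$f"; fi
done
```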


Thank you, Daniel Swan!

It now runs successfully with more RAM :)

Reply by putita.jira, 4.7 years ago