Question: Why can't I perform large genome assembly with Velvet and ABySS?
putita.jira wrote, 7 weeks ago:

Hi everyone! I'm new to the bioinformatics field and I'm having trouble with de novo assembly, so I would like to ask for suggestions.

General information about my data:

  • Illumina paired-end, read length 150 bp
  • whole genome sequencing (estimate genome size is 224 million bp)
  • 100x coverage
  • around 70 million reads per file (forward and reverse)
  • I used an Amazon Web Services EC2 instance, type m4.xlarge (4 vCPUs, 16 GiB RAM), to perform all of the following steps.

After trimming the reads, I tried to assemble with two programs, Velvet and ABySS, but neither worked.

For Velvet, I ran velveth with this command:

velveth /home/ubuntu/velvet21 21 -shortPaired -separate -fastq.gz /home/ubuntu/149-6_1_val_1.fq.gz /home/ubuntu/149-6_2_val_2.fq.gz

and got this output:

[0.000001] Reading FastQ file /home/ubuntu/149-6_1_val_1.fq.gz;
[0.002344] Reading FastQ file /home/ubuntu/149-6_2_val_2.fq.gz;
[924.933978] 139366234 sequences found in total in the paired sequence files
[924.933995] Done
[924.983130] Reading read set file /home/ubuntu/velvet21/Sequences;
[1228.465533] 139366234 sequences found
Killed

However, when I tried a much smaller genome (4.8 Mbp, 1.4 million reads per file), it worked!

For ABySS, I ran this command:

abyss-pe k=21 name=abyss21 in='149-6_1_val_1.fq.gz 149-6_2_val_2.fq.gz'

The output was:

 ABYSS -k21 -q3    --coverage-hist=coverage.hist -s output21-bubbles.fa  -o output21-1.fa 149-6_1_val_1.fq.gz 149-6_2_val_2.fq.gz
ABySS 2.0.2
ABYSS -k21 -q3 --coverage-hist=coverage.hist -s output21-bubbles.fa -o output21-1.fa 149-6_1_val_1.fq.gz 149-6_2_val_2.fq.gz
Reading `149-6_1_val_1.fq.gz'...
sparsehash FATAL ERROR: failed to allocate 10 groups
/usr/bin/abyss-pe:506: recipe for target 'output21-1.fa' failed
make: *** [output21-1.fa] Error 1

However, again, it ran successfully with a small synthetic data set from this page: ftp://ccb.jhu.edu/pub/dpuiu/Docs/ABYSS.html

Does this have anything to do with RAM? How can I resolve this problem?

Thank you

Putita

modified 6 weeks ago by Daniel Swan • written 7 weeks ago by putita.jira

I am not experienced with genome assemblies, so the more experienced folks will tell you for sure, but 16 GB is pretty much nothing for many bioinformatics tasks. From what I have read, you can need hundreds of GB for de novo assemblies. I would start by checking whether, and from where, you can get a cluster/service/node with that amount of memory.

written 7 weeks ago by ATpoint
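For Velvet specifically, a commonly cited rule-of-thumb regression (posted by Simon Gladman on the Velvet mailing list; the coefficients are empirical, so treat the result as a rough estimate only) predicts peak RAM in kB from read length, genome size in Mbp, total reads in millions, and k:

```python
# Rule-of-thumb Velvet RAM estimator (Simon Gladman's regression from the
# Velvet mailing list; coefficients are empirical and approximate).
def velvet_ram_kb(read_len, genome_mb, million_reads, k):
    return (-109635 + 18977 * read_len + 86326 * genome_mb
            + 233353 * million_reads - 51092 * k)

# The poster's data: 150 bp reads, ~224 Mbp genome, ~140 M reads total, k=21.
kb = velvet_ram_kb(read_len=150, genome_mb=224, million_reads=140, k=21)
print(f"~{kb / 1024**2:.0f} GB")  # → ~51 GB, far above the 16 GiB on an m4.xlarge
```

Even as a rough estimate, this puts the job well beyond a 16 GiB instance and broadly in line with the "at least 64 GB" advice below.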

I agree. Boost it up to at least 64GB RAM.

written 7 weeks ago by Kevin Blighe
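If adding RAM isn't an option, ABySS ≥ 2.0 also supports a lower-memory Bloom filter mode via the B (total Bloom filter memory), H (number of hash functions), and kc (minimum k-mer multiplicity) parameters to abyss-pe. A hedged sketch, with illustrative values you would tune to your RAM budget:

```shell
# ABySS >= 2.0 can assemble within a fixed memory budget using a Bloom filter
# de Bruijn graph: B = total Bloom filter memory, H = hash functions,
# kc = minimum k-mer multiplicity. The values below are illustrative only.
if command -v abyss-pe >/dev/null; then
    abyss-pe k=21 name=abyss21 B=12G H=4 kc=3 \
        in='149-6_1_val_1.fq.gz 149-6_2_val_2.fq.gz'
else
    echo "abyss-pe not found; install ABySS 2.x first"
fi
```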

I increased RAM and it works!

Thank you ATpoint and Kevin Blighe for your suggestion :)

written 4 weeks ago by putita.jira

May I ask how much memory your job ended up using?

written 3 days ago by tobytaogf
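One way to capture this for a run is to launch it under the GNU time binary, which reports peak memory as "Maximum resident set size" with -v (the 'sleep 1' stand-in below is illustrative; substitute the real velveth/velvetg or abyss-pe command line):

```shell
# GNU time (the /usr/bin/time binary, not the shell builtin) reports peak
# memory as "Maximum resident set size" when invoked with -v. Replace
# 'sleep 1' with the real assembly command to measure an actual run.
if command -v /usr/bin/time >/dev/null; then
    /usr/bin/time -v sleep 1 2> time.log || true
    grep -i 'maximum resident set size' time.log || true
else
    echo "/usr/bin/time not installed (on Ubuntu: apt install time)"
fi
```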
Daniel Swan (Aberdeen, UK) wrote, 6 weeks ago:
[1228.465533] 139366234 sequences found
Killed

You have run out of memory: the process is being killed by the kernel's OOM killer, and you can probably see this in your syslog. So yes, you need more RAM.
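To confirm, you can search the kernel log for OOM-killer messages (paths assume Ubuntu, as on the poster's EC2 instance; dmesg may require sudo):

```shell
# Search the kernel ring buffer for OOM-killer messages.
# '|| true' keeps the exit status clean when nothing matches.
dmesg | grep -iE 'out of memory|killed process' || true

# On Ubuntu the same messages are usually persisted to syslog as well:
grep -i 'oom' /var/log/syslog || true
```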

written 6 weeks ago by Daniel Swan

Thank you Daniel Swan

It now runs successfully with more RAM :)

written 4 weeks ago by putita.jira