Why does the Trinity assembler seem to work on a computer with low memory?
8.5 years ago

Hi all,

I have a paired-end dataset (about 60M read pairs, i.e. 120M reads in total; about 14.5 GB per file). At first I ran Trinity (latest version) on my PC (16 GB RAM, 8-core CPU) with the command below:

Trinity --seqType fq --JM 10G --left reads_1.fq --right reads_2.fq --CPU 8


The run failed because it ran out of RAM.

After searching, I found that I have to set --CPU according to the RAM each Butterfly process is allowed to use (--bflyHeapSpaceMax). I then changed my command as follows: every step completed and Trinity.fasta was created. All of the results and statistics look OK (for example, the reads map back to Trinity.fasta).

Trinity --seqType fq --bflyHeapSpaceMax 4G --JM 10G --left reads_1.fq --right reads_2.fq --CPU 4


Now I am really confused, because I thought I would have to run my dataset on a server with 100 GB of RAM and ...

Best

RNA-Seq Assembly next-gen

"why didn't the tool fail" is an unusual type of question for this site :-)


My question is:

Is it normal that a PC (16 GB RAM, 8 cores) can run a Trinity de novo transcriptome assembly on 60M reads, as in my case? Are my results OK?


Are you sure that everything succeeded with no failures? How much time did it take?

50 million reads took around 3-4 days using 4 cores and 15 GB of memory.

If all Trinity commands ran successfully (have a look), then I think you have a proper FASTA file in your output.

Low RAM should just slow things down.


Thanks so much for your reply. I am sure: I didn't get any errors during the run. Here is Trinity.timing:

Statistics:
===========
Trinity Version:      trinityrnaseq_r20140717
Compiler:             GCC
Trinity Parameters:   --bflyHeapSpaceMax 4G --seqType fq --JM 10G --left /home/mrb/NGS/A-Project/Dr.Hosseinpour/Trimmed-data/forward.fq --right /home/mrb/NGS/A-Project/Dr.Hosseinpour/Trimmed-data/reverse.fq --CPU 4
Paired mode
Input data
Left.fasta    8308 MByte
Right.fasta   8302 MByte
Number of unique KMERs: 204827561
Number of reads:        0
Output data
Trinity.fasta 65 MByte

Runtime
=======
Start:       Mon Sep 29 12:45:10 IRST 2014
End:         Mon Sep 29 23:09:51 IRST 2014
Trinity   37481 seconds
Inchworm   1230 seconds
Chrysalis  22880 seconds
Butterfly  12326 seconds
Rest       1045 seconds
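
As a quick sanity check on the numbers above: the stage times add up exactly to the overall 37481 s (about 10.4 h, matching the Start/End timestamps), and Chrysalis accounts for most of it. The sketch below parses the Runtime block and prints each stage's share. It is not a Trinity tool; it simply assumes every stage line has the `<name>  <N> seconds` layout shown above.

```python
import re

# The Runtime block from Trinity.timing above, copied verbatim.
TIMING = """\
Trinity   37481 seconds
Inchworm   1230 seconds
Chrysalis  22880 seconds
Butterfly  12326 seconds
Rest       1045 seconds
"""

def parse_runtime(text):
    """Return {stage: seconds} for lines like 'Chrysalis  22880 seconds'."""
    stages = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(\w+)\s+(\d+)\s+seconds\s*$", line)
        if m:
            stages[m.group(1)] = int(m.group(2))
    return stages

stages = parse_runtime(TIMING)
total = stages.pop("Trinity")  # overall wall-clock time of the whole run
for name, secs in stages.items():
    # hours spent in the stage and its percentage of the total run
    print(f"{name:<10} {secs / 3600:5.2f} h  {100 * secs / total:5.1f}%")
print(f"{'Total':<10} {total / 3600:5.2f} h  (stages sum to {sum(stages.values())} s)")
```

For this run it reports Chrysalis at about 61% and Butterfly at about 33% of the roughly 10.4 hours.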


I think the new version of Trinity has improved! Is that possible? I am really confused. I BLASTed some contigs and they look OK!

Is this a problem?

Best


Are you getting an error like this?

Invalid maximum heap size: XXXXX


If not, then it should be okay. Are you getting progress lines like these?

succeeded(10346)   28.7749% completed.
succeeded(10347)   28.7776% completed.
succeeded(10348)   28.7804% completed.
succeeded(10349)   28.7832% completed.
succeeded(10350)   28.786% completed.
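
If you saved Trinity's console output to a text file (how you capture it, e.g. by redirecting stdout/stderr, is an assumption here), a small scan can report the last `succeeded(...)` progress line and whether the heap error quoted above ever appeared. This is a minimal sketch built only on the two message formats quoted in this thread; `check_log` is a hypothetical helper, not part of Trinity.

```python
import re

# Sample input: the last two progress lines quoted above.
LOG = """\
succeeded(10349)   28.7832% completed.
succeeded(10350)   28.786% completed.
"""

# Progress format as quoted in this thread, e.g. "succeeded(10350)   28.786% completed."
PROGRESS = re.compile(r"succeeded\((\d+)\)\s+([\d.]+)% completed")
# JVM error text quoted in this thread for a bad maximum heap setting.
HEAP_ERROR = "Invalid maximum heap size"

def check_log(text):
    """Return (last succeeded count, last percent, heap error seen?)."""
    last_count, last_pct = None, None
    heap_error = False
    for line in text.splitlines():
        m = PROGRESS.search(line)
        if m:
            last_count, last_pct = int(m.group(1)), float(m.group(2))
        if HEAP_ERROR in line:
            heap_error = True
    return last_count, last_pct, heap_error

print(check_log(LOG))  # (10350, 28.786, False)
```

A run that finished cleanly should end near 100%, as in the "succeeded(45600) 100% completed" result reported further down.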


Thanks again for your reply. The first time I ran Trinity (with --CPU 8), I got this error: Trinity did not finish and Trinity.fasta was not created.

But after I changed my command (--bflyHeapSpaceMax 4G --CPU 4), I didn't get any error, and the Butterfly step ran fine (succeeded(45600) 100% completed, in a single pass). I know that --CPU × --bflyHeapSpaceMax must be less than or equal to the available RAM.
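
The rule of thumb above can be written as a worst-case check: if each of the --CPU parallel Butterfly Java processes grew to its maximum heap, together they would need CPU × bflyHeapSpaceMax. This is a minimal sketch of the poster's rule, not a documented Trinity formula; it ignores memory used by the OS and by the other stages (Jellyfish, Inchworm, Chrysalis), and it takes the 10G figure mentioned later in this thread as the value in effect for the first run. `to_gib` and `butterfly_budget_ok` are hypothetical helpers.

```python
def to_gib(size):
    """Convert a Java-style size string such as '4G' or '512M' to GiB."""
    units = {"K": 1 / (1024 * 1024), "M": 1 / 1024, "G": 1, "T": 1024}
    size = size.strip().upper()
    if size and size[-1] in units:
        return float(size[:-1]) * units[size[-1]]
    raise ValueError(f"expected a unit suffix (K/M/G/T): {size!r}")

def butterfly_budget_ok(cpu, bfly_heap_max, ram):
    """Worst case: every parallel Butterfly JVM reaches its maximum heap."""
    worst_case = cpu * to_gib(bfly_heap_max)
    return worst_case <= to_gib(ram), worst_case

# The two runs from this thread on a 16 GB machine
# (assumption: 10G heap for the first run, per the later question).
print(butterfly_budget_ok(8, "10G", "16G"))  # (False, 80.0)
print(butterfly_budget_ok(4, "4G", "16G"))   # (True, 16.0)
```

The max heap is an upper bound, so exceeding it on paper does not guarantee a crash; it only means the run can fail if several large components are processed at the same time.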

But I don't know whether decreasing bflyHeapSpaceMax changes the quality of my assembly.

Now I am really confused. How is this possible on my 16 GB RAM PC? Or has Trinity really improved?

Thanks


Maybe it has improved, but I would suggest posting on their site. You can't afford to take chances.

It's here:

http://sourceforge.net/projects/trinityrnaseq/

8.5 years ago
rtliu ★ 2.2k

Your result is OK. Over the past few years, Trinity has made huge progress in reducing its RAM requirements and speeding up processing by making better use of multiple cores. You can further check the N50 of Trinity.fasta with the following command:

$TRINITY_HOME/util/TrinityStats.pl trinity_out_dir/Trinity.fasta


I would be happy with an N50 of 1000-2000 bp.
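
For reference, N50 is the largest contig length L such that contigs of length ≥ L together cover at least half of the total assembled length. The sketch below computes that generic definition directly from a FASTA file. It is not TrinityStats.pl, which may report additional variants (for example per-gene statistics), so its numbers need not match exactly.

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L cover >= half the total."""
    lengths = list(lengths)  # accept any iterable, e.g. a generator
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the total assembly length
            return length
    return 0  # empty input

def fasta_lengths(path):
    """Yield sequence lengths from a (possibly multi-line) FASTA file."""
    length = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line)
    if length is not None:
        yield length

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Usage on the assembly: `print(n50(fasta_lengths("trinity_out_dir/Trinity.fasta")))`.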


I checked it right after the analysis finished: N50 is ~1600 bp and contig lengths range from 201 to 7500 bp.

As I said, everything looks OK, but I didn't know Trinity had improved so much that it can run on a PC like mine. That is really interesting.

Until now I had put my project on hold because I doubted the results. So can I continue with these results?

Best


Another question:

Does decreasing bflyHeapSpaceMax (from 10G to 4G) have any effect on assembly quality?

Thanks so much


No. 10G was only needed for very few, extremely complex components.

0
8.5 years ago

If the data are consistent, you can generally get very good assemblies. I have assembled bacterial genomes on a MacBook Air during a lecture, but those reads were generated artificially, with very even coverage and low error rates.

So overall, I would say that if the tool finished with no errors and the results seem sensible, you should be fine.