Forum: Careful! Velvet Generates Different Assemblies From The Same Input Data And Same Input Parameters With OpenMP Enabled.
1
5.4 years ago by
Rahul Sharma560
Germany
Rahul Sharma560 wrote:

Hi all,

I am using four libraries with a read length of 76 bp and insert sizes of 300 bp, 1 kb, 8 kb and 12 kb. The expected genome size is 80 Mb.

I am running Velvet on these four libraries. I first ran Velvet across a range of k-mers, and then, for the best k-mer, tried different cov_cutoff values. For all the assemblies at the same k-mer (69) with different cov_cutoffs, I reused the Roadmaps and Sequences files from the initial velveth run (k-mer 69, default cov_cutoff).

Surprisingly, I got an N50 of 10 Mb and a largest scaffold of 23 Mb using a cov_cutoff of 12 (median coverage is 30.76 in the Log file) on the previously generated Roadmaps and Sequences files. Later I ran a completely fresh assembly from the same reads, k-mer 69 and cov_cutoff 12, and now my N50 is 2 Mb and the largest scaffold is 6.78 Mb.

I then reran velveth with the same input files and the same parameters, and found that it generated a different Roadmaps file in each of three runs, even at the same k-mer of 69. What could be the reason for this? As it stands, the results cannot be reproduced.

I would really appreciate your comments on this.

Best regards, Rahul

genome forum velvet • 6.0k views
ADD COMMENTlink modified 5.4 years ago by SES8.1k • written 5.4 years ago by Rahul Sharma560
3
5.4 years ago by
SES8.1k
Vancouver, BC
SES8.1k wrote:

The solution is to set the OMP thread environment in your shell script:

#!/bin/bash

export OMP_THREAD_LIMIT=1
export OMP_NUM_THREADS=1

velveth ....
velvetg .

This will ensure identical results, but it means velveth will likely take longer to run. For finding the optimal k-mer and coverage cutoff, I recommend using VelvetOptimiser. These OMP settings are especially important when using VelvetOptimiser, because otherwise every single thread will try to use all the processors on the node.
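If it helps, here is a minimal self-contained version of that wrapper. The velveth/velvetg lines are commented-out placeholders, since the exact output directory, k-mer and read arguments depend on your data:

```shell
#!/bin/bash
# Pin OpenMP to a single thread so repeated velveth runs build
# identical Roadmaps/Sequences files (at the cost of wall-clock time).
export OMP_THREAD_LIMIT=1
export OMP_NUM_THREADS=1
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OMP_THREAD_LIMIT=$OMP_THREAD_LIMIT"

# Hypothetical invocation; directory name, k-mer and read files are placeholders:
# velveth asm_k69 69 -shortPaired -fastq -separate reads_1.fq reads_2.fq
# velvetg asm_k69 -cov_cutoff 12 -exp_cov auto
```

The slowdown is the price of determinism: with a single OpenMP thread the hashing is no longer subject to thread scheduling, which is presumably what makes the Roadmaps file differ between runs.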

ADD COMMENTlink written 5.4 years ago by SES8.1k

Thanks for this, but then Velvet takes ages to generate assemblies on a single processor. As for VelvetOptimiser, I cannot use it in my analysis: my dataset is huge and consumes ~95% of the RAM of our whole group's machine, so I need to schedule jobs accordingly, and in this case manually optimizing Velvet is the right choice.

ADD REPLYlink written 5.4 years ago by Rahul Sharma560

Hi SES,

I am currently trying to set up my submission script for VelvetOptimiser on our cluster and was a bit confused about how the various parameters should be set. The plan was to spread the instances over 24 threads, so I initially used the following parameters:

  • In the SLURM submission script: OMP_NUM_THREADS=24, OPENBLAS_NUM_THREADS=24, --cpus=24, --cpus-per-task=24, mem=256Gb
  • In the VelvetOptimiser command line: -t 24

However, this appears to have caused thread-allocation issues (perhaps it tried to allocate 24*24 threads?), and the program eventually crashed with the error messages below (I am not sure the thread allocation actually caused the crash; if you think otherwise, I would appreciate any recommendation :-) ):

Use of uninitialized value in numeric ne (!=) at /home/umons/orgmb/csheridan/software/BioPerl/BioPerl-1.6.1/VelvetOptimiser.pl line 289.
Use of uninitialized value $maxScore in numeric gt (>) at /home/umons/orgmb/csheridan/software/BioPerl/BioPerl-1.6.1/VelvetOptimiser.pl line 290.
Use of uninitialized value in numeric gt (>) at /home/umons/orgmb/csheridan/software/BioPerl/BioPerl-1.6.1/VelvetOptimiser.pl line 290.
Aug 11 16:12:51 Hash value of best assembly by assembly score: 79
Aug 11 16:12:51 Optimisation routine chosen for best assembly: shortPaired
Aug 11 16:12:51 Looking for the expected coverage
Unable to open /gpfsuser/home/users/c/s/csheridan/data/auto_data_79/stats.txt for exp_cov determination.
 at /home/umons/orgmb/csheridan/software/BioPerl/BioPerl-1.6.1/VelvetOptimiser.pl line 838

Anyway, I then tried to correct this by setting the thread parameters as follows, but things became much slower, with Velvet computing only 3 hash values at a time:

  • In the SLURM submission script: OMP_NUM_THREADS=8, OPENBLAS_NUM_THREADS=8, --cpus=24, --cpus-per-task=24, mem=256Gb
  • In the VelvetOptimiser command line: -t 3
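To make the arithmetic concrete (this is my assumption about how the thread counts multiply, not something I have confirmed in the VelvetOptimiser source): the total demand would be the product of the optimiser's -t (parallel assembly instances) and OMP_NUM_THREADS (OpenMP threads per velveth/velvetg instance):

```shell
#!/bin/bash
# Hypothetical back-of-the-envelope check: VelvetOptimiser runs OPT_THREADS
# assemblies in parallel, each spawning OMP_PER_INSTANCE OpenMP threads.
CPUS_ALLOCATED=24     # --cpus-per-task in the SLURM script
OPT_THREADS=24        # VelvetOptimiser -t
OMP_PER_INSTANCE=24   # OMP_NUM_THREADS exported before the run

TOTAL=$((OPT_THREADS * OMP_PER_INSTANCE))
echo "requested: $TOTAL threads, allocated: $CPUS_ALLOCATED CPUs"
if [ "$TOTAL" -gt "$CPUS_ALLOCATED" ]; then
  echo "oversubscribed"
fi
```

If that is right, the first configuration requested 576 threads on a 24-CPU allocation, while the second stayed within it (3 * 8 = 24), which would also explain why only 3 hash values were computed at a time.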

So my question is: how exactly should the thread parameters be set in order to benefit from OpenMP, and how can Velvet/Oases be optimally parallelised?

Thank you!

ADD REPLYlink modified 4.7 years ago • written 4.7 years ago by sheridan.christopher10

It is not a good idea to post a new question in the comment section. On Biostar we like to keep a single topic per thread. In addition, very few people will see your post here, so it is also inefficient.

ADD REPLYlink written 4.7 years ago by Istvan Albert ♦♦ 80k
1
5.4 years ago by
Istvan Albert ♦♦ 80k
University Park, USA
Istvan Albert ♦♦ 80k wrote:

I ran a test and my files are identical:

velveth a31 31 -shortPaired -fastq -separate r1.fq r2.fq
velveth b31 31 -shortPaired -fastq -separate r1.fq r2.fq
cmp a31/Roadmaps b31/Roadmaps

No output was returned from cmp, so the files are identical. Running velvetg on each also produces identical results:

velvetg a31 -exp_cov auto
Final graph has 1055 nodes and n50 of 15415, max 65035, total 1446385, using 159599/200002 reads

velvetg b31 -exp_cov auto
Final graph has 1055 nodes and n50 of 15415, max 65035, total 1446385, using 159599/200002 reads
ADD COMMENTlink written 5.4 years ago by Istvan Albert ♦♦ 80k
Hello,

I ran Velvet on paired-end transcriptome data using cov_cutoff 4.0 and min_contig_lgth 200:
Final graph has 251014 nodes and n50 of 562, max 8255, total 32301310, using 0/32799252 reads

What do the node count (251014) and the max and total numbers in the Velvet output indicate? I have not been able to work this out.

Please answer me on this.

ADD REPLYlink written 4.3 years ago by reddy.renuka50
1
5.4 years ago by
Rahul Sharma560
Germany
Rahul Sharma560 wrote:

Now that it is clear Velvet generates different assemblies because of OpenMP, what should be done? Should I go with the assembly with the 10 Mb N50? How realistic would that be? From a publication point of view, our results should be reproducible using the same methods. But when I run Velvet five times with the same data, k-mer 69 and cov_cutoff 12, it gives me five different assemblies. Which assembly should one trust then?

I would appreciate your suggestions on this!

Thanks in advance! Rahul

ADD COMMENTlink modified 5.4 years ago • written 5.4 years ago by Rahul Sharma560
1

Well, as long as we accept that Velvet works and that each assembly is valid, with one simply being more complete than another, I would pick the best assembly that works.

After all this is what people do anyhow when sweeping over the parameter space.

Then describe in the supplementary materials the issue with reproducibility.

ADD REPLYlink modified 5.4 years ago • written 5.4 years ago by Istvan Albert ♦♦ 80k