Question: 10X Genomics Supernova Troubleshooting
peter_stokes • 10 wrote, 3 months ago:

Hi all!

I am attempting to perform de novo assembly of sunflower with Supernova 2.0.0.

I am having some difficulty getting it to finish within the wallclock limits of the resources I am using: 48 hours on SDSC Comet (64 cores, 1.4 TB memory) and 72 hours on Savio here at UC Berkeley (16 cores, 512 GB memory).

I typically have not been including --maxreads in my scripts, assuming that leaving it unset will produce the best-quality assembly, but this is not realistic given my wallclock limits. One question I have is whether I should limit the number of reads to what the sequencing company reported. Our sequences are from a HiSeq X, and the report says the number of reads is 261M. Is "reads" as reported by the sequencing company the same thing as the reads that --maxreads counts in Supernova?
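For concreteness, this is the form I have in mind (the read count is taken from the provider's report; the --id and FASTQ path are placeholders):

# Hypothetical invocation capping the input at the provider-reported read count.
supernova run --id=assembly7506_capped \
    --fastqs=/path/to/10X_seqData \
    --maxreads=261000000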

Also, do you set --localcores and --localmem? Or do you just let the program use the resources available on the node?

I should also add that the genome is quite large, about 3.6 Gb, and I expect some heterozygosity.

Thanks!

assembly • 367 views
— modified 9 weeks ago • written 3 months ago by peter_stokes • 10

> Also, do you set --localcores and --localmem? Or do you just let the program use the resources available on the node?

What job scheduler do you use? As I recall, 10x supports LSF and SGE; while SLURM is not officially supported, it does work. You should allocate resources properly when starting Supernova jobs, and two days may not be enough time. This thread has some useful information, albeit for an older version of Supernova: 10x Supernova de novo assembly.

— modified 3 months ago • written 3 months ago by genomax • 49k

Hi Genomax,

Thanks for your reply. I am new to anything computational-biology related, as I am a first-year rotating graduate student, so please bear with me. I have read many other troubleshooting posts on this forum. I am working with SDSC to see if I can incorporate checkpoint-restart into my scripts to pick up where I left off, since wallclock limits are, well, limiting.

Here is my script:

#!/bin/bash 
#SBATCH -D /oasis/scratch/comet/petersto/temp_project/assemblyOutput
#SBATCH -J supernova
#SBATCH --partition=large-shared
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=64
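# Note: 1,400,000 MB (SLURM's default unit for --mem) = 1400 GB = 1.4 TB;
# keep this in sync with --localmem (given in GB) below.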
#SBATCH --mem=1400000
#SBATCH --time=48:00:00
#SBATCH -o /oasis/projects/nsf/ddp319/petersto/err_outs/assemblyFull7506.out
#SBATCH -e /oasis/projects/nsf/ddp319/petersto/err_outs/assemblyFull7506.err
#SBATCH --mail-user=xxxx
#SBATCH --mail-type=All

export PATH=/oasis/projects/nsf/ddp319/petersto/programs/supernova/supernova-2.0.0:$PATH
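# Note: --localcores/--localmem below should match the SLURM request above
# (64 CPUs; --localmem is in GB, so 1400 GB = the 1,400,000 MB requested).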

supernova run --id=assemblyFull7506 \
--fastqs=/oasis/projects/nsf/ddp319/petersto/10X_seqData/ \
--localcores=64 \
--localmem=1400 \
--description="Full Assembly 7506"

Is this not ideal? It has finished on Savio with --maxreads=10M, but with extremely poor quality.

— modified 3 months ago by genomax • 49k • written 3 months ago by peter_stokes • 10

@Peter: There is a separate settings file in the supernova/2.0.0/supernova-2.0.0/martian-cs/2.3.1/jobmanagers/ hierarchy. Has that been properly configured on your cluster for the scheduler you are using?
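For illustration, the SLURM template in that directory looks roughly like the sketch below. The __MRO_*__ tokens are placeholders that the Martian runtime fills in per pipeline stage; exact contents vary by version, so check the file in your own install rather than copying this.

#!/usr/bin/env bash
# Rough sketch of jobmanagers/slurm.template (illustrative, not verbatim).
#SBATCH -J __MRO_JOB_NAME__
#SBATCH --export=ALL
#SBATCH --nodes=1 --ntasks-per-node=1 --cpus-per-task=__MRO_THREADS__
#SBATCH --mem=__MRO_MEM_GB__G
#SBATCH -o __MRO_STDOUT__
#SBATCH -e __MRO_STDERR__

__MRO_CMD__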

Is the memory specification correct (is that in GB?), and does the #SBATCH value match what you are using on the command line? (In previous versions this was controlled by the file I mentioned above.)

With 64 cores and 1.4 TB RAM you should be able to finish this job in two days. Have you prepared the FASTQs using supernova mkfastq?
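For reference, a minimal sketch of that step (paths and sample sheet are hypothetical):

# Demultiplex the raw Illumina run folder into 10x-formatted FASTQs
# that supernova run can consume.
supernova mkfastq --run=/path/to/flowcell_dir \
    --csv=/path/to/samplesheet.csv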

— modified 3 months ago • written 3 months ago by genomax • 49k

Are you sure you have enough reads? The default setting is 1.2B reads (calculated for the 3.2 Gb human genome @ 57x coverage).

You've only got around 11-fold coverage, well below the recommended minimum of 38x; see the quick check below the link.

https://support.10xgenomics.com/de-novo-assembly/software/pipelines/latest/using/running
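A quick back-of-the-envelope check (assuming ~150 bp reads and a 3.6 Gb genome):

# Estimate depth of coverage from the provider-reported read count.
reads=261000000        # reads reported by the sequencing company
read_len=150           # HiSeq X read length in bp
genome=3600000000      # estimated genome size in bp
echo "scale=1; $reads * $read_len / $genome" | bc    # prints 10.8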

— written 3 months ago by Andy • 0

While the number of reads is less than recommended, the first challenge for @peter is to get the software to finish an analysis. Even if the result is not optimal, he should at least get some contigs.

— written 3 months ago by genomax • 49k

Peter, could you also confirm whether these are GemCode-barcoded reads made from a 10X Chromium library, rather than any old Illumina data?

I've recently been testing my system with the aphid genome to see if it's up to the task before I apply it to my own data. 261M reads on the system you have ought to complete in well under one day.

— modified 3 months ago • written 3 months ago by Andy • 0

@Andy: Since these are not answers to the original question, you should post them as comments by using ADD COMMENT (on the original post) or ADD REPLY (on other existing comments).

— modified 3 months ago • written 3 months ago by genomax • 49k

Hey, did you manage to get checkpointing to work? I'm having the exact same problem with Supernova on Comet.

— written 12 weeks ago by rcw2710

I think the checkpointing is automatic. You just need to restart the job (if I recall right).

— modified 12 weeks ago • written 12 weeks ago by genomax • 49k

So the job timed out and you just kicked it off again? Or did you have to ask it to checkpoint? I originally did the latter, but it didn't create a checkpoint file.

— written 12 weeks ago by rcw2710

I believe that is correct (it has been some time since I ran a 10x job). I think it keeps track of where things are; you don't need to create a file.

— modified 12 weeks ago • written 12 weeks ago by genomax • 49k

Awesome! Thanks for the fast responses!

— written 12 weeks ago by rcw2710
peter_stokes • 10 wrote, 9 weeks ago:

Hey all!

Sorry for the long hiatus!

Turns out, it has an automatic checkpoint, so long as you don't adjust your script! I was making the mistake of constantly changing my script to increase efficiency. In doing so, upon resubmitting the job, the scheduler would see it as a new job and overwrite all files from the previously "failed" (timed-out) job with that same name.
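In other words (script name hypothetical; the key is that nothing changes between submissions):

# Resubmit the identical script after each timeout; with the same --id,
# Supernova resumes the existing pipestance instead of starting over.
sbatch supernova_run.sbatch    # times out at the 48-hour wallclock limit
sbatch supernova_run.sbatch    # resubmitted unchanged; the run resumes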

I ended up with a 25x-coverage genome, which I am happy with for what I am using it for. But it worked!

With 1.45 TB of memory and 48 cores, it took about 8-9 days, so still a very, very long time. Some steps took the entire 48-hour wallclock limit at SDSC Comet.

Anyway, thanks for all the help and suggestions; if anyone would like to see my scripts for the jobs I submitted on SDSC Comet, I am happy to provide them. :)

— written 9 weeks ago by peter_stokes • 10