Question: Unicycler - hybrid assembly failure
25 days ago, piotr.majewski wrote:

Dear All,

I've recently run into problems with Unicycler. I tried to perform a hybrid assembly using:

1) trimmed Illumina reads (R1 + R2); format: fastqsanger.gz

2) nanopore reads; format: fastqsanger

Unicycler readily assembles either the Illumina or the nanopore reads on their own. However, it fails to generate a hybrid assembly. Any suggestions?

Thanks in advance,

Piotr

PS: here is the error report:

tput: No value for $TERM and no -T specified 

tput: No value for $TERM and no -T specified 

tput: No value for $TERM and no -T specified

/pylon5/mc48nsp/xcgalaxy/main/staging/23588931/command.sh: line 95: 38467 Segmentation fault      (core dumped) unicycler -t "${GALAXY_SLOTS:-4}" -o ./ --verbosity 3 --pilon_path $pilon -1'fq1.fastq.gz' -2 'fq2.fastq.gz' -l lr.fastq --mode 'conservative' --min_fasta_length '100' --linear_seqs '0' --min_kmer_frac '0.2' --max_kmer_frac '0.95' --kmer_count '10' --depth_filter '0.25' --start_gene_id '90.0' --start_gene_cov '95.0' --min_polish_size '1000' --min_component_size '1000' --min_dead_end_size '1000' --scores '3,-6,-5,-2'
modified 25 days ago by jrj.healey • written 25 days ago by piotr.majewski

How much memory have you got available?

written 25 days ago by jrj.healey

I am currently using 46.5 GB out of a total of 250.0 GB of space.

written 25 days ago by piotr.majewski

By memory, I mean RAM, not disk storage.

written 25 days ago by jrj.healey

I forgot to mention that I am running the analyses on a Galaxy server.

Would 16 GB of RAM be enough to run it offline?

written 25 days ago by piotr.majewski

How big are the files, and what size genome are you expecting?

A segmentation fault suggests that you may not have enough memory for the hybrid assembly, whereas each dataset assembles on its own because that needs less memory. I would be surprised if 16 GB were sufficient, but it depends entirely on the genome and the data.

written 25 days ago by jrj.healey

I am expecting a genome of around 5 Mb.

As for the input files, the nanopore data is quite extensive:

1) long reads - 2.3 Gb

2) short reads R1 - 0.17 Gb

3) short reads R2 - 0.16 Gb
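
For a rough sense of scale, the depth implied by the long-read file can be estimated as below. This is only a sketch under stated assumptions: it treats "2.3 Gb" as the size in bytes of the uncompressed nanopore FASTQ, and it assumes about half of those bytes are bases (the sequence and quality lines are the same length). The function name `estimate_depth` and the `base_fraction` parameter are made up for illustration. The calculation is not applied to the short reads, because those files are gzip-compressed and their size says little about the number of bases.

```python
def estimate_depth(fastq_bytes, genome_bp, base_fraction=0.5):
    """Rough sequencing depth implied by an uncompressed FASTQ file size.

    base_fraction is an assumed share of the file made up of bases;
    0.5 is an approximation, not a measured value.
    """
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return fastq_bytes * base_fraction / genome_bp


# Figures from the post: ~2.3 GB of long reads, ~5 Mb expected genome.
long_read_depth = estimate_depth(2.3e9, 5e6)
print(f"approximate long-read depth: {long_read_depth:.0f}x")
```

Under these assumptions the long reads come out at roughly 230x over a 5 Mb genome.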

modified 25 days ago • written 25 days ago by piotr.majewski

I suspect that may be too much data for your local machine. I don’t know what a typical Galaxy RAM allowance is. Presumably it’s dependent on the hosting server.

It might be interesting to try and randomly downsample the reads to see if you can reach a point where it runs, assuming it’s not some other issue.
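
One way to try the random downsampling is sketched below in Python, using reservoir sampling so the input is read only once. It is a minimal illustration, not a recommended pipeline: the function names are hypothetical, it assumes standard 4-line FASTQ records, and it keeps the selected records in memory. It is meant for the single long-read file; paired Illumina files would need the same records picked from both R1 and R2, which this sketch does not do.

```python
import gzip
import random


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file, chosen by extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_fastq(handle):
    """Yield one 4-line FASTQ record at a time (assumes unwrapped records)."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return
        text = "".join(lines)
        # Make sure the final record also ends with a newline.
        yield text if text.endswith("\n") else text + "\n"


def downsample_fastq(in_path, out_path, n_reads, seed=0):
    """Write a uniform random sample of up to n_reads records.

    Uses reservoir sampling; returns the number of records written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open_text(in_path) as handle:
        for i, record in enumerate(read_fastq(handle)):
            if i < n_reads:
                reservoir.append(record)
            else:
                # Replace an existing entry with probability n_reads / (i + 1).
                j = rng.randint(0, i)
                if j < n_reads:
                    reservoir[j] = record
    with open_text(out_path, "wt") as out:
        out.writelines(reservoir)
    return len(reservoir)
```

For example, `downsample_fastq("lr.fastq", "lr.sub.fastq", 20000)` would keep 20,000 randomly chosen long reads; the output file name and the read count here are arbitrary placeholders, and the count could be halved repeatedly until the hybrid run completes.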

Alternatively, there are assembly + polishing workflows you could try, where you assemble the nanopore data first and then error-correct the result with the Illumina reads. This might reduce the burden of processing too much data at once.

written 25 days ago by jrj.healey