Question: Unicycler - hybrid assembly failure
12 months ago, piotr.majewski0 wrote:

Dear All,

I've recently run into problems with Unicycler. I tried to run a hybrid assembly using:

1) trimmed Illumina reads (R1+R2); format: fastqsanger.gz

2) nanopore reads; format: fastqsanger

Unicycler readily assembles the Illumina reads and the Nanopore reads individually. However, it fails to generate a hybrid assembly. Any suggestions?

thanks in advance,

Piotr

PS here is the error report

tput: No value for $TERM and no -T specified 

tput: No value for $TERM and no -T specified 

tput: No value for $TERM and no -T specified

/pylon5/mc48nsp/xcgalaxy/main/staging/23588931/command.sh: line 95: 38467 Segmentation fault (core dumped) unicycler -t "${GALAXY_SLOTS:-4}" -o ./ --verbosity 3 --pilon_path $pilon -1'fq1.fastq.gz' -2 'fq2.fastq.gz' -l lr.fastq --mode 'conservative' --min_fasta_length '100' --linear_seqs '0' --min_kmer_frac '0.2' --max_kmer_frac '0.95' --kmer_count '10' --depth_filter '0.25' --start_gene_id '90.0' --start_gene_cov '95.0' --min_polish_size '1000' --min_component_size '1000' --min_dead_end_size '1000' --scores '3,-6,-5,-2'
modified 12 months ago by Joe16k • written 12 months ago by piotr.majewski0

How much memory have you got available?

written 12 months ago by Joe16k

I am currently using 46.5 GB of a total of 250.0 GB of space.

written 12 months ago by piotr.majewski0

By memory, I mean RAM, not disk storage.

written 12 months ago by Joe16k

I forgot to mention that I am running the analyses on a Galaxy server.

Would 16 GB of RAM be enough to run it offline?

written 12 months ago by piotr.majewski0

How big are the files, and what size genome are you expecting?

A seg fault suggests you may not have enough memory for the hybrid assembly, whereas each dataset assembles on its own because that requires less memory. I would be surprised if 16 GB is sufficient, but it’s entirely genome- and data-dependent.

written 12 months ago by Joe16k

I am expecting a genome of around 5 Mb.

As for the input files, the nanopore data is quite extensive:

1) long reads - 2.3 Gb

2) short reads R1 - 0.17 Gb

3) short reads R2 - 0.16 Gb

modified 12 months ago • written 12 months ago by piotr.majewski0
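
To put those numbers in context, it can help to work out the actual read depth rather than going by file size (a "Gb" figure for a FASTQ file may mean gigabytes on disk, not gigabases of sequence). The following is a minimal Python sketch, assuming plain 4-line FASTQ records; the function names are made up for illustration, and the 100x target in the usage note is an arbitrary example, not a Unicycler recommendation.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file for reading, handling .gz transparently."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def total_bases(path):
    """Sum the sequence lengths, assuming strict 4-line FASTQ records."""
    total = 0
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record holds the sequence
                total += len(line.strip())
    return total


def estimated_depth(path, genome_size):
    """Approximate read depth: total sequenced bases / expected genome size."""
    return total_bases(path) / genome_size


def keep_fraction(current_depth, target_depth):
    """Fraction of reads to keep to reach the target depth (capped at 1.0)."""
    return min(1.0, target_depth / current_depth)
```

For example, `estimated_depth("lr.fastq", 5_000_000)` gives the long-read depth for a 5 Mb genome, and `keep_fraction(depth, 100)` gives the fraction of reads to keep for roughly 100x.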

I suspect that may be too much data for your local machine. I don’t know what a typical Galaxy RAM allowance is. Presumably it’s dependent on the hosting server.

It might be interesting to try and randomly downsample the reads to see if you can reach a point where it runs, assuming it’s not some other issue.
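
One way to do the random downsampling is sketched below in Python. It is only an illustration, assuming plain 4-line FASTQ records; `downsample_fastq` and its parameters are hypothetical names, not part of Unicycler or Galaxy. Each read is kept with probability `fraction`, so the output size is approximate rather than exact.

```python
import gzip
import random


def open_fastq(path, mode="rt"):
    """Open a FASTQ file in text mode, handling .gz transparently."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def downsample_fastq(in_path, out_path, fraction, seed=42):
    """Keep each 4-line record with probability `fraction`.

    Returns (kept, seen). Assumes strict 4-line FASTQ records.
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = seen = 0
    with open_fastq(in_path) as src, open_fastq(out_path, "wt") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            seen += 1
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept, seen
```

Because exactly one random number is drawn per record, running it with the same seed and fraction on R1 and R2 keeps the pairs in sync, provided both files hold the same reads in the same order. You could then step the fraction down (say 0.5, 0.25, 0.1) until the hybrid run completes.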

Alternatively, there are assembly + polishing workflows you could try, where you assemble the nanopore data first and then error-correct with the Illumina reads. This might reduce the burden of having too much data being processed at once.

written 12 months ago by Joe16k