Question: Masurca Error With Illumina Assembly
Raygozak (State College, PA, Penn State) wrote, 5.7 years ago:

Hi, I'm new to MaSuRCA and got this error while trying to run my first assembly. Below are the output from MaSuRCA and my config file. Thanks a lot.

processing PE library reads Wed May 29 16:53:40 EDT 2013

Average PE read length 251

choosing kmer size of 175 for the graph

running Jellyfish Wed May 29 16:54:06 EDT 2013

MIN_Q_CHAR: 33

Error correction Poisson cutoff = 5

error correct PE Wed May 29 17:24:40 EDT 2013

terminate called after throwing an instance of 'jellyfish::file_parser::FileParserError'

what(): Empty input file 'pe.cor.fa'

./assemble.sh: line 58: 610 Aborted jellyfish count -p 126 -m 31 -t 12 -C -s $JF_SIZE -o k_u pe.cor.fa

ln: creating symbolic link `k_u_hash_0' to `k_u_0': File exists

terminate called after throwing an instance of 'mapped_file::ErrorMMap'

what(): Can't open file k_u_hash_0:

Estimated genome size:

Invalid uint64_t '-l' for [-n, --nb-mers=uint64]: Negative value

computing super reads from PE Wed May 29 17:29:02 EDT 2013

Super reads failed, check super1.err and files in ./work1/

config.txt:

PATHS

JELLYFISH_PATH=/gpfs/home/jzr186/work/tools/bin/

SR_PATH=/gpfs/home/jzr186/work/tools/bin/

CA_PATH=/gpfs/home/jzr186/work/tools/CA/Linux-amd64/bin

END

DATA

PE= pe 300 20 /gpfs/home/jzr186/scratch/CAMP/CAMP18/JH_R1_001.fastq /gpfs/home/jzr186/scratch/CAMP/CAMP18/JH_R2_001.fastq

END

PARAMETERS

GRAPH_KMER_SIZE=auto

USE_LINKING_MATES=1

JF_SIZE=1800000000

DO_HOMOPOLYMER_TRIM=0

NUM_THREADS=12

END

Tags: illumina, denovo
modified 4.9 years ago by sutturka • written 5.7 years ago by Raygozak
rtliu (New Zealand) wrote, 5.7 years ago:

I would suggest starting your first MaSuRCA run with the test data from the MaSuRCA FTP site:

ftp://ftp.genome.umd.edu/pub/MaSuRCA/test_data/rhodobacter/

Start with the PE data only, then add the SJ and Sanger data.

Then double-check your own input data, e.g. with FastqPairedEndValidator.pl.
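If the validator script is not at hand, a rough first check can be sketched in plain shell. This is only a minimal sketch and not what FastqPairedEndValidator.pl does: the function names `fastq_records` and `check_pair` are hypothetical, and it assumes uncompressed FASTQ with four lines per record and a trailing newline. It catches empty files, truncated records, and mate files of different length, but it does not compare read names.

```shell
#!/bin/sh
# Hypothetical sanity check; this is NOT FastqPairedEndValidator.pl.
# Assumes uncompressed 4-line FASTQ records ending with a newline.

# Print the number of records in a FASTQ file, or fail if the file is
# missing, empty, or its line count is not a multiple of 4.
fastq_records() {
    [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
    lines=$(wc -l < "$1" | tr -d ' ')
    [ $((lines % 4)) -eq 0 ] || { echo "truncated record in: $1" >&2; return 1; }
    echo $((lines / 4))
}

# Succeed only if both mate files hold the same number of records.
check_pair() {
    r1=$(fastq_records "$1") || return 1
    r2=$(fastq_records "$2") || return 1
    [ "$r1" -eq "$r2" ] || { echo "mate count differs: $r1 vs $r2" >&2; return 1; }
    echo "OK: $r1 read pairs"
}
```

After sourcing these functions, it could be called as `check_pair JH_R1_001.fastq JH_R2_001.fastq`; a non-zero exit status means the pair needs a closer look before it goes into the config file.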

Good luck!

Update (27-07-2013): a config file for the rhodobacter test data has finally been released:

ftp://ftp.genome.umd.edu/pub/MaSuRCA/test_data/rhodobacter/sr_config_Illumina_Sanger_1x.txt

modified 5.6 years ago • written 5.7 years ago by rtliu

I have a follow-up question: how do I add libraries progressively? You mentioned starting with PE data only and then adding the SJ data. How exactly do I do this? Do you mean multiple rounds of assembly, or am I missing something obvious?

reply written 4.6 years ago by arnstrm
jc.szamosi (Canada) wrote, 5.6 years ago:

I've been having the same problem. The test data doesn't help. My data is PE only, the paired ends have been checked, and the error happens for some genomes but not others, with no apparent pattern of read length, GC content, or anything else I can think of.

written 5.6 years ago by jc.szamosi
luisnevescunha wrote, 5.4 years ago:

Me too, and it is so annoying! I thought it was a memory (RAM) problem, but then I tried re-running with some libraries that had worked well in the past and got the same error.

Anyone with a solution?

Luis

modified 5.4 years ago • written 5.4 years ago by luisnevescunha
sutturka (USA) wrote, 4.9 years ago:

Hi,

I contacted the developers about this, and they said that read names do not matter during pre-processing of the data. They suggested that I run a test on my fastq file:

> file -b -i jumps.A.fastq

This gave me a result like:

text/x-python; charset=us-ascii

I emailed the result to the developers, and they replied that the operating system thinks the fastq file is Python code. This is not correct: the type should be text/plain.

The simple fix: in the expand_fastq script under the MaSuRCA bin folder, replace the line:

    (text/plain*)
with
    (text/*)

Everything should work afterward.
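To illustrate why widening the pattern helps, here is a minimal sketch of shell `case` matching against the MIME string that `file -b -i` reports. The function names `matches_strict` and `matches_relaxed` are hypothetical and this is not the actual content of expand_fastq; it only assumes that the script branches on a glob pattern like the one quoted above.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from MaSuRCA's expand_fastq script.
# Each takes a MIME string such as the output of: file -b -i reads.fastq

# Strict check: accepts only types that start with "text/plain".
matches_strict() {
    case "$1" in
        (text/plain*) return 0 ;;
        (*)           return 1 ;;
    esac
}

# Relaxed check: accepts any "text/..." type.
matches_relaxed() {
    case "$1" in
        (text/*) return 0 ;;
        (*)      return 1 ;;
    esac
}

# The MIME string reported for the misdetected fastq file.
mime='text/x-python; charset=us-ascii'
if matches_strict "$mime"; then echo "strict: accepted"; else echo "strict: rejected"; fi
if matches_relaxed "$mime"; then echo "relaxed: accepted"; else echo "relaxed: rejected"; fi
```

With the misdetected type, the strict pattern rejects the file and the relaxed one accepts it, which is consistent with the input being skipped before the change and read after it.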

After this change, I was able to run the assembler correctly, with JF_SIZE set to a very high value (JF_SIZE=1800000000).

Thanks, Sagar

written 4.9 years ago by sutturka