Masurca Error With Illumina Assembly
10.9 years ago
Raygozak ★ 1.4k

Hi, I'm new to MaSuRCA and got this error while trying to run my first assembly. The output from MaSuRCA is below, followed by my config file. Thanks a lot.

processing PE library reads Wed May 29 16:53:40 EDT 2013

Average PE read length 251

choosing kmer size of 175 for the graph

running Jellyfish Wed May 29 16:54:06 EDT 2013

MIN_Q_CHAR: 33

Error correction Poisson cutoff = 5

error correct PE Wed May 29 17:24:40 EDT 2013

terminate called after throwing an instance of 'jellyfish::file_parser::FileParserError'

what(): Empty input file 'pe.cor.fa'

./assemble.sh: line 58: 610 Aborted jellyfish count -p 126 -m 31 -t 12 -C -s $JF_SIZE -o k_u pe.cor.fa

ln: creating symbolic link k_u_hash_0' to k_u_0': File exists

terminate called after throwing an instance of 'mapped_file::ErrorMMap'

what(): Can't open file k_u_hash_0:

Estimated genome size:

Invalid uint64_t '-l' for [-n, --nb-mers=uint64]: Negative value

computing super reads from PE Wed May 29 17:29:02 EDT 2013

Super reads failed, check super1.err and files in ./work1/

config.txt:

PATHS

JELLYFISH_PATH=/gpfs/home/jzr186/work/tools/bin/

SR_PATH=/gpfs/home/jzr186/work/tools/bin/

CA_PATH=/gpfs/home/jzr186/work/tools/CA/Linux-amd64/bin

END

DATA

PE= pe 300 20 /gpfs/home/jzr186/scratch/CAMP/CAMP18/JH_R1_001.fastq /gpfs/home/jzr186/scratch/CAMP/CAMP18/JH_R2_001.fastq

END

PARAMETERS

GRAPH_KMER_SIZE=auto

USE_LINKING_MATES=1

JF_SIZE=1800000000

DO_HOMOPOLYMER_TRIM=0

NUM_THREADS=12

END

denovo illumina • 7.9k views
10.9 years ago
rtliu ★ 2.2k

I would suggest starting your first MaSuRCA run with the test data from the MaSuRCA FTP site:

ftp://ftp.genome.umd.edu/pub/MaSuRCA/test_data/rhodobacter/

Start with the PE data only, then add the SJ and Sanger data.

Then double-check your input data, e.g. with FastqPairedEndValidator.pl.
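Before running a full validator, a rough check of the two FASTQ files can catch the most obvious problems. The sketch below is only an illustration and does not reproduce what FastqPairedEndValidator.pl does: `check_pair` is a hypothetical helper name, and it assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines).

```shell
#!/usr/bin/env bash
# Rough paired-end sanity check (hypothetical helper, not a full validator).
# Assumes uncompressed FASTQ with exactly 4 lines per record.
check_pair() {
  local r1="$1" r2="$2" n1 n2
  # Both files must exist and be non-empty.
  if [ ! -s "$r1" ] || [ ! -s "$r2" ]; then
    echo "missing or empty input"
    return 1
  fi
  # Arithmetic expansion strips any padding that wc adds on some systems.
  n1=$(( $(wc -l < "$r1") ))
  n2=$(( $(wc -l < "$r2") ))
  # Each file must contain a whole number of 4-line records.
  if [ $((n1 % 4)) -ne 0 ] || [ $((n2 % 4)) -ne 0 ]; then
    echo "line count is not a multiple of 4"
    return 1
  fi
  # Forward and reverse files must hold the same number of reads.
  if [ "$n1" -ne "$n2" ]; then
    echo "read counts differ: $((n1 / 4)) vs $((n2 / 4))"
    return 1
  fi
  echo "ok: $((n1 / 4)) read pairs"
}
```

For the files in the question this would be called as `check_pair JH_R1_001.fastq JH_R2_001.fastq`. It does not compare read names, so it will not notice pairs that are out of order.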

Good luck!

Update 27-07-2013

The MaSuRCA developers have finally released the config file for the rhodobacter test data:

ftp://ftp.genome.umd.edu/pub/MaSuRCA/test_data/rhodobacter/sr_config_Illumina_Sanger_1x.txt


I have a follow-up question on this: how do I add libraries progressively? You mentioned that one can start with PE data only and then add the SJ data. How exactly do I do this? Do you mean multiple rounds of assembly, or am I missing something obvious?

10.1 years ago
sutturka ▴ 190

Hi,

I contacted the developers about this, and they said that read names do not matter during pre-processing of the data. They suggested that I run a test on my FASTQ file:

> file -b -i jumps.A.fastq

This gave a result like:

text/x-python; charset=us-ascii

I emailed the result to the developers, and they explained that the operating system thinks the FASTQ file is Python code. This is not correct; the type should be text/plain.

The simple way to fix this:

Look at the expand_fastq script in the MaSuRCA bin folder and replace the line:

    (text/plain*)
with
    (text/*)

Everything should work afterward.
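To see why the one-line change helps, here is a minimal sketch of how a shell `case` pattern treats the string reported by `file -b -i`. It is not the actual expand_fastq source: `strict_check` and `relaxed_check` are hypothetical stand-ins for the original and the patched pattern.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the type check; not the real expand_fastq code.

# Original-style pattern: accepts only types that start with text/plain.
strict_check() {
  case "$1" in
    (text/plain*) echo "accepted" ;;
    (*)           echo "rejected" ;;
  esac
}

# Patched pattern: accepts any text/* type.
relaxed_check() {
  case "$1" in
    (text/*) echo "accepted" ;;
    (*)      echo "rejected" ;;
  esac
}

# The string that file -b -i reported for the misdetected FASTQ file.
mime='text/x-python; charset=us-ascii'
echo "strict:  $(strict_check "$mime")"    # prints "strict:  rejected"
echo "relaxed: $(relaxed_check "$mime")"   # prints "relaxed: accepted"
```

The relaxed pattern still rejects non-text types such as compressed or binary input, so it only widens the check to text files that `file` happens to misclassify.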

After this change, I was able to run the assembler correctly, with JF_SIZE set to a very high value (JF_SIZE=1800000000).

Thanks, Sagar

10.8 years ago
jc.szamosi ▴ 50

I've been having the same problem. The test data doesn't help. My data is PE only, the paired ends have been checked, and the error happens for some genomes but not others, with no apparent pattern of read length, GC content, or anything else I can think of.

10.6 years ago

Me too, and it is so annoying! I thought it was a memory (RAM) problem, but then I tried to rerun with some libraries that had worked well in the past and got the same error.

Anyone with a solution?

Luis
