Question: ABySS error: sequence and quality must be the same length near ...
jozs2019 wrote (4 months ago):

Hello!

I'm trying to use ABySS 1.9.0 to assemble a set of genomic paired-end reads. I'm doing this on my university's HPC with parallel processing. In my PBS script, I use:

abyss-pe name=abyss_test1 k=63 in='SRR960028_1.fastq SRR960028_2.fastq' v=-v

Unfortunately, ABySS terminates quite quickly after commencing assembly, and the error message I get is:

SRR960028_1.fastq:745672: error: sequence and quality must be the same length near
TGGGGACGGCAAGTATCACAGGTGACCCACTCACTGTTTCACCTCTCACCCTAATATGACCGTGTCTACAAGAAGTCAGTCAGCTGTTTCTGTTCCCCAGTGAGAGAGCAA$
CCCFFFFFHHHHHGHIIIIIIIDHHIIIIIIIIIIIIIGIEIIIIIIII
make: *** [abyss_test1-1.fa] Error 1

When I open the other file, I get:

/usr/local/openmpi/1.8.4-gcc/bin/mpirun -np 4 ABYSS-P -k63 -q3 -v --coverage-hist=coverage.hist -s $
ABySS 1.9.0
ABYSS-P -k63 -q3 -v --coverage-hist=coverage.hist -s abyss_test1-bubbles.fa -o abyss_test1-1.fa SRR96$
Running on 4 processors
0: Running on host hpc088
1: Running on host hpc088
2: Running on host hpc088
3: Running on host hpc088
0: Reading 'SRR960028_1.fastq'...
1: Reading 'SRR960028_2.fastq'...
1: Read 100000 reads. 1: Hash load: 3609195 / 268435456 = 0.0134 using 447 MB
0: Read 100000 reads. 0: Hash load: 3898410 / 268435456 = 0.0145 using 469 MB
1: Read 200000 reads. 1: Hash load: 7004611 / 268435456 = 0.0261 using 646 MB
Primary job terminated normally, but 1 process returned a non-zero exit code.. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[12781,1],0]
Exit code: 1

Does anyone know what this error means and how I can fix it?

Tags: abyss, error, assembly, genome
benv (Canada) wrote (4 months ago):

FASTQ records consist of 4 lines each. The first line is the header and contains the read ID. The second line contains the sequence. The third line is a "+" (optionally followed by the read ID again). The fourth line is the quality score string.

Each character in the quality score string encodes the quality score for the corresponding base in the sequence string. Thus the quality score line must be exactly the same length as the sequence line. For one of your FASTQ records the two lines have different lengths, and ABySS is telling you the approximate line in SRR960028_1.fastq where the problem record is located (line 745672).
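The check described above can be sketched in a few lines of Python. This is only an illustration, not what ABySS does internally: the function name find_bad_records is made up for this example, and it assumes the simple 4-lines-per-record layout (no sequences wrapped over several lines). It reports the line number of the record's header, so a problem ABySS reports at line 745672 would show up here a few lines earlier.

```python
def find_bad_records(lines):
    """Yield (header_line_number, problem) for each malformed FASTQ record.

    Assumes every record occupies exactly 4 lines (hypothetical helper,
    not part of ABySS or any FASTQ library).
    """
    record = []
    start = 1  # 1-based line number of the current record's header
    for number, line in enumerate(lines, start=1):
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@"):
                yield start, "header does not start with '@'"
            elif not plus.startswith("+"):
                yield start, "third line does not start with '+'"
            elif len(seq) != len(qual):
                yield start, (
                    f"sequence length {len(seq)} != quality length {len(qual)}"
                )
            record = []
            start = number + 1
    if record:
        # The file ended in the middle of a record (e.g. a truncated download).
        yield start, f"incomplete record ({len(record)} of 4 lines)"


# Possible usage on a real file:
# with open("SRR960028_1.fastq") as fh:
#     for line_no, problem in find_bad_records(fh):
#         print(f"record starting at line {line_no}: {problem}")
```

Because the function reads line by line, it should cope with large FASTQ files without loading them into memory.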

I would recommend first looking at the record in question manually (less is a good tool for that). Then you will need to either figure out which upstream processing step caused the line lengths to differ, or write a script (e.g. in sed, awk, Perl, or Python) to fix the FASTQ file so that the sequence and quality lines always have the same length.
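As one possible shape for such a fix-up script, here is a minimal Python sketch that keeps only well-formed records and silently skips the rest. The function name complete_records and the output file name in the usage comment are invented for this example, and it again assumes 4 lines per record. Note that if the file is one half of a paired-end set, dropping a read here without dropping its mate in the other file would leave the two files out of sync.

```python
def complete_records(lines):
    """Yield (header, seq, plus, qual) for each well-formed 4-line record.

    Records with a bad header, a bad '+' line, or mismatched sequence and
    quality lengths are skipped, as is a trailing partial record.
    (Hypothetical helper for illustration only.)
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if (
                header.startswith("@")
                and plus.startswith("+")
                and len(seq) == len(qual)
            ):
                yield tuple(record)
            record = []
    # Any leftover lines form an incomplete record and are not yielded.


# Possible usage (the output file name is just an example):
# with open("SRR960028_1.fastq") as src, \
#         open("SRR960028_1.clean.fastq", "w") as dst:
#     for record in complete_records(src):
#         dst.write("\n".join(record) + "\n")
```

Filtering like this only hides the symptom; working out why the record was damaged in the first place is still worthwhile.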


I had a look at the file and, as you said, line 745672 was truncated. It was also the final line of a FASTQ file that should have had more lines in it. I suspect a download problem rather than anything else: my FASTQ files have been truncated during downloading before. (I use fastq-dump -I --split-files <filename> to download, which should be fine.) It goes to show I should be checking the heads and tails of my files each time I download! I'll also double-check the mate file of this pair to see if it has the same problem.

Thanks for your answer - it was much appreciated!

Reply written 4 months ago by jozs2019.