Question

ValueError("End of file without quality information.") when using SeqIO


4.9 years ago

Arko ▴ 30

Hi! I'm trying to read in a FASTQ file and carry out some operations on it (demultiplexing the file into several FASTQ files based on exact barcode matches). Midway through the FASTQ file, I get the following error:

ValueError: End of file without quality information.

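from Bio.SeqIO.QualityIO import FastqGeneralIterator

# input_file, unmap (dict), barcode (collection of barcodes), f (open output file)
# and count are defined earlier in the script (not shown in the post)
try:
    for name, seq, qual in FastqGeneralIterator(open(input_file)):
        key = name + '\n' + seq + '\n' + qual
        if key not in unmap.keys():
            unmap[key] = False
        # e.g. "...:12170:5855#AGAATAAAAGAGTGAT/1" -> "AGTGAT"
        header = name.split(":")[4]
        end_bc = header.split("#")[1]
        seq_barcode = end_bc.split("/")[0][10:]
        if seq_barcode in barcode:
            count = count + 1
            f.write("@{}\n{}\n+\n{}\n".format(name, seq, qual))
except ValueError:
    # The original except clause was not shown; re-raise so the error stays visible
    raise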

I have no idea why it's failing. I don't think there's anything wrong with the FASTQ file, but if there were, I wouldn't know how to check, since the file is massive and the error doesn't say where it fails.

This is the exact traceback:

    for name, seq, qual in FastqGeneralIterator(open(input_file)):
  File "/home/software/python/python-3.6.4/lib/python3.6/site-packages/Bio/SeqIO/QualityIO.py", line 914, in FastqGeneralIterator
    raise ValueError("End of file without quality information.")
ValueError: End of file without quality information.
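Since the traceback doesn't say where parsing stopped, a quick structural scan can locate the first broken record. The message suggests the parser reached end of file while still partway through a record. Below is a minimal sketch, assuming every record occupies exactly four lines (header, sequence, "+" line, quality). `check_fastq` is a hypothetical helper, not part of Biopython. It is stricter than Biopython's parser, which also accepts sequence and quality wrapped over several lines, so on wrapped files it would report false positives.

```python
import itertools


def check_fastq(path):
    """Return (line_number, message) for the first malformed record, or None.

    Assumes strict 4-line records: @header, sequence, +[header], quality.
    """
    with open(path) as handle:
        line_no = 0
        while True:
            # Read the next four lines as one record
            chunk = list(itertools.islice(handle, 4))
            if not chunk:
                return None  # clean end of file
            start = line_no + 1  # line number of this record's header
            line_no += len(chunk)
            if len(chunk) < 4:
                return start, f"file ends mid-record after {len(chunk)} line(s)"
            header, seq, plus, qual = (line.rstrip("\n") for line in chunk)
            if not header.startswith("@"):
                return start, "header does not start with '@'"
            if not plus.startswith("+"):
                return start + 2, "separator line does not start with '+'"
            if len(seq) != len(qual):
                return start, f"sequence and quality lengths differ ({len(seq)} vs {len(qual)})"


# Usage (path is a placeholder):
# print(check_fastq("input.fastq"))
```

It reads the file once in fixed-size chunks, so memory use does not grow with file size, and the returned line number can be passed to something like `sed -n` to inspect the neighbourhood.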

biopython fastq FastqGeneralIterator SeqIO python

updated 4.9 years ago by Ram 43k • written 4.9 years ago by Arko ▴ 30


What is the output of:

tail <fastq_file>

(where <fastq_file> is the file being read by the Python script)
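Without a shell at hand, roughly the same check can be done in Python. Below is a minimal sketch; `tail_and_count` is a hypothetical helper name. It keeps only the last n lines in a bounded `collections.deque`, so memory use stays flat even on a massive file. It also counts lines, because a file whose records are all exactly four lines long should have a line count divisible by 4.

```python
from collections import deque


def tail_and_count(path, n=10):
    """Return (last n lines, total line count) without loading the whole file."""
    total = 0
    last = deque(maxlen=n)  # older lines are discarded automatically
    with open(path) as handle:
        for line in handle:
            total += 1
            last.append(line.rstrip("\n"))
    return list(last), total


# Usage (path is a placeholder):
# lines, total = tail_and_count("input.fastq")
# print("\n".join(lines))
# if total % 4:
#     print(f"{total} lines is not a multiple of 4; the file may be truncated")
```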

4.9 years ago by Ram 43k


6AAAAEEEEEEEEEEEEEEEE////A
@NS500496_727_H373LBGXB:2:23203:12170:5855#AGAATAAAAGAGTGAT/1
GAGGAAGTTCCAGCCAAGGAGATTGA
+NS500496_727_H373LBGXB:2:23203:12170:5855#AGAATAAAAGAGTGAT/1
AAAAAEEAEE/AEEAAEEAEE//E/<
@NS500496_727_H373LBGXB:2:23203:1526:5855#ACATCATCCCGAGTGG/1
CAGAAACACCAGGATCCCATGATTGA
+NS500496_727_H373LBGXB:2:23203:1526:5855#ACATCATCCCGAGTGG/1
AAAAAEEEEEEEEEEEEAEEE///EE

updated 4.9 years ago by Ram 43k • written 4.9 years ago by Arko ▴ 30


Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

4.9 years ago by Ram 43k


if key not in unmap.keys():

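You'll instead want if key not in unmap:, which is simpler and never slower. Under Python 2, unmap.keys() builds a full list, so the membership test is O(n) rather than O(1). Under Python 3 (which the traceback shows is in use), keys() returns a view whose membership test is also O(1), so there the gain is mainly readability plus a small constant overhead saved.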
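A quick way to see this on your own interpreter: both spellings give the same answer, and `timeit` shows the per-lookup cost. The dictionary below is made-up stand-in data for `unmap`, and the timings will vary by machine. Under Python 3 the two numbers are expected to be of the same order, since neither form scans the keys.

```python
import timeit

# Made-up stand-in for the `unmap` dict from the question
unmap = {f"read{i}": False for i in range(100_000)}
key = "read99999"

# Both membership tests return the same result
assert (key in unmap) == (key in unmap.keys())

# Time each form; absolute numbers depend on the machine
direct = timeit.timeit(lambda: key in unmap, number=200_000)
via_keys = timeit.timeit(lambda: key in unmap.keys(), number=200_000)
print(f"key in unmap:        {direct:.4f} s")
print(f"key in unmap.keys(): {via_keys:.4f} s")
```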

4.9 years ago by Devon Ryan 104k


Thanks for the tip!

4.9 years ago by Arko ▴ 30

score 2 · Answer 1 · 2019-05-29


4.9 years ago

Joe 21k

Do you really need to get your hands dirty in the 'bowels' of Biopython like that?

Why not just use SeqIO.parse() directly as an iterator?

Since you're iterating over your file, you can just add a print statement for each sequence name/header and see how far the iteration got before it choked on the quality line. Then Ctrl-F/grep for that header in the file and check out the neighbourhood for anything funky.
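A variant of the print-statement approach that avoids printing millions of headers: remember only the last record that parsed cleanly and report it when the parser raises. Below is a minimal sketch. `find_failure_point` is a hypothetical helper, and the Biopython usage in the comments assumes the reads are parsed with `SeqIO.parse(..., "fastq")`.

```python
def find_failure_point(records, get_name=lambda rec: rec):
    """Iterate `records` and report how far it got before a ValueError.

    Returns (records_read, name_of_last_good_record, error_or_None).
    """
    count = 0
    last_name = None
    try:
        for rec in records:
            count += 1
            last_name = get_name(rec)
    except ValueError as err:
        # The parser raised while reading the record after `last_name`
        return count, last_name, err
    return count, last_name, None


# Example with Biopython (path is a placeholder):
# from Bio import SeqIO
# n, last, err = find_failure_point(SeqIO.parse("input.fastq", "fastq"), lambda r: r.id)
# if err is not None:
#     print(f"Failed after {n} records; last good record was {last}: {err}")
```

The reported name is the last record that parsed cleanly, so grepping for it (e.g. `grep -n -A 8`) shows the broken record that follows it.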

4.9 years ago by Joe 21k


Good idea, I shall have a look at it!

4.9 years ago by Arko ▴ 30


It should be sufficient to do something like:

from Bio import SeqIO

for record in SeqIO.parse('/path/to/input.fastq', 'fastq'):
    # Do stuff with the record object, e.g.:
    # .description is safer for long headers, but as there's no whitespace here, .id and .description are actually the same
    barcode = record.description[record.description.index('#')+1:-2]  # drops the trailing "/1"
    qualities = record.letter_annotations['phred_quality']  # list of integer Phred scores
    # etc.
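Since the end goal is splitting reads into per-barcode files, here is a minimal sketch of that step. It works on the (name, seq, qual) tuples that FastqGeneralIterator yields; SeqIO records would need small changes. It assumes, as the question's code does, that the sample barcode is the index after '#' from position 10 onward (`[10:]`, the last six bases of the 16-base indices in the headers shown). `demultiplex` and the `<barcode>.fastq` output naming are hypothetical choices.

```python
import os
from contextlib import ExitStack


def demultiplex(records, barcodes, out_dir):
    """Write each (name, seq, qual) record to <out_dir>/<barcode>.fastq.

    Assumes headers look like '...:Y#INDEX/1' and, as in the question's
    code, that the sample barcode is INDEX[10:]. Returns read counts per barcode.
    """
    counts = dict.fromkeys(barcodes, 0)
    with ExitStack() as stack:
        # One output handle per barcode, all closed when the block exits
        handles = {
            bc: stack.enter_context(open(os.path.join(out_dir, f"{bc}.fastq"), "w"))
            for bc in barcodes
        }
        for name, seq, qual in records:
            index = name.split("#")[1].split("/")[0]
            bc = index[10:]
            if bc in handles:  # exact match only
                handles[bc].write(f"@{name}\n{seq}\n+\n{qual}\n")
                counts[bc] += 1
    return counts


# Usage (file and barcodes are placeholders):
# from Bio.SeqIO.QualityIO import FastqGeneralIterator
# with open("input.fastq") as handle:
#     print(demultiplex(FastqGeneralIterator(handle), {"AGTGAT", "GAGTGG"}, "demux_out"))
```

Keeping one open handle per barcode avoids reopening files for every read; a header without '#' would raise an IndexError here, which is arguably preferable to silently misassigning it.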

4.9 years ago by Joe 21k