Question: ValueError("End of file without quality information.") when using SeqIO
Arko (Boston University, US) wrote 17 months ago:

Hi! I'm trying to read in a FASTQ file and demultiplex it into several FASTQ files based on exact-matching barcodes. Midway through the file, I get the following error:

ValueError: End of file without quality information.

 

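from Bio.SeqIO.QualityIO import FastqGeneralIterator

# input_file, unmap, barcode, count and f are defined earlier in the script (not shown)
try:
    for name, seq, qual in FastqGeneralIterator(open(input_file)):
        key = name + '\n' + seq + '\n' + qual
        if key not in unmap.keys():
            unmap[key] = False
        # The barcode follows '#' in the last colon-separated field of the header
        header = name.split(":")[4]
        end_bc = header.split("#")[1]
        seq_barcode = end_bc.split("/")[0][10:]
        if seq_barcode in barcode:
            count = count + 1
            f.write("@{}\n{}\n+\n{}\n".format(name, seq, qual))
# (the except clause belonging to this try was not part of the post)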

I have no idea why it's failing. I don't think there's anything wrong with the FASTQ file, but if there were, I wouldn't know how to check: the file is massive and the error doesn't say where it fails.

This is the exact traceback:

    for name, seq, qual in FastqGeneralIterator(open(input_file)):
  File "/home/software/python/python-3.6.4/lib/python3.6/site-packages/Bio/SeqIO/QualityIO.py", line 914, in FastqGeneralIterator
    raise ValueError("End of file without quality information.")
ValueError: End of file without quality information.
Written 17 months ago by Arko; modified 17 months ago by RamRS

What is the output of:

tail <fastq_file>

(where <fastq_file> is the file being read by the Python script)
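If the end of the file turns out to look intact, the damage may be somewhere in the middle. Below is a rough sketch of a structural check that reports the line number of the first suspicious record. It assumes unwrapped, four-line FASTQ records (one sequence line and one quality line per read); the function name `first_bad_record` and the tiny test strings are made up for illustration and are not part of Biopython.

```python
import io


def first_bad_record(handle):
    """Return (line_number, reason) for the first malformed record, or None.

    Assumes every record occupies exactly four lines (no line wrapping).
    """
    line_no = 0
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return None  # reached the end of the file on a record boundary
        start = line_no + 1  # 1-based line number of this record's header
        line_no += 4
        if "" in lines[1:]:
            return start, "file ends before the record is complete"
        header, seq, plus, qual = (line.rstrip("\r\n") for line in lines)
        if not header.startswith("@"):
            return start, "header line does not start with '@'"
        if not plus.startswith("+"):
            return start + 2, "separator line does not start with '+'"
        if len(seq) != len(qual):
            return start + 3, "sequence and quality lengths differ"


# Hypothetical miniature inputs: one intact file, one missing a quality line.
good = "@r1\nGAGG\n+\nAAAA\n@r2\nCAGA\n+\nEEEE\n"
broken = "@r1\nGAGG\n+\n@r2\nCAGA\n+\nEEEE\n"

good_result = first_bad_record(io.StringIO(good))
broken_result = first_bad_record(io.StringIO(broken))
print(good_result)
print(broken_result)
```

On a real file you would pass an open file handle instead of `io.StringIO`, then inspect the reported line with something like `sed -n` or `less +N`.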

Reply written 17 months ago by RamRS
6AAAAEEEEEEEEEEEEEEEE////A
@NS500496_727_H373LBGXB:2:23203:12170:5855#AGAATAAAAGAGTGAT/1
GAGGAAGTTCCAGCCAAGGAGATTGA
+NS500496_727_H373LBGXB:2:23203:12170:5855#AGAATAAAAGAGTGAT/1
AAAAAEEAEE/AEEAAEEAEE//E/<
@NS500496_727_H373LBGXB:2:23203:1526:5855#ACATCATCCCGAGTGG/1
CAGAAACACCAGGATCCCATGATTGA
+NS500496_727_H373LBGXB:2:23203:1526:5855#ACATCATCCCGAGTGG/1
AAAAAEEEEEEEEEEEEAEEE///EE
Reply written 17 months ago by Arko; modified 17 months ago by RamRS

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

Reply written 17 months ago by RamRS
if key not in unmap.keys():

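You'll instead want `if key not in unmap:`. In Python 2, `unmap.keys()` builds a list, so the membership test is O(n) rather than O(1). In Python 3 (which your traceback shows you are using), `keys()` returns a view and the test is O(1) either way, but the shorter form is still the idiomatic one.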
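A small illustration of the membership test, using a made-up `unmap` dictionary with placeholder keys. `dict.setdefault` is shown as an alternative that does the check and the insert in one call; it is a suggestion, not something from the original script.

```python
# Hypothetical contents; in the real script the keys are name/seq/qual strings.
unmap = {"read1": False, "read2": True}

key = "read3"
if key not in unmap:  # tests the dict's keys directly, no .keys() needed
    unmap[key] = False

# Equivalent one-liner: only inserts the default if the key is absent.
unmap.setdefault("read4", False)
unmap.setdefault("read2", False)  # "read2" already exists, so it stays True

print(unmap)
```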

Reply written 17 months ago by Devon Ryan

Thanks for the tip!

Reply written 17 months ago by Arko
Joe (United Kingdom) wrote 17 months ago:

Do you really need to get your hands dirty in the 'bowels' of BioPython like that?

Why not just use SeqIO.parse() directly as an iterator?

Since you're iterating over your file, you can just throw in some print statements to print the sequence name/header and find out how far the iteration got before it choked on the quality line. Then Ctrl-F/grep for that header in the file and check out the neighbourhood for anything funky going on.
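Rather than printing every header of a massive file, you could remember only the last record that parsed cleanly and report it when the parser raises. The sketch below does that for any iterator of (name, seq, qual) tuples. The names `find_last_good_record` and `demo_records` are invented here, and the stand-in generator only imitates the error from the question so that the example runs without Biopython installed.

```python
def find_last_good_record(records):
    """Consume (name, seq, qual) tuples; return (count, last_name, error).

    `error` is None if the iterator finished without raising ValueError.
    """
    count = 0
    last_name = None
    try:
        for name, seq, qual in records:
            count += 1
            last_name = name
    except ValueError as err:
        return count, last_name, str(err)
    return count, last_name, None


def demo_records():
    """Stand-in for a FASTQ parser that fails after two good records."""
    yield ("read1", "ACGT", "EEEE")
    yield ("read2", "TTGA", "AAAA")
    raise ValueError("End of file without quality information.")


result = find_last_good_record(demo_records())
print(result)

# With Biopython, the real parser could be passed in instead (not run here):
#   from Bio.SeqIO.QualityIO import FastqGeneralIterator
#   with open(input_file) as handle:
#       print(find_last_good_record(FastqGeneralIterator(handle)))
```

The header returned as `last_name` is the one to grep for; the broken record should be the one immediately after it.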

Written 17 months ago by Joe; modified 17 months ago

Good idea, I shall have a look at it!

Reply written 17 months ago by Arko

It should be sufficient to do something like:

from Bio import SeqIO

for record in SeqIO.parse('/path/to/input.fastq', 'fastq'):
    # Do stuff with the record object, e.g.:
    # .description is safer for long headers, but as there's no whitespace here,
    # .id and .description are actually the same.
    barcode = record.description[record.description.index('#')+1:-2]
    # ... etc.
    qualities = record.letter_annotations['phred_quality']  # list of Phred scores
    # ... etc.
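To see what that slice does, here it is applied to one of the headers from the `tail` output above, using plain string operations so no Biopython is needed. The helper names are made up for this example. Note that the slice returns the full 16-base barcode, whereas the code in the question keeps only the part after the first 10 bases.

```python
# Header taken from the tail output earlier in the thread (without the '@').
header = "NS500496_727_H373LBGXB:2:23203:12170:5855#AGAATAAAAGAGTGAT/1"


def barcode_by_slice(description):
    """Everything after '#', minus the trailing two characters ('/1')."""
    return description[description.index("#") + 1:-2]


def barcode_by_split(description):
    """Same result via split; does not depend on the suffix length."""
    return description.split("#", 1)[1].split("/", 1)[0]


full_barcode = barcode_by_slice(header)
# What the question's code extracts: the barcode minus its first 10 bases.
seq_barcode = header.split(":")[4].split("#")[1].split("/")[0][10:]

print(full_barcode)
print(barcode_by_split(header))
print(seq_barcode)
```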

Reply written 17 months ago by Joe; modified 17 months ago