Question: Problem with Bio.SeqIO
8 weeks ago, haasroni wrote:

Hi! I am having an unexplained problem with code I wrote using Biopython's SeqIO. I apply several filtering steps to a FASTQ file using this code:

```python
from Bio import SeqIO

# filter_by_quality, filter_by_single_nucleotide_appearance and
# filter_by_long_stretches_repeats are defined earlier in the script (not shown)

def extract_from_fastq(fq, output_fq):
    """Take a FASTQ file, examine each read using all the above functions,
    and write the non-ambiguous reads to a new file.

    :param fq: the FASTQ file
    :param output_fq: the output FASTQ file after filtering
    """
    input_iterator = SeqIO.parse(fq, "fastq")
    # go over each record and keep it only if the read meets all the requirements
    short_iterator = (rec for rec in input_iterator
                      if filter_by_quality(rec.letter_annotations["phred_quality"])
                      and filter_by_single_nucleotide_appearance(rec.seq)
                      and filter_by_long_stretches_repeats(rec.seq))
    # write the passing records to a new file, again in FASTQ format
    SeqIO.write(short_iterator, output_fq, "fastq")
```
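For reference, here is a self-contained sketch of the same pipeline that can be run end to end. The three filter functions are hypothetical stand-ins (the real ones are not shown in the question), and the four-read input is made up. It also uses the fact that `SeqIO.write` returns the number of records written, which is a quick way to check whether the output really contains every passing read.

```python
import os
import tempfile
from io import StringIO

from Bio import SeqIO

# Hypothetical stand-ins for the poster's filters (the originals are not shown)
def filter_by_quality(quals, min_mean=20):
    # keep reads whose mean Phred quality is at least min_mean
    return sum(quals) / len(quals) >= min_mean

def filter_by_single_nucleotide_appearance(seq, max_fraction=0.8):
    # reject reads dominated by a single base
    s = str(seq)
    return max(s.count(b) for b in "ACGT") / len(s) <= max_fraction

def filter_by_long_stretches_repeats(seq, max_run=5):
    # reject reads containing a homopolymer run longer than max_run
    s = str(seq)
    return not any(b * (max_run + 1) in s for b in "ACGT")

def extract_from_fastq(fq, output_fq):
    records = SeqIO.parse(fq, "fastq")
    passing = (rec for rec in records
               if filter_by_quality(rec.letter_annotations["phred_quality"])
               and filter_by_single_nucleotide_appearance(rec.seq)
               and filter_by_long_stretches_repeats(rec.seq))
    # a single write call consumes the whole generator and returns the record count
    return SeqIO.write(passing, output_fq, "fastq")

# Made-up input: 'I' encodes Phred 40, '#' encodes Phred 2 (Sanger/Phred+33)
FASTQ = (
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"  # passes all filters
    "@r2\nAAAAAAAAAA\n+\nIIIIIIIIII\n"  # fails: single base, long run
    "@r3\nACGTTGCAAC\n+\nIIIIIIIIII\n"  # passes all filters
    "@r4\nACGTACGTAC\n+\n##########\n"  # fails: low quality
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "filtered.fastq")
    n_written = extract_from_fastq(StringIO(FASTQ), out_path)
    kept = [rec.id for rec in SeqIO.parse(out_path, "fastq")]

print(n_written, kept)  # 2 ['r1', 'r3']
```

With one `SeqIO.write` call per output file, both passing reads end up in the file, so comparing the returned count with a re-parse of the output is a cheap sanity check on the real data.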

The problem is that the output file sometimes contains only the last record (the last 4 lines of the input FASTQ), so I assume it is being overwritten on each iteration. However, sometimes it works and I get all the records in one file!

Any idea why this happens and how to avoid it?

Thank you!!


Looks okay to me. Is output_fq an output filename or an output file handle? Are you sure SeqIO.write(...) is called only once, on a single file (output_fq), in your code (it should be)?

(comment by a.zielezinski, 8 weeks ago)
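To illustrate why the filename-vs-handle question matters: when `SeqIO.write` is given a filename, it opens that file for writing, which truncates any previous content. So if the write ends up being called more than once with the same filename (for example inside a loop, or because the whole function is called repeatedly with the same output_fq), only the records from the last call survive. A minimal sketch contrasting that with a single open handle; the toy reads and the `out.fastq` name are made up for illustration.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

def make_records(n):
    # hypothetical toy reads, each with a constant Phred quality of 30
    recs = []
    for i in range(n):
        rec = SeqRecord(Seq("ACGTACGT"), id=f"read{i}", description="")
        rec.letter_annotations["phred_quality"] = [30] * 8
        recs.append(rec)
    return recs

records = make_records(3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.fastq")

    # anti-pattern: passing a filename reopens (and truncates) the file on every call
    for rec in records:
        SeqIO.write([rec], path, "fastq")
    overwritten = [r.id for r in SeqIO.parse(path, "fastq")]

    # one handle kept open across all writes: every record is kept
    with open(path, "w") as handle:
        for rec in records:
            SeqIO.write([rec], handle, "fastq")
    appended = [r.id for r in SeqIO.parse(path, "fastq")]

print(overwritten)  # ['read2']
print(appended)     # ['read0', 'read1', 'read2']
```

The first pattern reproduces the "only the last record" symptom exactly; whether it is the cause here depends on how the rest of the script calls extract_from_fastq.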

Yes, output_fq is an output filename, and I call SeqIO.write(...) only once. Thank you for your help!

(reply by haasroni, 7 weeks ago)