Question: Problem with Bio.SeqIO
haasroni wrote (8 weeks ago):

Hi! I'm having an unexplained problem with code I wrote using Biopython's SeqIO. I apply several filtering steps to a FASTQ file with this code:

from Bio import SeqIO

# filter_by_quality, filter_by_single_nucleotide_appearance and
# filter_by_long_stretches_repeats are defined earlier in the script (not shown).

def extract_from_fastq(fq, output_fq):
    """
    Takes a FASTQ file, examines each read using all the above functions, and
    writes the non-ambiguous reads to a new file.
    :param fq: the input FASTQ file
    :param output_fq: the output FASTQ file after filtering
    :return: the number of records written
    """
    input_iterator = SeqIO.parse(fq, "fastq")
    # Lazily keep only the records that pass every filter
    short_iterator = (
        rec for rec in input_iterator
        if filter_by_quality(rec.letter_annotations["phred_quality"])
        and filter_by_single_nucleotide_appearance(rec.seq)
        and filter_by_long_stretches_repeats(rec.seq)
    )
    # A single write call consumes the generator and writes every kept record
    return SeqIO.write(short_iterator, output_fq, "fastq")
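For anyone trying to reproduce this, here is a minimal self-contained sketch that runs the same generator-plus-single-write pattern end to end on three made-up reads, assuming Biopython is installed. The three filter functions are hypothetical placeholders (the post refers to "the above functions" but doesn't show them), and their thresholds are arbitrary. It re-reads the output file afterwards, so you can compare what SeqIO.write reports against what is actually on disk.

```python
import io
import os
import tempfile

from Bio import SeqIO


# Hypothetical placeholder filters: the real ones are not shown in the post,
# and the thresholds below are arbitrary.
def filter_by_quality(qualities, min_mean=20):
    # Keep reads whose mean Phred score is at least min_mean.
    return bool(qualities) and sum(qualities) / len(qualities) >= min_mean


def filter_by_single_nucleotide_appearance(seq, max_fraction=0.8):
    # Reject reads in which one base makes up more than max_fraction of the read.
    s = str(seq).upper()
    return bool(s) and max(s.count(b) for b in "ACGT") / len(s) <= max_fraction


def filter_by_long_stretches_repeats(seq, max_run=6):
    # Reject reads containing a homopolymer run longer than max_run.
    s = str(seq).upper()
    return not any(b * (max_run + 1) in s for b in "ACGT")


def extract_from_fastq(fq, output_fq):
    records = SeqIO.parse(fq, "fastq")
    kept = (
        rec for rec in records
        if filter_by_quality(rec.letter_annotations["phred_quality"])
        and filter_by_single_nucleotide_appearance(rec.seq)
        and filter_by_long_stretches_repeats(rec.seq)
    )
    # One write call consumes the whole generator and returns the record count.
    return SeqIO.write(kept, output_fq, "fastq")


# Toy input: read1 and read3 should pass; read2 (all A) should be filtered out.
FASTQ = (
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nAAAAAAAAAA\n+\nIIIIIIIIII\n"
    "@read3\nGATTACAGAT\n+\nIIIIIIIIII\n"
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "filtered.fastq")
    n_written = extract_from_fastq(io.StringIO(FASTQ), out_path)
    # Re-read the output to confirm every kept record actually reached the file.
    ids = [rec.id for rec in SeqIO.parse(out_path, "fastq")]

print(n_written, ids)  # expected: 2 ['read1', 'read3']
```

If, on your real data, the returned count is larger than the number of records you can read back from the file, something else is writing to the same path after this call.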

The problem is that the output file sometimes contains only the last record (the last four lines of the input FASTQ), so I assume it is being overwritten on each iteration. Other times it works and I get all the records in one file!

Any idea why this happens and how to avoid it?

Thank you!


Looks okay to me. Is output_fq an output filename or an output file handle? Are you sure SeqIO.write(...) is called only once, on a single file (output_fq), in your code? It should be.

Written 8 weeks ago by a.zielezinski

Yes, output_fq is an output filename, and I call SeqIO.write(...) only once. Thank you for your help!

Written 7 weeks ago by haasroni
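a.zielezinski's question about how often SeqIO.write is called is probing for one well-known cause of exactly this symptom: when SeqIO.write is given a filename, it opens that file in "w" mode, which truncates it, so each call replaces whatever the previous call wrote. The poster reports a single call, so treat the sketch below as something to rule out (for example, extract_from_fastq itself being called more than once with the same output name), not as a confirmed diagnosis. It assumes Biopython is installed and uses made-up reads; it shows the pitfall and two ways around it.

```python
import io
import os
import tempfile

from Bio import SeqIO

# Three made-up reads.
FASTQ = (
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nGATTACAGAT\n+\nIIIIIIIIII\n"
    "@read3\nTTGCATGCAA\n+\nIIIIIIIIII\n"
)


def count_records(path):
    # Number of FASTQ records currently stored in the file at path.
    return sum(1 for _ in SeqIO.parse(path, "fastq"))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.fastq")

    # Pitfall: one SeqIO.write call per record, each given the same filename.
    # Every call reopens (and truncates) the file, so only read3 survives.
    for rec in SeqIO.parse(io.StringIO(FASTQ), "fastq"):
        SeqIO.write(rec, path, "fastq")
    after_loop = count_records(path)

    # Fix 1: hand the whole iterator to a single SeqIO.write call.
    SeqIO.write(SeqIO.parse(io.StringIO(FASTQ), "fastq"), path, "fastq")
    single_call = count_records(path)

    # Fix 2: if records must be written piecemeal, open the handle once and
    # pass the handle (not the filename) to every SeqIO.write call.
    with open(path, "w") as handle:
        for rec in SeqIO.parse(io.StringIO(FASTQ), "fastq"):
            SeqIO.write(rec, handle, "fastq")
    shared_handle = count_records(path)

print(after_loop, single_call, shared_handle)  # expected: 1 3 3
```

The same truncation happens if the whole filtering function runs once per input file with a fixed output name: each run overwrites the previous output, and the file ends up holding whatever the last run produced.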