Python script for filtering reads by quality
4.5 years ago
JohnJACK • 0

I have been assigned some coursework: write a Python script that filters Illumina sequencing reads in FASTQ format, removing any read pair in which either read has a quality score below 30.

The script should output two FASTQ files: one containing the read pairs in which both reads score 30 or above, and the other containing the read pairs in which either read scores below 30.

How should I go about approaching this problem? Thanks. (Sorry for the bad English.)

fastq python • 1.7k views

Have you looked at the Biopython cookbook?


It's not completely clear where your problem is. Is it that you don't know how to do it in Python? I'd say (before one of the others does it), give us some of your ideas and someone might guide you from there. So, if it is just that the Python part is the problem, then outline the steps that you want/need to do. If it is not that, then please clarify what you're struggling with.


How should I go about approaching this problem?

Isn't the point of assigning the coursework to make you think about how you should approach this problem?

At a very high level you could do the following (one way, and perhaps not the most efficient):

  1. Open the R1/R2 files.
  2. Read one FASTQ record from each (I will assume you are not allowed to use Biopython, since this may be for basic Python programming).
  3. Walk through the quality score string of each read (after splitting it into characters), convert each Q score character to a numeric value, keep a cumulative total, and check whether the average Q score is < 30.
  4. If it is, discard the read pair and move on to the next. If not, write the pair to the output.
  5. Tidy up once you exhaust the data.
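
For illustration only, here is a minimal sketch of the steps above in plain Python. It rests on several assumptions that are not stated in the thread: qualities are Phred+33 encoded (Sanger / Illumina 1.8+), every record is exactly four lines, R1 and R2 list their reads in the same order, and "quality score" means the mean Q over the read. The names `read_fastq`, `mean_quality` and `split_pairs` are made up for this example. Instead of discarding failing pairs, it writes them to a second handle, since the question asks for two output files; each output is interleaved (R1 record followed by its R2 mate).

```python
import io

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:  # empty string means end of file
            return
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield tuple(line.rstrip("\n") for line in (header, seq, plus, qual))


def mean_quality(qual, offset=PHRED_OFFSET):
    """Convert each quality character to its numeric Q score and return the average."""
    if not qual:
        return 0.0
    return sum(ord(char) - offset for char in qual) / len(qual)


def split_pairs(r1, r2, passed, failed, threshold=30):
    """Write pairs where both mean Q scores are >= threshold to `passed`,
    all other pairs to `failed`. Returns (number passed, number failed)."""
    n_pass = n_fail = 0
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        ok = (mean_quality(rec1[3]) >= threshold
              and mean_quality(rec2[3]) >= threshold)
        out = passed if ok else failed
        for rec in (rec1, rec2):
            out.write("\n".join(rec) + "\n")
        if ok:
            n_pass += 1
        else:
            n_fail += 1
    return n_pass, n_fail


if __name__ == "__main__":
    # Tiny in-memory demo; 'I' decodes to Q40 and '#' to Q2 under Phred+33.
    r1 = io.StringIO("@pair1/1\nACGT\n+\nIIII\n@pair2/1\nACGT\n+\nII##\n")
    r2 = io.StringIO("@pair1/2\nTTGA\n+\nIIII\n@pair2/2\nTTGA\n+\nIIII\n")
    passed, failed = io.StringIO(), io.StringIO()
    print(split_pairs(r1, r2, passed, failed))
```

With real data you would pass file objects opened in `with open(...)` blocks instead of `io.StringIO`, which also takes care of the tidying up in step 5. If the assignment means "any base below Q30" rather than the average, `mean_quality` could be swapped for a minimum over the same decoded values.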