Python 3: Only Look at Quality Line
19 months ago

I have an assignment that I am stumped over:

Write a Python 3 program that reads a FASTQ file and calculates how many bases have a Phred base quality of zero, between 1 and 10 (inclusive), 11 and 20, 21 and 30, 31 and 40, and above 40. Read the FASTQ file on the server and calculate the output.

The quality line is in ASCII format, so I have to convert it or account for that in my code. Also, I am not allowed to use Bio.SeqIO.

I promise I've tried so many different ways, but I am an absolute noob to Python and programming in general. It just doesn't make sense to me. Going to be honest here: I'm taking this class because it is required for my degree, not because I'm super interested or have a knack for it. But I am trying...

My understanding is that I have to pull out the fourth line of each record in the FASTQ file and count the number of bases within each range.

I started out with:

with open("myfile") as file:
    quality = lines [4::4]


But of course that is wrong. I've tried the .startswith() method, but that only works if the line starts with a specific string, which would be tedious for a quality line since its first character is not consistent. I would appreciate any help.

And before the admin takes it down, yes I know there is this post: How To Pull Out Qual From Multi Line Fastq?
But the code does not work for me.

19 months ago

A file object is its own iterator, so you can pull lines off it one at a time with next(), like this:

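with open("myfile") as file:
    id = next(file)                 # line 1: header, starts with "@"
    seq = next(file)                # line 2: the bases
    tmp = next(file)                # line 3: the "+" separator
    qual = next(file).rstrip("\n")  # line 4: quality string, newline removed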


The variables id, seq, tmp and qual now hold the four lines of the first record, and you can do what you need with each. Repeating the four next() calls reads the following records in the same way.
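To go from one record to the whole file, the same idea can be wrapped in a small generator. This is only a sketch under one assumption: every record occupies exactly four lines (no wrapped sequence or quality lines). The name quality_lines is made up for illustration. It selects lines by position rather than by their first character, which matters because a quality line can itself start with "@".

```python
from itertools import islice


def quality_lines(lines):
    """Yield the quality string (4th line) of each 4-line FASTQ record.

    `lines` can be an open file handle or any other iterable of lines.
    Assumption: records are exactly four lines long (no line wrapping).
    """
    # islice(..., 3, None, 4) picks the lines at 0-based index 3, 7, 11, ...
    for line in islice(lines, 3, None, 4):
        yield line.rstrip("\n")  # drop the trailing newline, keep the scores
```

With a real file this would be used as `with open("myfile") as file:` followed by `for qual in quality_lines(file):` inside the with block.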

You can now iterate over each character of the quality string like so:

for q in qual:
    print(q)


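and you obtain the Phred score of each character (assuming the usual Phred+33 encoding) with:

c = ord(q) - 33  # 33 is the ASCII offset of the Phred+33 encoding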
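Putting the conversion together with the ranges from the question, the counting step might look like the sketch below. The function names, the bin labels and the PHRED_OFFSET constant are all made up for illustration, and the offset of 33 is an assumption that holds for Sanger / Illumina 1.8+ files but not for older Phred+64 data.

```python
from collections import Counter

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def bin_label(score):
    """Map a Phred score to the range it falls in."""
    if score < 0:
        raise ValueError("negative Phred score; is the offset correct?")
    if score == 0:
        return "0"
    if score <= 10:
        return "1-10"
    if score <= 20:
        return "11-20"
    if score <= 30:
        return "21-30"
    if score <= 40:
        return "31-40"
    return ">40"


def count_quality_bins(quality_strings, offset=PHRED_OFFSET):
    """Count bases per quality range over an iterable of quality strings."""
    counts = Counter()
    for qual in quality_strings:
        for ch in qual:
            counts[bin_label(ord(ch) - offset)] += 1
    return counts
```

Here quality_strings is any iterable of quality lines with the trailing newline already removed. A Counter is used so that ranges with no bases simply report 0 when looked up.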


If you cannot build your code from this information, and you are not interested in the process, then, in my opinion, you should consider dropping the course. It is going to get more complicated and difficult for you in the future.