How to output FASTQ file with four lines per sequence?
2.2 years ago
Apex92 ▴ 280

I am working with a FASTQ file from which I extract specific parts of each sequence (I remove adapters). I read the file with Biopython's SeqIO, but I do not know how to print the third line (the "+" separator) and the fourth line (the quality string) as they appear in the original file. Any suggestions?

Here is my code:

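from Bio import SeqIO

adapter = 'AACTGTAGGCACCATCAAT'  # 19 nt adapter
# lst is a collection of read IDs of interest, defined earlier in the script
with open("test.fastq", "r") as Fastq:
    for record in SeqIO.parse(Fastq, 'fastq'):
        if record.id in lst:
            adapter_pos = record.seq.find(adapter)
            if adapter_pos == -1:
                continue  # adapter not found; find() returns -1, so skip this read
            RNAseq = record.seq[:adapter_pos]
            adapter_seq = record.seq[adapter_pos:adapter_pos + len(adapter)]
            # the 12 nt UMI directly follows the adapter
            umi_seq = record.seq[adapter_pos + len(adapter):adapter_pos + len(adapter) + 12]
            print(record.id)
            print(RNAseq + adapter_seq)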
biopython python fastq • 1.0k views

You also need to trim or extract the qualities in line with the sequence, so that both keep the same length. Look at the properties of each record yielded by the iterator to find how to extract the quality and the third line of a FASTQ record.


Thank you for your response. That is exactly my problem: I do not know which properties of the records give me the quality and the third line of a FASTQ record.

Do you have any suggestions?


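With SeqIO.parse, the 4th line is available through the record's letter_annotations property. Note that these are decoded scores (a list of integers under the "phred_quality" key), not the raw quality string. Refer to the "property letter_annotations" section of the "https://biopython.org/docs/1.75/api/Bio.SeqRecord.html" page.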
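As a minimal sketch of the SeqRecord route: slicing a SeqRecord also slices its per-letter annotations, so the trimmed record can be written back out with format("fastq"). The in-memory read and the helper name trim_after_adapter are made up for illustration, and the filter on read IDs from the question is left out.

```python
from io import StringIO

from Bio import SeqIO

ADAPTER = "AACTGTAGGCACCATCAAT"  # adapter from the question

# Hypothetical one-read FASTQ, used only for illustration:
# 8 nt insert + 19 nt adapter + 12 nt UMI, all with quality "I" (Phred 40).
example_fastq = "@read1\nACGTACGT" + ADAPTER + "TTTTGGGGCCCC\n+\n" + "I" * 39 + "\n"


def trim_after_adapter(record, adapter=ADAPTER):
    """Return the record cut just after the adapter, or None if it is absent."""
    pos = record.seq.find(adapter)
    if pos == -1:
        return None
    # Slicing a SeqRecord slices the sequence and the qualities together.
    return record[:pos + len(adapter)]


record = next(SeqIO.parse(StringIO(example_fastq), "fastq"))
trimmed = trim_after_adapter(record)

# Decoded scores for the kept bases (a list of integers).
print(trimmed.letter_annotations["phred_quality"][:5])

# Four-line FASTQ text for the trimmed record.
print(trimmed.format("fastq"), end="")
```

Because format("fastq") re-encodes the decoded scores, the quality line should match the original only when the input uses the standard Sanger (Phred+33) encoding.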

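If you want the scores exactly as they are in the FASTQ records, you can use Bio.SeqIO.QualityIO.FastqGeneralIterator instead. It yields each record as a tuple of plain strings (title = read ID, sequence = sequence, quality = quality), and you can print "+" yourself for the third line. Slice the quality string with the same indices as the sequence so the two stay the same length.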
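A rough sketch of the FastqGeneralIterator route, keeping the raw quality characters untouched. The helper name trim_fastq and the in-memory example read are assumptions for illustration; the filter on read IDs from the question is omitted, and reads without the adapter are simply skipped here.

```python
from io import StringIO

from Bio.SeqIO.QualityIO import FastqGeneralIterator

ADAPTER = "AACTGTAGGCACCATCAAT"  # adapter from the question


def trim_fastq(handle, out, adapter=ADAPTER):
    """Write each read cut just after the adapter as a four-line FASTQ record."""
    for title, seq, qual in FastqGeneralIterator(handle):
        pos = seq.find(adapter)
        if pos == -1:
            continue  # adapter not found; skip this read
        end = pos + len(adapter)
        # Sequence and quality are plain strings, so slice both identically.
        out.write(f"@{title}\n{seq[:end]}\n+\n{qual[:end]}\n")


# Hypothetical one-read FASTQ, used only for illustration.
example = StringIO(
    "@read1\nACGTACGT" + ADAPTER + "TTTTGGGGCCCC\n+\n"
    + "ABCDEFGH" + "I" * 19 + "#" * 12 + "\n"
)
result = StringIO()
trim_fastq(example, result)
print(result.getvalue(), end="")
```

For a real file, the same function could be called with handles from open("test.fastq") and an output file opened for writing in place of the StringIO objects.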
