how to write two fastq files into one
3
0
Entering edit mode
21 months ago

Hi everyone,

I was trying to read the left and right sequence reading from two fastq files and zip them into one. All I need it the ID name and sequence, not the + quality score. in the end, I should have ID name and sequence from left read, then ID name and sequence from right read, then ID name and sequence from left read..... I have trouble with the SeqRecord generation.

here is my code:

  1 #!/usr/bin/env python3
  2 # readFastq.py
  3 # Import Seq, SeqRecord, and SeqIO
  4 from Bio import SeqIO
  5 from Bio.SeqRecord import SeqRecord
  6 
  7 from Bio.Seq import Seq
  8 # Import itertools to take a slice of list
  9 import itertools
 10 # The first parameter to SeqIO.parse is the file location
 11 # The second parameter is the file type
 12 leftReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq", "fast    q")
 13 rightReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq","fast    q")
 14 seqIDleft=list()
 15 sequenceleft=list()
 16 seqIDright =list()
 17 sequenceright=list()
 18 # Just get five sequences as an esxample
 19 firstFiveleft = itertools.islice(leftReads, 5)
 20 firstFiveright = itertools.islice(rightReads, 5)
 21 
 22 
 23 for left in firstFiveleft:
 24   seqIDleft.appendleft.id)
 25   sequenceleft.append(str(left.seq))
 26 
 27 
 28 for right in firstFiveright:
 29   seqIDright.appendright.id)
 30   sequenceright.append(str(right.seq))
 31 
 32 A=zip(seqIDleft,seqIDright)
 33 B=zip(sequenceleft,sequenceright)
 34 
 35 C=SeqRecord(A,B)
 36 
 37 
 38 SeqIO.write(C, "Interleaved.fastq", "fastq")
python • 989 views
ADD COMMENT
0
Entering edit mode

An alternate solution that does not use python.

Use BBMap suite. Remove .gz extensions if you want to leave files uncompressed.

reformat.sh in1=file_R1.fq.gz in2=file2_fq.gz out=interleaved.fq.gz
ADD REPLY
0
Entering edit mode
21 months ago
Joe 19k

Untested code, but the general approach should work:

from Bio import SeqIO
from itertools import chain

for i in chain(*zip(list(SeqIO.parse("/path/to/reads.R1.fastq", "fastq")), list(SeqIO.parse("/path/to/reads.R2.fastq", "fastq")))):
    SeqIO.write(i, "interleaved.fastq", "fastq")

If you want the entries without the quality scores, it might be enough to just switch to using SeqIO.write(..., ..., "fasta") but I haven't tested this.

Otherwise, it would be as simple as adding a line before the write call of the form x = SeqRecord(i.description, i.seq) , then write the x object).

ADD COMMENT
0
Entering edit mode
21 months ago
ATpoint 55k

Using a one-liner with seqtk to produce an interleaved fasta file:

seqtk mergepe test1.fq test2.fq | seqtk seq -A - > out.fa
ADD COMMENT
0
Entering edit mode
21 months ago
paste <(cat file1.fq | paste - - - - | cut -f1,2 ) <(cat file2.fq | paste - - - -| cut -f1,2 ) | tr "\t" "\n"
ADD COMMENT

Login before adding your answer.

Traffic: 1993 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6