Question: how to write two fastq files into one
0
gravatar for susangao007
7 months ago by
susangao0070 wrote:

Hi everyone,

I was trying to read the left and right sequence reading from two fastq files and zip them into one. All I need it the ID name and sequence, not the + quality score. in the end, I should have ID name and sequence from left read, then ID name and sequence from right read, then ID name and sequence from left read..... I have trouble with the SeqRecord generation.

here is my code:

  1 #!/usr/bin/env python3
  2 # readFastq.py
  3 # Import Seq, SeqRecord, and SeqIO
  4 from Bio import SeqIO
  5 from Bio.SeqRecord import SeqRecord
  6 
  7 from Bio.Seq import Seq
  8 # Import itertools to take a slice of list
  9 import itertools
 10 # The first parameter to SeqIO.parse is the file location
 11 # The second parameter is the file type
 12 leftReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq", "fast    q")
 13 rightReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq","fast    q")
 14 seqIDleft=list()
 15 sequenceleft=list()
 16 seqIDright =list()
 17 sequenceright=list()
 18 # Just get five sequences as an esxample
 19 firstFiveleft = itertools.islice(leftReads, 5)
 20 firstFiveright = itertools.islice(rightReads, 5)
 21 
 22 
 23 for left in firstFiveleft:
 24   seqIDleft.appendleft.id)
 25   sequenceleft.append(str(left.seq))
 26 
 27 
 28 for right in firstFiveright:
 29   seqIDright.appendright.id)
 30   sequenceright.append(str(right.seq))
 31 
 32 A=zip(seqIDleft,seqIDright)
 33 B=zip(sequenceleft,sequenceright)
 34 
 35 C=SeqRecord(A,B)
 36 
 37 
 38 SeqIO.write(C, "Interleaved.fastq", "fastq")
python • 303 views
ADD COMMENTlink modified 7 months ago by Pierre Lindenbaum130k • written 7 months ago by susangao0070

An alternate solution that does not use python.

Use BBMap suite. Remove .gz extensions if you want to leave files uncompressed.

reformat.sh in1=file_R1.fq.gz in2=file2_fq.gz out=interleaved.fq.gz
ADD REPLYlink written 7 months ago by genomax90k
0
gravatar for Joe
7 months ago by
Joe18k
United Kingdom
Joe18k wrote:

Untested code, but the general approach should work:

from Bio import SeqIO
from itertools import chain

for i in chain(*zip(list(SeqIO.parse("/path/to/reads.R1.fastq", "fastq")), list(SeqIO.parse("/path/to/reads.R2.fastq", "fastq")))):
    SeqIO.write(i, "interleaved.fastq", "fastq")

If you want the entries without the quality scores, it might be enough to just switch to using SeqIO.write(..., ..., "fasta") but I haven't tested this.

Otherwise, it would be as simple as adding a line before the write call of the form x = SeqRecord(i.description, i.seq) , then write the x object).

ADD COMMENTlink written 7 months ago by Joe18k
0
gravatar for ATpoint
7 months ago by
ATpoint39k
Germany
ATpoint39k wrote:

Using a one-liner with seqtk to produce an interleaved fasta file:

seqtk mergepe test1.fq test2.fq | seqtk seq -A - > out.fa
ADD COMMENTlink modified 7 months ago • written 7 months ago by ATpoint39k
0
gravatar for Pierre Lindenbaum
7 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum130k wrote:
paste <(cat file1.fq | paste - - - - | cut -f1,2 ) <(cat file2.fq | paste - - - -| cut -f1,2 ) | tr "\t" "\n"
ADD COMMENTlink written 7 months ago by Pierre Lindenbaum130k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2224 users visited in the last hour