Question: how to write two fastq files into one
0
gravatar for susangao007
6 weeks ago by
susangao0070 wrote:

Hi everyone,

I was trying to read the left and right sequence reading from two fastq files and zip them into one. All I need it the ID name and sequence, not the + quality score. in the end, I should have ID name and sequence from left read, then ID name and sequence from right read, then ID name and sequence from left read..... I have trouble with the SeqRecord generation.

here is my code:

  1 #!/usr/bin/env python3
  2 # readFastq.py
  3 # Import Seq, SeqRecord, and SeqIO
  4 from Bio import SeqIO
  5 from Bio.SeqRecord import SeqRecord
  6 
  7 from Bio.Seq import Seq
  8 # Import itertools to take a slice of list
  9 import itertools
 10 # The first parameter to SeqIO.parse is the file location
 11 # The second parameter is the file type
 12 leftReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq", "fast    q")
 13 rightReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq","fast    q")
 14 seqIDleft=list()
 15 sequenceleft=list()
 16 seqIDright =list()
 17 sequenceright=list()
 18 # Just get five sequences as an esxample
 19 firstFiveleft = itertools.islice(leftReads, 5)
 20 firstFiveright = itertools.islice(rightReads, 5)
 21 
 22 
 23 for left in firstFiveleft:
 24   seqIDleft.appendleft.id)
 25   sequenceleft.append(str(left.seq))
 26 
 27 
 28 for right in firstFiveright:
 29   seqIDright.appendright.id)
 30   sequenceright.append(str(right.seq))
 31 
 32 A=zip(seqIDleft,seqIDright)
 33 B=zip(sequenceleft,sequenceright)
 34 
 35 C=SeqRecord(A,B)
 36 
 37 
 38 SeqIO.write(C, "Interleaved.fastq", "fastq")
python • 127 views
ADD COMMENTlink modified 6 weeks ago by Pierre Lindenbaum127k • written 6 weeks ago by susangao0070

An alternate solution that does not use python.

Use BBMap suite. Remove .gz extensions if you want to leave files uncompressed.

reformat.sh in1=file_R1.fq.gz in2=file2_fq.gz out=interleaved.fq.gz
ADD REPLYlink written 6 weeks ago by genomax80k
0
gravatar for Joe
6 weeks ago by
Joe16k
United Kingdom
Joe16k wrote:

Untested code, but the general approach should work:

from Bio import SeqIO
from itertools import chain

for i in chain(*zip(list(SeqIO.parse("/path/to/reads.R1.fastq", "fastq")), list(SeqIO.parse("/path/to/reads.R2.fastq", "fastq")))):
    SeqIO.write(i, "interleaved.fastq", "fastq")

If you want the entries without the quality scores, it might be enough to just switch to using SeqIO.write(..., ..., "fasta") but I haven't tested this.

Otherwise, it would be as simple as adding a line before the write call of the form x = SeqRecord(i.description, i.seq) , then write the x object).

ADD COMMENTlink written 6 weeks ago by Joe16k
0
gravatar for ATpoint
6 weeks ago by
ATpoint31k
Germany
ATpoint31k wrote:

Using a one-liner with seqtk to produce an interleaved fasta file:

seqtk mergepe test1.fq test2.fq | seqtk seq -A - > out.fa
ADD COMMENTlink modified 6 weeks ago • written 6 weeks ago by ATpoint31k
0
gravatar for Pierre Lindenbaum
6 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum127k wrote:
paste <(cat file1.fq | paste - - - - | cut -f1,2 ) <(cat file2.fq | paste - - - -| cut -f1,2 ) | tr "\t" "\n"
ADD COMMENTlink written 6 weeks ago by Pierre Lindenbaum127k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2260 users visited in the last hour