Question

Extracting DNA sequences from multiple fastq.gz files

0

4.6 years ago

abdul.karim • 0

I have raw DNA sequences in multiple files, as follows.

 xxxxxx1.R1.fastq.gz
 xxxxxx2.R1.fastq.gz
 xxxxxx3.R1.fastq.gz
 xxxxxx4.R1.fastq.gz
 xxxxxx5.R1.fastq.gz

I can extract the raw DNA reads from a single file and store them in another file with the following command.

gunzip -c in.fastq.gz | awk '(NR%4==2)' > out.seq

Is there a way to extract the DNA reads from all the files and save them in a single text file, instead of doing it one file at a time?

Also, would it be better to do this in Python or R instead of basic Linux commands? I suspect a Python approach would be less efficient.

sequencing sequence gene raw_read_DNA multi_file • 2.0k views

updated 4.6 years ago by h.mon 35k • written 4.6 years ago by abdul.karim • 0

1

Umm, why? Why would you want to do this?

4.6 years ago by swbarnes2 14k

score 1 · Answer 1 · 2019-09-10

Indeed, as swbarnes2 asked, why would you want to do this? In fact, I asked myself the same question when you posted "Extatrcting only sequnces from fastq.gz". Most likely you don't need to do this, as nearly all modern bioinformatics tools accept (compressed) fastq files as input.

As for your question:

zcat *.fastq.gz | awk '(NR%4==2)' > all.seq
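To see why one pipeline is enough for many files, here is a small self-contained sketch that builds two toy inputs and runs the same idea on them. The file names, the read contents, and the output name `all.seq` are made up for illustration (the result holds bare sequences, not FASTQ records, hence the extension). It uses `gzip -cd` in place of `zcat`, since on some systems (macOS, for example) `zcat` may not handle `.gz` files, and it narrows the glob to `*.R1.fastq.gz` so only R1 files are picked up. It also assumes every file uses the plain four-line FASTQ layout, so the line count carries over correctly from one file to the next.

```shell
#!/bin/sh
# Hypothetical demo data: two tiny single-read FASTQ files, gzipped.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '@read1\nACGT\n+\nIIII\n' | gzip > sample1.R1.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip > sample2.R1.fastq.gz

# Decompress every R1 file (in glob order) into one stream, then keep
# line 2 of each 4-line record, which is the sequence line.
gzip -cd ./*.R1.fastq.gz | awk 'NR % 4 == 2' > all.seq

cat all.seq
```

With the toy files above, `all.seq` should contain two lines, `ACGT` and `TTGA`. Multi-line (wrapped) FASTQ records would break the `NR % 4` trick, so this only holds for the usual four-lines-per-read files.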