Question: extracting DNA sequences from multiple fastq.gz files
12 months ago, abdul.karim (0) wrote:

I have raw DNA sequences in multiple files, as follows.

 xxxxxx1.R1.fastq.gz
 xxxxxx2.R1.fastq.gz
 xxxxxx3.R1.fastq.gz
 xxxxxx4.R1.fastq.gz
 xxxxxx5.R1.fastq.gz

I can extract the raw DNA reads from a single file and store them in another file with the following command.

gunzip -c in.fastq.gz | awk '(NR%4==2)' > out.seq

Is there a way to extract the DNA reads from all the files and save them in a single text file, instead of processing the files one by one?

Also, would it be better to do this in Python or R instead of basic Linux commands? I suspect the Python approach would not be very efficient.

modified 12 months ago by h.mon (31k) • written 12 months ago by abdul.karim (0)

Umm, why? Why would you want to do this?

written 12 months ago by swbarnes2 (8.6k)
12 months ago, h.mon (31k, Brazil) wrote:

Indeed, as swbarnes2 asked, why would you want to do this? In fact, I asked myself the same question when you posted "Extatrcting only sequnces from fastq.gz". Most likely you don't need to do this, as nearly all modern bioinformatics tools accept (compressed) fastq files as input.

As for your question:

zcat *.fastq.gz | awk '(NR%4==2)' > all.fastq
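Note that the file written by the command above contains only the sequence lines, so despite its name it is no longer a valid fastq file.

As for the Python part of the question, here is a minimal sketch of the same extraction using only the standard library. The function name `extract_sequences` and the output name `all.seq` are made up for illustration, and the code assumes every input file is a well-formed fastq with exactly four lines per record. It will probably not be faster than the `zcat | awk` pipeline, but it may be more convenient if the reads need further processing in Python anyway.

```python
import glob
import gzip


def extract_sequences(pattern, out_path):
    """Write the sequence line of every fastq record to out_path.

    pattern  -- glob pattern for the gzipped fastq files (hypothetical name)
    out_path -- plain-text output file, one read per line
    Assumes four lines per record: header, sequence, '+', quality.
    Returns the number of reads written.
    """
    count = 0
    with open(out_path, "w") as out:
        # sorted() gives a stable, alphabetical file order
        for path in sorted(glob.glob(pattern)):
            # "rt" decompresses on the fly and yields text lines
            with gzip.open(path, "rt") as handle:
                for line_number, line in enumerate(handle):
                    # 0-based index 1 of each 4-line record is the sequence,
                    # which matches awk's NR%4==2 (NR is 1-based)
                    if line_number % 4 == 1:
                        out.write(line)
                        count += 1
    return count
```

Calling `extract_sequences("*.fastq.gz", "all.seq")` in the directory holding the files should give the same lines as the one-liner above, provided the four-lines-per-record assumption holds.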