Question: extracting DNA sequences from multiple fastq.gz files
11 days ago, abdul.karim wrote:

I have raw DNA sequences in multiple files, named as follows.

 xxxxxx1.R1.fastq.gz
 xxxxxx2.R1.fastq.gz
 xxxxxx3.R1.fastq.gz
 xxxxxx4.R1.fastq.gz
 xxxxxx5.R1.fastq.gz

I can extract the raw DNA reads from a single file and store them in another file with the following command.

gunzip -c in.fastq.gz | awk '(NR%4==2)' > out.seq

Is there any way to extract the DNA reads from all the files and save them in a single text file, instead of doing it one file at a time?

Also, would it be better to do this in Python or R instead of basic Linux commands? I suspect the Python way would not be very efficient.


Umm, why? Why would you want to do this?

Comment written 11 days ago by swbarnes2
11 days ago, h.mon (Brazil) wrote:

Indeed, as swbarnes2 asked, why would you want to do this? In fact, I asked myself the same question when you posted "Extatrcting only sequnces from fastq.gz". Most likely you don't need to do this, as nearly all modern bioinformatics tools take (compressed) fastq files as input.

As for your question:

zcat *.fastq.gz | awk '(NR%4==2)' > all.fastq
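
If you want to convince yourself that the one-liner behaves as expected across several files, here is a small sketch you can run anywhere. The file names (demo1, demo2) and the reads are made up for illustration, and the output is written to a hypothetical all.seq since it holds bare sequences rather than FASTQ records. It uses gunzip -c, as in the question, because zcat on some systems does not read .gz files directly. It assumes standard four-line FASTQ records (no wrapped sequence lines); otherwise NR%4==2 would pick the wrong lines.

```shell
# Build two tiny gzipped FASTQ files in a scratch directory.
# Names and reads are invented for this demo only.
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' | gzip > "$tmpdir/demo1.R1.fastq.gz"
printf '@r3\nTTAA\n+\nIIII\n' | gzip > "$tmpdir/demo2.R1.fastq.gz"

# gunzip -c decompresses every matching file to stdout, one after another.
# Each file contributes a multiple of 4 lines, so awk's running line
# counter (NR) stays aligned and NR % 4 == 2 is always a sequence line.
gunzip -c "$tmpdir"/*.R1.fastq.gz | awk 'NR % 4 == 2' > "$tmpdir/all.seq"

# Show the combined sequences, one per line.
cat "$tmpdir/all.seq"
```

On real data you would point the glob at your own *.R1.fastq.gz files and drop the two printf lines.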