Question: Extracting DNA sequences from multiple fastq.gz files
abdul.karim wrote, 5 months ago:

I have raw DNA reads in multiple files, named as follows:

 xxxxxx1.R1.fastq.gz
 xxxxxx2.R1.fastq.gz
 xxxxxx3.R1.fastq.gz
 xxxxxx4.R1.fastq.gz
 xxxxxx5.R1.fastq.gz

I can extract the raw DNA reads from a single file and store them in another file with the following command:

gunzip -c in.fastq.gz | awk '(NR%4==2)' > out.seq
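For reference, FASTQ stores each read as four lines (header, sequence, `+` separator, quality), so `NR%4==2` keeps the second line of every record. A minimal self-contained sketch, using a hypothetical two-read file `demo.fastq.gz` created on the fly in place of real data:

```shell
#!/bin/sh
# Build a tiny two-record FASTQ file (hypothetical demo data) and gzip it.
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' | gzip -c > demo.fastq.gz

# Line 2 of every 4-line record is the sequence, so NR%4==2 selects it.
gunzip -c demo.fastq.gz | awk 'NR%4==2' > demo.seq

# demo.seq now holds one sequence per line: ACGT, then GGCC.
cat demo.seq
```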

Is there a way to extract the DNA reads from all the files at once and save them in a single text file, instead of processing each file one by one?
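One way to avoid doing it by hand is an explicit loop whose combined output is redirected once. A minimal sketch, assuming the real files match `*.R1.fastq.gz` in the current directory; here two hypothetical demo files stand in for them, and `all.seq` is an illustrative output name:

```shell
#!/bin/sh
# Two small demo files (hypothetical data) following the R1 naming pattern.
printf '@r1\nAAAA\n+\nIIII\n' | gzip -c > demo1.R1.fastq.gz
printf '@r2\nCCCC\n+\nIIII\n@r3\nGGGG\n+\nIIII\n' | gzip -c > demo2.R1.fastq.gz

# Decompress each file in turn and keep line 2 of every 4-line record.
# Each awk process starts counting NR at 1, so records stay aligned per file.
# The single redirection after "done" collects everything into one file.
# For real data, replace the glob with *.R1.fastq.gz.
for f in demo*.R1.fastq.gz; do
    gunzip -c "$f" | awk 'NR%4==2'
done > all.seq
```

The glob expands in alphabetical order, so sequences from `demo1` come before those from `demo2`.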

Also, would it be better to do this in Python or R instead of basic Linux commands? I suspect a Python approach would not be very efficient.


Umm, why? Why would you want to do this?

(comment by swbarnes2, 5 months ago)
h.mon (Brazil) wrote, 5 months ago:

Indeed, as swbarnes2 asked, why would you want to do this? I asked myself the same question when you posted "Extracting only sequences from fastq.gz". Most likely you don't need to do this, as nearly all modern bioinformatics tools accept (compressed) FASTQ files as input.

As for your question:

zcat *.fastq.gz | awk 'NR%4==2' > all.seq

Note that the output contains only sequences, one per line, so it is no longer a FASTQ file.
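Because `zcat` concatenates the decompressed files into one stream, `NR%4==2` stays aligned only if every file contains a whole number of 4-line records; a truncated file would shift the selection for every file after it. A minimal sketch of a sanity check before concatenating, using two hypothetical demo files and `gunzip -c` (equivalent to GNU `zcat` here, and also portable to systems where `zcat` only handles `.Z` files):

```shell
#!/bin/sh
# Demo input (hypothetical): two well-formed gzipped FASTQ files.
printf '@a\nACGT\n+\nIIII\n' | gzip -c > s1.fastq.gz
printf '@b\nTTGA\n+\nIIII\n' | gzip -c > s2.fastq.gz

# Warn about any file whose line count is not a multiple of 4,
# since it would misalign NR%4==2 for all files that follow it.
for f in s1.fastq.gz s2.fastq.gz; do
    n=$(gunzip -c "$f" | wc -l | tr -d ' ')
    if [ $((n % 4)) -ne 0 ]; then
        echo "warning: $f has $n lines (not a multiple of 4)" >&2
    fi
done

# gunzip -c with several files writes their contents back to back.
gunzip -c s1.fastq.gz s2.fastq.gz | awk 'NR%4==2' > all.seq
```

This assumes the common 4-line FASTQ layout; files with sequences wrapped over several lines would need a real FASTQ parser instead.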