Question: extracting DNA sequences from multiple fastq.gz files
5 months ago by
abdul.karim (0) wrote:

I have raw DNA sequences in multiple files, as shown below.


I can extract the raw DNA reads from a single file and store them in another file using the following command:

gunzip -c in.fastq.gz | awk '(NR%4==2)' > out.seq

Is there a way to extract the DNA reads from all the files at once and save them in a single text file, instead of processing the files one by one?

Also, would it be better to do this in Python or R instead of basic Linux commands? I suspect the Python approach would be less efficient.

modified 5 months ago by h.mon (29k) • written 5 months ago by abdul.karim (0)

Umm, why? Why would you want to do this?

written 5 months ago by swbarnes2 (7.4k)
5 months ago by
h.mon (29k) wrote:

Indeed, as swbarnes2 asked, why would you want to do this? In fact, I asked myself the same question when you posted "Extatrcting only sequnces from fastq.gz". Most likely you don't need to do this, as nearly all modern bioinformatics tools accept (compressed) FASTQ files as input.

As for your question:

zcat *.fastq.gz | awk '(NR%4==2)' > all.fastq
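A slightly more defensive variant of the one-liner above, offered only as a sketch. It assumes standard 4-line FASTQ records (no sequences wrapped over several lines); `extract_seqs` is a hypothetical helper name, and `all.seq` is just an example output name. Running awk once per file resets the line counter, so a truncated file cannot shift the 4-line frame for the files that follow it, and `gzip -dc` is used because `zcat` does not handle .gz files on every system.

```shell
# Hypothetical helper: print only the sequence line (line 2 of each
# 4-line FASTQ record) from every gzipped file given as an argument.
extract_seqs() {
    for f in "$@"; do
        # gzip -dc decompresses to stdout (same as gunzip -c).
        # A fresh awk per file means NR starts at 1 for each file.
        gzip -dc "$f" | awk 'NR % 4 == 2'
    done
}
```

It could then be called as `extract_seqs *.fastq.gz > all.seq`, which writes one read per line to a single text file.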