Extracting DNA sequences from multiple fastq.gz files
4.6 years ago

I have raw DNA sequences in multiple files, named as follows.

 xxxxxx1.R1.fastq.gz
 xxxxxx2.R1.fastq.gz
 xxxxxx3.R1.fastq.gz
 xxxxxx4.R1.fastq.gz
 xxxxxx5.R1.fastq.gz

I can extract the raw DNA reads from a single file and store them in another file with the following command.

gunzip -c in.fastq.gz | awk '(NR%4==2)' > out.seq

Is there any way to extract the DNA reads from all the files at once and save them in a single text file, instead of processing the files one by one?

Also, would it be better to do this in Python or R instead of basic Linux commands? I suspect a Python approach would be less efficient.

sequencing sequence gene raw_read_DNA multi_file • 2.0k views

Umm, why? Why would you want to do this?

4.6 years ago
h.mon 35k

Indeed, as swbarnes2 asked, why would you want to do this? In fact, I asked myself the same question when you posted "Extatrcting only sequnces from fastq.gz". Most likely you don't need to do this, as nearly all modern bioinformatics tools accept (compressed) fastq files as input.

As for your question:

zcat *.fastq.gz | awk '(NR%4==2)' > all.fastq
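
Since the question also asks about Python, here is a minimal sketch of the same idea using only the standard library. It assumes strict four-line FASTQ records (no wrapped sequence lines), which is the same assumption the awk filter makes. The function name `extract_sequences` and the output name `all.seq` are made up for illustration, and this will likely be slower than the zcat/awk pipeline on large files.

```python
import glob
import gzip
from itertools import islice


def extract_sequences(pattern, out_path):
    """Write the sequence line of every FASTQ record to out_path.

    Assumes each record spans exactly four lines (header, sequence,
    '+', qualities), so the sequence is always the 2nd line of four.
    Returns the number of sequences written.
    """
    n_reads = 0
    with open(out_path, "w") as out:
        # sorted() gives a deterministic, lexicographic file order
        for path in sorted(glob.glob(pattern)):
            with gzip.open(path, "rt") as handle:
                # start at index 1 (the 2nd line) and step by 4,
                # mirroring awk's NR%4==2
                for seq in islice(handle, 1, None, 4):
                    out.write(seq)
                    n_reads += 1
    return n_reads


# Example call (hypothetical file names):
# extract_sequences("*.R1.fastq.gz", "all.seq")
```

The files are streamed line by line, so memory use stays small regardless of file size. Sequences are written one per line, in the same form as the output of the command above.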