Quickest way to extract a subset of reads from a huge FASTQ file
10.0 years ago
Prakki Rama ★ 2.7k

Hi all,

Is there a faster way to extract a subset of reads from a huge FASTQ file into another file? I have already tried the following:

grep -A3 '1:N:0:' ORGAN1.fastq >ORGAN1.cleaned.fastq

but grep takes too long. Any one-liners would be very much appreciated.

Thank you

Prakki Rama.

unix RNA-Seq fastq next-gen • 8.1k views
10.0 years ago

A faster way is to do this:

LC_ALL=C fgrep -A3 '1:N:0:' ORGAN1.fastq >ORGAN1.cleaned.fastq

Here is an explanation of why this is so much faster.
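
In short, and as far as I understand it, the speedup has two sources: LC_ALL=C makes grep compare raw bytes instead of decoding multibyte (UTF-8) characters, and fgrep does plain fixed-string matching instead of running a regex engine. Below is a minimal sketch on a tiny made-up file; demo.fastq and the read names are hypothetical. It uses grep -F, the non-deprecated spelling of fgrep, and assumes GNU grep for --no-group-separator. Without that option, grep -A3 prints a "--" line between non-adjacent matches, which would leave the output as invalid FASTQ.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Tiny made-up FASTQ: reads r1 and r3 carry the '1:N:0:' tag, r2 and r4 do not.
printf '%s\n' \
  '@r1 1:N:0:ACGT' 'ACGT' '+' 'IIII' \
  '@r2 2:N:0:ACGT' 'TTTT' '+' 'IIII' \
  '@r3 1:N:0:ACGT' 'GGGG' '+' 'IIII' \
  '@r4 2:Y:0:ACGT' 'CCCC' '+' 'IIII' > demo.fastq

# Byte-wise locale + fixed-string match + 3 lines of trailing context per hit.
# --no-group-separator (GNU grep, assumed here) suppresses the "--" lines.
LC_ALL=C grep -F -A3 --no-group-separator '1:N:0:' demo.fastq > demo.cleaned.fastq

cat demo.cleaned.fastq
```

Note that this still matches the string anywhere in the file, not only in header lines, so it relies on the tag never appearing in a sequence or quality line.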


Wow!! Normal grep on a sample file took 17 sec, whereas the LC_ALL=C version took only 4 sec. Wonderful! Thank you very much.


LC_ALL=C fgrep is very fast for finding a fixed string. But when it comes to finding a regex, it is slower (although still slightly faster than normal grep).


fgrep doesn't work with regexes (that's why it's faster). Could it be that it switches to egrep or grep -G for you?


Yes, you are right. It does not work for regexes. I was looking only at the execution time. My mistake.
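
To make the point in this exchange concrete, here is a small sketch of how the same pattern behaves under fixed-string and regex matching. The file demo.fastq, the read names, and the pattern are made up for illustration; grep -F is the equivalent of fgrep and grep -E of egrep.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Two made-up reads, one tagged 1:N:0: and one tagged 2:N:0:.
printf '%s\n' \
  '@r1 1:N:0:ACGT' 'ACGT' '+' 'IIII' \
  '@r2 2:N:0:ACGT' 'TTTT' '+' 'IIII' > demo.fastq

# A pattern with a bracket expression meant to match either tag.
pattern='[12]:N:0:'

# -F: the brackets are literal characters, so no line matches.
# (grep exits non-zero on zero matches, hence the "|| true".)
fixed_hits=$(LC_ALL=C grep -F -c "$pattern" demo.fastq || true)

# -E: the brackets form a character class, so both header lines match.
regex_hits=$(LC_ALL=C grep -E -c "$pattern" demo.fastq)

echo "fixed=$fixed_hits regex=$regex_hits"   # prints: fixed=0 regex=2
```

So a fast run of fgrep with a regex pattern is not a fair timing: it finishes quickly because it is searching for the literal characters and finds nothing.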

10.0 years ago

If you are repeatedly querying this file, try splitting it into smaller units (say, with UNIX split), and then search through the smaller files in parallel. You could do this with, say, jobs scheduled on an SGE grid, or with GNU Parallel.
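
A rough sketch of the split-then-search idea, under a few assumptions: every record is exactly four lines (no wrapped sequences), the file and read names are made up, and plain shell background jobs stand in for SGE or GNU Parallel. It also swaps grep for a small awk filter that tests only header lines, which is a different technique from the grep commands above.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Tiny made-up FASTQ: reads r1 and r3 carry the '1:N:0:' tag.
printf '%s\n' \
  '@r1 1:N:0:ACGT' 'ACGT' '+' 'IIII' \
  '@r2 2:N:0:ACGT' 'TTTT' '+' 'IIII' \
  '@r3 1:N:0:ACGT' 'GGGG' '+' 'IIII' \
  '@r4 2:Y:0:ACGT' 'CCCC' '+' 'IIII' > demo.fastq

# Split on a multiple of 4 lines so no record is cut in half
# (8 for this demo; a real file would use a much larger multiple).
split -l 8 demo.fastq chunk.

# Filter each chunk in a background job. On each header line (every 4th
# line, starting at 1) decide whether to keep the record, then print
# lines for as long as "keep" is set.
for f in chunk.??; do
  LC_ALL=C awk 'NR % 4 == 1 { keep = (index($0, "1:N:0:") > 0) } keep' \
    "$f" > "$f.hits" &
done
wait   # block until every background job has finished

# Glob expansion is sorted, so the original read order is preserved.
cat chunk.??.hits > demo.cleaned.fastq
cat demo.cleaned.fastq
```

Whether this beats a single pass depends on the disk: if reading the file is the bottleneck, the extra write and re-read of the chunks may cost more than the parallel search saves.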


Thank you, Alex. But I might not need to query the file repeatedly. grep is taking a long time, and sed seems to be more or less the same.

10.0 years ago
da44da • 0

Thanks for sharing this information. I was looking for it on the internet.


Try to avoid adding an answer if you're not answering the question. You can always use the "Add Comment" button below the question or below another answer if you want to.
