extracting reads from fastq file based on read_id
3
0
Entering edit mode
7.8 years ago

I have a file with a set of read ids called tmp. I want to extract reads with these read ids from another fastq file called reads.fastq. When I use grep -f tmp reads.fastq I get only the first line (header) of the read. How can I get the output in full fastq format.

next-gen • 7.4k views
ADD COMMENT
4
Entering edit mode
7.8 years ago
GenoMax 141k

grep -A 3 seq_ID your file should get you the full fastq record. Iterate over the ID's you have.

filterbyname.sh from BBMap would also work with your list of ID's in a file.

ADD COMMENT
0
Entering edit mode

Today I learnt about -A. Thanks.

ADD REPLY
1
Entering edit mode
7.8 years ago
Charles Plessy ★ 2.9k

If your tmp and reads.fastq have their reads in the same order, you can try our syncpairs tools.

ADD COMMENT
0
Entering edit mode
7.8 years ago
brent_wilson ▴ 140

For bash, I would use a nice one-liner:

for x in $(cat tmp); do reads.fastq | grep $x -A 3; done

Of course you can redirect to wherever you want:

for x in $(cat tmp); do reads.fastq | grep $x -A 3; done > tmp2

I hope that this helps!

Brent Wilson, PhD | Project Scientist | Cofactor Genomics

4044 Clayton Ave. | St. Louis, MO 63110 | tel. 314.531.4647

Catch the latest from Cofactor on our blog.

ADD COMMENT
2
Entering edit mode

For m desired reads out of a fastq file with n total reads, this takes O(mn) time, as opposed to the -f flag given to grep (or just using BBMap), which is likely O(n).

ADD REPLY

Login before adding your answer.

Traffic: 1940 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6