Question

Sequence extraction from a fastq file

9.4 years ago

bpz ▴ 60

Hello everyone

I have some Illumina reads from a metagenomic project in a FASTQ file, and I am trying to "fish out" some particular sequences from all the mess. I ran a BLAST search and got the sequences I am interested in, in FASTA format. The thing is, I need the sequences in FASTQ format for assembly. How can I extract the sequences from the original FASTQ file using the BLAST FASTA file as a reference? Or should I just convert my BLAST output file from FASTA format to FASTQ format?

Thanks in advance.

sequencing Assembly sequence blast alignment • 5.9k views

updated 2.2 years ago by Ram 43k • written 9.4 years ago by bpz ▴ 60

Great, I will try it.

Thanks

updated 2.2 years ago by Ram 43k • written 9.4 years ago by bpz ▴ 60

Are you sure you want the whole sequence from the Fastq file? If you have quality trimmed your reads and/or are really just interested in the parts matching the reference sequences then the grep approach may not be what you want.

9.4 years ago by SES 8.6k

Yes, you are right. My solution will only work if the "Read ID" or "Header info" has been preserved between the FASTA and FASTQ files, which I am assuming is the case.

updated 2.2 years ago by Ram 43k • written 9.4 years ago by Ashutosh Pandey 12k

Right, the IDs would need to be the same, though that is not what I was referring to. I meant that if you quality trim a file and want to extract those reads from another file (or just keep the blast query string), then pulling reads from a file with the IDs alone won't work (in that case, the trimming and match information would be lost). Hopefully that is clear.

9.4 years ago by SES 8.6k

Ram · Answer 1 · 2014-11-13

So you are saying that you have the header lines and the sequences themselves (FASTA format), but you want to add the quality scores from the original FASTQ file. You can simply use grep -A3 "Header_info" Original.fastq, which prints the matching header line plus the three lines that follow it, i.e. the complete four-line FASTQ record for that header.

EDIT: Just found this post: Quickest way to extract subset of reads from huge fastq file
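
For many reads at once, grepping one header at a time gets slow, and a plain grep can also hit the wrong lines: "read1" matches "read10", a quality line may itself start with "@", and GNU grep prints "--" separators between non-adjacent matches when -A is used. Below is a minimal Python sketch of the same ID-based idea that avoids these cases. It assumes strict four-line FASTQ records and that the first word of each FASTA header equals the first word of the FASTQ header; the function names are made up for illustration and are not from any library.

```python
import itertools


def read_fasta_ids(fasta_lines):
    """Collect read IDs (first word after '>') from FASTA header lines."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def iter_fastq(fastq_lines):
    """Yield (header, seq, plus, qual) tuples, assuming 4 lines per record."""
    it = iter(fastq_lines)
    while True:
        record = [line.rstrip("\n") for line in itertools.islice(it, 4)]
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        yield tuple(record)


def extract_records(fasta_lines, fastq_lines):
    """Yield the FASTQ records whose ID appears in the FASTA headers."""
    wanted = read_fasta_ids(fasta_lines)
    for header, seq, plus, qual in iter_fastq(fastq_lines):
        if not header.startswith("@"):
            raise ValueError("not a 4-line FASTQ record: " + header)
        parts = header[1:].split()
        read_id = parts[0] if parts else ""
        if read_id in wanted:  # exact ID match, not a substring match
            yield header, seq, plus, qual


# Small in-memory demo; real use would pass open file handles instead.
demo_fasta = [">read2 blast hit\n", "ACGT\n"]
demo_fastq = [
    "@read1\n", "AAAA\n", "+\n", "IIII\n",
    "@read2 extra\n", "ACGT\n", "+\n", "@III\n",
    "@read20\n", "TTTT\n", "+\n", "IIII\n",
]
for rec in extract_records(demo_fasta, demo_fastq):
    print("\n".join(rec))
```

With real data, the two arguments can be open file handles, e.g. extract_records(open("hits.fasta"), open("Original.fastq")), where hits.fasta is a hypothetical name for the BLAST FASTA output. As noted in the comments above, this returns the whole untrimmed read; it does not keep only the part that matched in BLAST.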