Question: Sequence extraction from a fastq file
bpz (60), Mexico, wrote 4.4 years ago:

Hello everyone

I have some Illumina reads from a metagenomic project in a FASTQ file, and I am trying to "fish out" some particular sequences from all the mess. I ran a BLAST search on these reads and got the sequences I am interested in, in FASTA format. The thing is, I need the sequences in FASTQ format for assembly. How can I extract the sequences from the original FASTQ file using the BLAST FASTA file as a reference? Or should I just convert my BLAST output from FASTA format to FASTQ format?

Thanks in advance.

Great, I will try it. Thanks.

Reply written 4.4 years ago by bpz (60)

Are you sure you want the whole sequence from the FASTQ file? If you have quality-trimmed your reads and/or are really only interested in the parts that match the reference sequences, then the grep approach may not be what you want.

Reply written 4.4 years ago by SES (8.2k)

Yes, you are right. My solution will only work if the read IDs (the header info) have been preserved between the FASTA and FASTQ files, which I assume is the case.

Reply written 4.4 years ago by Ashutosh Pandey (11k)

Right, the IDs would need to be the same, though that is not what I was referring to. I meant that if you quality trim a file and want to extract those reads from another file (or just keep the blast query string), then pulling reads from a file with the IDs alone won't work (in that case, the trimming and match information would be lost). Hopefully that is clear.
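To make the point about losing the match information concrete, here is a minimal Python sketch that cuts a FASTQ record down to the matched region instead of keeping the whole read. It assumes you already have the start and end of the match on the read as 1-based, inclusive coordinates (for example from a tabular BLAST report); the function name trim_record and the list-of-four-strings record layout are made up for illustration.

```python
def trim_record(record, start, end):
    """Return a copy of a FASTQ record cut down to positions start..end.

    record: [header, sequence, plus_line, quality] (hypothetical layout).
    start, end: 1-based, inclusive coordinates on the read (assumption).
    """
    header, seq, plus, qual = record
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # Accept the coordinates in either order.
    lo, hi = sorted((start, end))
    if lo < 1 or hi > len(seq):
        raise ValueError("coordinates fall outside the read")
    # Slice sequence and quality identically so they stay in sync.
    return [header, seq[lo - 1:hi], plus, qual[lo - 1:hi]]
```

The sequence and quality strings have to be sliced with the same coordinates, otherwise the record is no longer valid FASTQ.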

ADD REPLYlink written 4.4 years ago by SES8.2k
Ashutosh Pandey (11k), Philadelphia, wrote 4.4 years ago:

So you are saying that you have the header lines and the sequences themselves (FASTA format) for those reads, but you want to add the quality scores from the original FASTQ file. You can simply use grep -A3 "Header_info" Original.fastq, which prints the matching header line plus the 3 lines after it, i.e. the complete FASTQ record for that header.
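If there are many reads to pull out, running grep once per header gets slow, and a plain pattern can also match more than you want (a search for read1 also hits read10). As a rough alternative, here is a minimal Python sketch of the same idea: collect the IDs from the FASTA headers, then keep only the FASTQ records whose ID is in that set. It assumes plain 4-line FASTQ records (no wrapped sequences) and that the ID is the first word of the header in both files; the function names and file paths are hypothetical.

```python
def fasta_ids(fasta_lines):
    """Collect read IDs (first word after '>') from FASTA header lines."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def extract_fastq(fastq_lines, wanted_ids):
    """Yield 4-line FASTQ records whose ID is in wanted_ids.

    Assumes every record is exactly 4 lines (no wrapped sequences).
    """
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header = record[0]
            fields = header[1:].split()
            # Compare the whole ID, not a substring of the header.
            if header.startswith("@") and fields and fields[0] in wanted_ids:
                yield record
            record = []


def write_subset(fasta_path, fastq_path, out_path):
    """Write the matching FASTQ records to out_path (paths are hypothetical)."""
    with open(fasta_path) as fa:
        wanted = fasta_ids(fa)
    with open(fastq_path) as fq, open(out_path, "w") as out:
        for record in extract_fastq(fq, wanted):
            out.write("\n".join(record) + "\n")
```

For example, write_subset("blast_hits.fasta", "Original.fastq", "hits.fastq") would write the selected reads, with their original quality scores, to hits.fastq. The FASTQ file is read one record at a time, so only the set of IDs has to fit in memory.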

EDIT: Just found this post: Quickest way to extract subset of reads from huge fastq file
