Question: How To Extract A Subset Of Reads In Fastq Using An Id List?
6.9 years ago by
Luke220
Turin, Italy
Luke220 wrote:

Hello! I obtained a list of unmapped read IDs from my BAM file, and I want to remap only those reads with other parameters. How can I extract the subset of unmapped reads from my original FASTQ file? Thank you in advance, Luke

fastq bam • 9.9k views
modified 3.9 years ago by Brian Bushnell16k • written 6.9 years ago by Luke220

I have a post here which addresses part of this question

written 16 months ago by steve1.9k
3.9 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

I also wrote a program for this purpose, distributed with BBMap.  Usage:

filterbyname.sh in=reads.fq out=filtered.fq names=names.txt include=t

The "include" flag will toggle between including or excluding the names in names.txt (which can, alternately, be another fastq or fasta file).  This also supports paired input/output, and names being substrings or superstrings of read IDs.
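As a rough sketch of the variants described above, the same call can be wrapped so that the include flag and the paired files are explicit. The function names and the FILTERBYNAME variable are hypothetical conveniences, not part of BBMap, and in2=/out2= are the paired-file parameters as I understand them from the BBTools usage text, so check the script's built-in help before relying on them.

```shell
# Sketch only: thin wrappers around filterbyname.sh from BBMap.
# FILTERBYNAME is a hypothetical override for the path to the script.
FILTERBYNAME=${FILTERBYNAME:-filterbyname.sh}

# Keep only the reads named in the list (the command shown in the answer).
# $1 = input reads, $2 = output reads, $3 = names file
keep_listed() { "$FILTERBYNAME" in="$1" out="$2" names="$3" include=t; }

# Drop the reads named in the list and keep everything else.
drop_listed() { "$FILTERBYNAME" in="$1" out="$2" names="$3" include=f; }

# Paired input and output in separate files (assumed in2=/out2= parameters).
# $1,$2 = input mates, $3,$4 = output mates, $5 = names file
keep_listed_paired() {
    "$FILTERBYNAME" in="$1" in2="$2" out="$3" out2="$4" names="$5" include=t
}

# Usage: keep_listed reads.fq filtered.fq names.txt
```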


Thank you for this excellent tool, which is ridiculously fast compared to the scripts I've been using to achieve this goal.

written 3.7 years ago by CraigM80
6.9 years ago by
Arun2.3k
Germany
Arun2.3k wrote:

I prefer writing my own little snippets. However, it's also possible using biopieces. This reply is from SEQanswers (by maasha), pasted here for convenience.

First you need a file with the FASTQ sequence names you are interested in - or IDs if you like - one per line. And then:

read_fastq -i in.fastq | grab -E ids.txt | write_fastq -xo out.fastq

Check out grab for details.
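For the "own little snippet" route, a minimal awk sketch might look like the following. It is not the biopieces method; it assumes strict 4-line FASTQ records and an ID file with one ID per line and no leading "@". The function name extract_reads and the file names are hypothetical.

```shell
# Sketch only: extract_reads and the file names are hypothetical.
# Assumes 4-line FASTQ records and one ID per line, without the leading "@".
extract_reads() {
    # $1 = ID list, $2 = FASTQ file; matching records go to stdout
    awk '
        FILENAME == ARGV[1] {               # first file: load the wanted IDs
            if ($1 != "") wanted[$1] = 1
            next
        }
        FNR % 4 == 1 {                      # header line of each record
            full = substr($1, 2)            # drop "@", ignore any description
            base = full
            sub(/\/[12]$/, "", base)        # also try the ID without /1 or /2
            keep = (full in wanted) || (base in wanted)
        }
        keep                                # print every line of a kept record
    ' "$1" "$2"
}

# Usage: extract_reads ids.txt in.fastq > out.fastq
```

Headers are identified by position within the record rather than by a leading "@", because quality strings can also start with "@".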

6.9 years ago by
swbarnes25.2k
United States
swbarnes25.2k wrote:

It is simpler to go back to the original .bam and pull out just the entries that are unmapped; samtools view -f 4 should do it. Then, if you need FASTQ, you can use something like Picard's SamToFastq to convert back. (Some software, like Velvet, is fine with .bam as input.)
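To make the flag test concrete, here is a small awk sketch that does both steps on SAM text: it keeps records with flag bit 4 (read unmapped) set and prints them as FASTQ. It is a simplified stand-in, not a replacement for Picard's SamToFastq or the fastq subcommand in current samtools, since it ignores mate numbering and the reverse-strand flag. The function name is hypothetical.

```shell
# Sketch only: unmapped_sam_to_fastq is a hypothetical helper.
# Reads SAM text on stdin, writes unmapped reads as FASTQ on stdout.
unmapped_sam_to_fastq() {
    awk -F '\t' '
        /^@/ { next }                       # skip SAM header lines
        int($2 / 4) % 2 == 1 {              # FLAG bit 4 set: read is unmapped
            # QNAME is column 1, SEQ column 10, QUAL column 11
            printf "@%s\n%s\n+\n%s\n", $1, $10, $11
        }
    '
}

# Usage: samtools view in.bam | unmapped_sam_to_fastq > unmapped.fastq
```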

6.9 years ago by
Luke220
Turin, Italy
Luke220 wrote:

I've found a quick solution with the cdbfasta and cdbyank tools. First you index your FASTQ file with cdbfasta, then you look up the IDs in that index with cdbyank to pull out the matching reads. For more info: http://sourceforge.net/projects/cdbfasta/ Thank you, Luke
