Question: How To Extract A Subset Of Reads In FASTQ Using An ID List?
Luke (Turin, Italy) wrote, 7.5 years ago:

Hello! I obtained a list of unmapped read IDs from my BAM file, and I want to remap only those unmapped reads with different parameters. How can I extract that subset of reads from my original FASTQ file? Thank you in advance, Luke

Tags: fastq, bam

steve, 2.0 years ago: I have a post here which addresses part of this question.
Brian Bushnell (Walnut Creek, USA) wrote, 4.5 years ago:

I also wrote a program for this purpose, distributed with BBMap.  Usage:

filterbyname.sh in=reads.fq out=filtered.fq names=names.txt include=t

The "include" flag toggles between including or excluding the names in names.txt (which can, alternatively, be another fastq or fasta file). This also supports paired input/output, and names being substrings or superstrings of read IDs.
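A minimal Python sketch of the two ideas above, the include/exclude toggle and keeping paired files in sync, assuming uncompressed 4-line FASTQ with mates in the same order in both files. This is not how filterbyname.sh is implemented, and it only does exact matching (no substring/superstring matching); the function names and the normalize() rule (first header token, leading '@' and trailing /1 or /2 removed) are hypothetical choices for illustration.

```python
import itertools
from typing import Iterable, Iterator, List, Set, Tuple

Record = List[str]  # the 4 lines of one FASTQ record, newlines included


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    it = iter(lines)
    while True:
        record = list(itertools.islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def normalize(name: str) -> str:
    """Hypothetical rule: first token, no leading '@', no /1 or /2 mate suffix."""
    tokens = name.split()
    token = tokens[0] if tokens else ""
    if token.startswith("@"):
        token = token[1:]
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def load_names(lines: Iterable[str]) -> Set[str]:
    """Build the lookup set from a names file, skipping blank lines."""
    return {n for n in map(normalize, lines) if n}


def filter_pairs(reads1: Iterable[str], reads2: Iterable[str], names: Set[str],
                 include: bool = True) -> Iterator[Tuple[Record, Record]]:
    """Yield mate pairs whose ID is (include=True) or is not (include=False) listed.

    Assumes both files list mates in the same order; zip() stops at the shorter file.
    """
    for rec1, rec2 in zip(read_fastq(reads1), read_fastq(reads2)):
        # a pair counts as listed if either mate's normalized ID is in the set
        listed = normalize(rec1[0]) in names or normalize(rec2[0]) in names
        if listed == include:
            yield rec1, rec2
```

With real files, pass open handles for the names file and the two mate files (file objects iterate line by line) and writelines() each yielded record to the matching output file.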


CraigM, 4.3 years ago: Thank you for this excellent tool, which is ridiculously fast compared to the scripts I've been using for this.
Arun (Germany) wrote, 7.5 years ago:

I prefer writing my own little snippets, but it's also possible with Biopieces. This reply is from SEQanswers (by maasha), pasted here for convenience.

First you need a file with the FASTQ sequence names you are interested in - or IDs if you like - one per line. And then:

read_fastq -i in.fastq | grab -E ids.txt | write_fastq -xo out.fastq

Check out the documentation for grab for details.
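If you'd rather write your own snippet, here is a minimal Python sketch of the same exact-ID filter, assuming uncompressed FASTQ with 4 lines per record and an ID file with one ID per line. The script name grab_fastq.py and its functions are hypothetical, not part of Biopieces.

```python
import itertools
import sys
from typing import Iterable, Iterator, Set


def load_ids(lines: Iterable[str]) -> Set[str]:
    """Read one ID per line, skipping blank lines and tolerating a leading '@'."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            ids.add(line[1:] if line.startswith("@") else line)
    return ids


def grab_fastq(lines: Iterable[str], ids: Set[str]) -> Iterator[str]:
    """Yield the lines of every record whose ID (first header token) is in ids.

    Exact matching only; assumes 4-line records with unwrapped sequences.
    """
    it = iter(lines)
    while True:
        record = list(itertools.islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        tokens = record[0][1:].split()  # drop '@', split off any description
        if tokens and tokens[0] in ids:
            yield from record


if __name__ == "__main__":
    # usage: python grab_fastq.py ids.txt < in.fastq > out.fastq
    with open(sys.argv[1]) as fh:
        wanted = load_ids(fh)
    sys.stdout.writelines(grab_fastq(sys.stdin, wanted))
```

IDs taken from a BAM usually lack the /1 or /2 suffix that older Illumina FASTQ headers carry, so with such headers you would strip that suffix from the header token before the lookup.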

swbarnes2 (United States) wrote, 7.5 years ago:

It is simpler to go back to the original .bam and just pull out the entries that are unmapped; samtools view -f4 (keep only records with SAM flag 0x4, "segment unmapped") should do it. Then you can use something like Picard's SamToFastq to go back to FASTQ format, if you need to. (Some software, like Velvet, is fine with using .bam as input.)
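To make the flag logic concrete, here is a minimal Python sketch of the SAM-to-FASTQ step for this case, assuming SAM text on stdin (for example piped from samtools view -f4). It is not Picard's implementation; the script name unmapped_to_fastq.py, and the choices to skip secondary/supplementary records, skip records with no stored sequence or qualities, and append /1 or /2 for paired reads, are my assumptions. Records flagged reverse-strand (0x10) are reverse-complemented, because SAM stores SEQ as it appears on the forward reference strand.

```python
import sys
from typing import Iterable, Iterator

# SAM FLAG bits used below
FUNMAP, FREVERSE, FREAD1, FREAD2 = 0x4, 0x10, 0x40, 0x80
FSECONDARY, FSUPPLEMENTARY = 0x100, 0x800
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one 4-line FASTQ record (as a string) per unmapped primary SAM record."""
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment line
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        # keep only unmapped reads; skip secondary/supplementary duplicates
        if not flag & FUNMAP or flag & (FSECONDARY | FSUPPLEMENTARY):
            continue
        if seq == "*" or qual == "*":  # nothing usable to write
            continue
        if flag & FREVERSE:  # restore the read's original orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        if flag & FREAD1:
            name += "/1"
        elif flag & FREAD2:
            name += "/2"
        yield f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    # usage: samtools view -f4 in.bam | python unmapped_to_fastq.py > unmapped.fq
    sys.stdout.writelines(unmapped_to_fastq(sys.stdin))
```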

Luke (Turin, Italy) wrote, 7.5 years ago:

I've found a quick solution with the cdbfasta and cdbyank tools. First you index your FASTQ file with cdbfasta, then you retrieve the reads by ID with cdbyank. For more info: http://sourceforge.net/projects/cdbfasta/ Thank you, Luke
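The index-then-retrieve approach can be sketched in Python like this, assuming uncompressed 4-line FASTQ opened in binary mode. This is not cdbfasta's format: cdbfasta writes its index to a separate file so it can be reused, whereas this sketch keeps an ID-to-byte-offset index in an in-memory dict that is rebuilt on every run. The function names are hypothetical.

```python
import io
from typing import BinaryIO, Dict, Iterable, List


def build_offset_index(handle: BinaryIO) -> Dict[str, int]:
    """Map each read ID (first header token, '@' removed) to its record's byte offset.

    Assumes 4-line records; scans the whole file once.
    """
    index: Dict[str, int] = {}
    handle.seek(0)
    while True:
        offset = handle.tell()
        header = handle.readline()
        if not header:
            break
        tokens = header[1:].split()
        if tokens:
            index[tokens[0].decode()] = offset
        for _ in range(3):  # skip sequence, '+' and quality lines
            handle.readline()
    return index


def fetch(handle: BinaryIO, index: Dict[str, int], ids: Iterable[str]) -> List[bytes]:
    """Seek straight to each requested record; IDs missing from the index are skipped."""
    records = []
    for read_id in ids:
        offset = index.get(read_id)
        if offset is None:
            continue
        handle.seek(offset)
        records.append(b"".join(handle.readline() for _ in range(4)))
    return records


if __name__ == "__main__":
    # tiny in-memory demo; for a real file use open("reads.fq", "rb")
    fastq = io.BytesIO(b"@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
    idx = build_offset_index(fastq)
    print(fetch(fastq, idx, ["r2"]))
```

Building the index costs one full pass, so an index only pays off when you pull several different subsets from the same file; that is where a persistent index like cdbfasta's helps.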
