Question

Faster extraction of reads mapped to a region

0

Entering edit mode

7.3 years ago

novice ★ 1.1k

So I think I've just read every single post about this on the internet.

I simply want to extract reads mapped to a certain region. This is the pipeline I'm currently using:

samtools view sorted.bam "chr3:1000-5000" | cut -f1 > ids
LC_ALL=C grep -wFf ids < original.sam > subset.sam  # this is the bottleneck
echo -e "$(samtools view -H original.sam)\n$(cat subset.sam)" > subset.sam
samtools view -b subset.sam | samtools fastq -1 reads1.fq -2 reads2.fq -

This is not too bad for a single run, but I actually have to do this for 1000+ regions. Is there possibly a more efficient way to extract the reads, or is this as fast as it gets?

Edit: added updating-the-header step
Edit: Help

samtools alignment • 2.0k views

ADD COMMENT • link 7.3 years ago by novice ★ 1.1k

0

Entering edit mode

Could you clarify what the desired output is? Do you want fastq files, read IDs, a subset bam or something else?

ADD REPLY • link 7.3 years ago by WouterDeCoster 47k

0

Entering edit mode

The fastq files (paired end).

ADD REPLY • link 7.3 years ago by novice ★ 1.1k

score 0 · Answer 1 · 2016-12-23

0

Entering edit mode

7.3 years ago

venu 7.1k

I would do something like this

cat <(samtools view -H merged.mdup.bam) <(samtools view merged.mdup.bam "chr3:20000-200000") | samtools view -bS - > subset.bam

You can loop the above command on every region giving region as name to the output file instead of subset.bam.

BTW, are you trying to generate fastq from your subset.bam ? Because I did not find fastq argument in my version (samtools-1.2), instead I found bam2fq.

ADD COMMENT • link 7.3 years ago by venu 7.1k

0

Entering edit mode

samtools fastq is in samtools v.1.3.1

ADD REPLY • link 7.3 years ago by GenoMax 141k

0

Entering edit mode

fastq is the new bam2fq.

The problem with this solution is that, if I pipe it into fastq, I get discordant fastq files. I need to make sure to extract the pairs of reads as well.

ADD REPLY • link 7.3 years ago by novice ★ 1.1k