How to extract all the mapped reads from bwa output
3
2
Entering edit mode
9.2 years ago
GR ▴ 400

Hi All,

I have mapped genomic data using bwa. What is the most efficient way to extract all the mapped reads (in fastq format)?

Thanks

bwa fastq mapping • 12k views
ADD COMMENT
6
Entering edit mode
9.2 years ago
Ram 43k

I see this as a two step process:

  1. Filter BAM file and get only mapped reads (How To Filter Mapped Reads With Samtools)
  2. Output said mapped reads in FASTQ. (http://broadinstitute.github.io/picard/command-line-overview.html#SamToFastq)
ADD COMMENT
5
Entering edit mode
9.2 years ago

To do this as efficiently as possible, using BBTools:

reformat.sh in=reads.sam out=mapped.fq mappedonly

Also, BBMap has a lot of options designed for filtering, so it can output in fastq format and separate mapped from unmapped reads, preventing the creation of intermediate sam files. This approach also keeps pairs together, which is not very easy using samtools for filtering.

bbmap.sh ref=reference.fa in=reads.fq outm=mapped.fq outu=unmapped.fq
ADD COMMENT
1
Entering edit mode

BBTools is the new GATK :)

ADD REPLY
0
Entering edit mode

Hi Brian, If I use reformat.sh in=reads.bam out=mapped.fq mappedonly, it works fine but treat mydata as 'Input is being processed as unpaired'.

When I specified verifypaired=T (reformat.sh in=reads.bam out=mapped.fq mappedonly verifypaired=T), I get the following errors:

Found samtools.
Input is being processed as paired
Writing interleaved.
Exception in thread "Thread-3" java.lang.AssertionError: 
    at stream.SamReadInputStream.toReadList(SamReadInputStream.java:134)
    at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:100)
    at stream.SamReadInputStream.hasMore(SamReadInputStream.java:56)
    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:642)
    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:634)

Also note that I am using directly the bam file. Is this ok?

ADD REPLY
0
Entering edit mode

As long as samtools is installed, you can directly use bam input.

"verifypaired" only works with fasta or fastq input, though, since the sam format offers no guarantees about read order (paired reads are not necessarily next to each other). The reads will be output in the same order as they exist in the sam file. The message about input being processed as unpaired indicates that reads are being filtered individually, rather than keeping pairs together; so if in a pair one read is mapped and the other is not, the mapped read will be kept and the unmapped read will be discarded.

ADD REPLY
1
Entering edit mode

Hi Brian, Thanks a lot for this. My goal is to keep the mapped pairs different files and also to keep the reads where one pair is mapped and the other one is not in one of the files. I am doing this because I have to remap these mapped reads (mapped on few genes ~ 2000 genes) onto the whole genome.

Is it possible to do? or if you have any other suggestions then please suggest.

ADD REPLY
0
Entering edit mode

Well... BBMap, by default, will send unmapped pairs to the "outu" stream and mapped pairs to the "outm" stream, keeping pairs together. It considers a pair mapped if either read is mapped (this behavior can be changed with the "pairedonly" flag). The output outm/outu streams can create fastq files instead of sam, which is often useful. Reformat cannot currently apply this filter to an existing sam/bam file.

ADD REPLY
0
Entering edit mode
7.6 years ago
ATpoint 81k

You can first filter your raw bam file for mapped reads only: samtools view -F 4 in.bam -o out.bam.

The -F option discards everything with the given flag, and flag 4 refers to all unmapped reads (see: http://broadinstitute.github.io/picard/explain-flags.html) and then use bedtools bamtofastq to get the fastq from the bam.

ADD COMMENT

Login before adding your answer.

Traffic: 1559 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6