Question: How to extract all the mapped reads from bwa output
0
gravatar for RT
2.8 years ago by
RT300
European Union
RT300 wrote:

Hi All,

I have mapped genomic data using bwa. What is the most efficient way to extract all the mapped reads (in fastq format)?

Thanks

bwa mapping fastq • 4.5k views
ADD COMMENTlink modified 7 months ago by tans03070 • written 2.8 years ago by RT300
4
gravatar for Ram
2.8 years ago by
Ram12k
New York
Ram12k wrote:

I see this as a two step process:

1. Filter BAM file and get only mapped reads (How To Filter Mapped Reads With Samtools )

2. Output said mapped reads in FASTQ. (http://broadinstitute.github.io/picard/command-line-overview.html#SamToFastq )

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by Ram12k
3
gravatar for Brian Bushnell
2.8 years ago by
Walnut Creek, USA
Brian Bushnell14k wrote:

To do this as efficiently as possible, using BBTools:

reformat.sh in=reads.sam out=mapped.fq mappedonly

Also, BBMap has a lot of options designed for filtering, so it can output in fastq format and separate mapped from unmapped reads, preventing the creation of intermediate sam files.  This approach also keeps pairs together, which is not very easy using samtools for filtering.

bbmap.sh ref=reference.fa in=reads.fq outm=mapped.fq outu=unmapped.fq

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by Brian Bushnell14k
1

BBTools is the new GATK :)

ADD REPLYlink written 2.8 years ago by Ram12k

Hi Brian, If I use reformat.sh in=reads.bam out=mapped.fq mappedonly , it works fine but treat mydata as 'Input is being processed as unpaired'. 

When I specified verifypaired=T reformat.sh in=reads.bam out=mapped.fq mappedonly verifypaired=T), I get the following errors:

Found samtools.
Input is being processed as paired
Writing interleaved.
Exception in thread "Thread-3" java.lang.AssertionError: 
    at stream.SamReadInputStream.toReadList(SamReadInputStream.java:134)
    at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:100)
    at stream.SamReadInputStream.hasMore(SamReadInputStream.java:56)
    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:642)
    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:634)

Also note that I am using directly the bam file. Is this ok?

ADD REPLYlink modified 2.8 years ago by Ram12k • written 2.8 years ago by RT300

As long as samtools is installed, you can directly use bam input.

"verifypaired" only works with fasta or fastq input, though, since the sam format offers no guarantees about read order (paired reads are not necessarily next to each other).  The reads will be output in the same order as they exist in the sam file.  The message about input being processed as unpaired indicates that reads are being filtered individually, rather than keeping pairs together; so if in a pair one read is mapped and the other is not, the mapped read will be kept and the unmapped read will be discarded.

ADD REPLYlink written 2.8 years ago by Brian Bushnell14k

Hi Brian, Thanks a lot for this. My goal is to keep the mapped pairs different files and also to keep the reads where one pair is mapped and the other one is not in one of the files. I am doing this because I have to remap these mapped reads (mapped on few genes ~ 2000 genes) onto the whole genome. 

Is it possible to do? or if you have any other suggestions then please suggest. 

 

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by RT300

Well... BBMap, by default, will send unmapped pairs to the "outu" stream and mapped pairs to the "outm" stream, keeping pairs together. It considers a pair mapped if either read is mapped (this behavior can be changed with the "pairedonly" flag). The output outm/outu streams can create fastq files instead of sam, which is often useful. Reformat cannot currently apply this filter to an existing sam/bam file.

ADD REPLYlink written 14 months ago by Brian Bushnell14k
0
gravatar for ATPoint
13 months ago by
ATPoint1.7k
ATPoint1.7k wrote:

You can first filter your raw bam file for mapped reads only: samtools view -F 4 in.bam -o out.bam. The -F options discards everything with the given flag, and flag 4 refers to all unmapped reads (see: http://broadinstitute.github.io/picard/explain-flags.html) and then use bedtools bamtofastq (http://bedtools.readthedocs.io/en/latest/content/tools/bamtofastq.html) to get the fastq from the bam.

ADD COMMENTlink modified 13 months ago • written 13 months ago by ATPoint1.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 940 users visited in the last hour