Question: Extract One-End Mapped Paired-End Reads From Bwa Bam/Sam Files.
2
gravatar for Bioscientist
7.9 years ago by
Bioscientist1.7k
Bioscientist1.7k wrote:

Now need to use Pindel(split-read approach to identify CNV). I'm just wondering how I can extract one-end mapped reads from BWA result? Does samtools have such options, or we need to extract by parsing the sam/bam file according to flags?

Thanks

Edit: Just another question. After BWA running, I actually do the filtering to pick out those mapping with MAPQ > 20; will this affect picking up unmapped/one-end mapped reads?

I guess this won't affect finding one-end mapped reads; but those two-end unmapped reads cannot be picked up.

Edit again: This is my another question: I ran the command:

samtools view -u -f 8 -F 260 map.bam  > oneEndMapped.bam

And let's look at the line of oneEndMapped.bam:

HWI-ST150_0129:2:1:1414:2357#0    153    chr1    154432976    37    101M    =    154432976    0    ACATACATATGTATGTAATATGTGATTATTACTGGTGGTTTGCTAAATAATGACAAATGGAAAAAAATATATGTCATCCAAAGAGAGCCACTGTTAATTTT    BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBdc^ddcacc\TdcdddddYbe_dffddcffcffecedecedaeeeedd\def^ffeff`c    RG:Z:101127_s_2    XT:A:U    NM:i:2    SM:i:37    AM:i:0    X0:i:1    X1:i:0    XM:i:2    XO:i:0    XG:i:0    MD:Z:16G20A63

How to interpret this: To me, this looks like both ends have the same sequences, and map to the same position. (leftmost coordinate of one read; and leftmost of another). Why do we say this represents one read maps while the other one doesn't map? To describe "one read maps while the other one doesn't map", in my mind, it should be like:

HWI-ST150_0129:2:1:1414:2357#0    153    chr1    154432976    37    101M    =    *   *

Also running

samtools view -u  -f 4 -F 264 map.bam  > oneEndUnmapped.bam

This oneEndUnmapped.bam is only 8K, means nothing. I don't quite understand this.

thx

samtools pindel • 12k views
ADD COMMENTlink modified 5.9 years ago by Biostar ♦♦ 20 • written 7.9 years ago by Bioscientist1.7k

If all you did was select SAM mapping entries with score of > 20 and left the entire line intact, there shouldn't be any problem using the samtool view commands I posted.

ADD REPLYlink written 7.9 years ago by Damian Kao15k

Actually this inspires me to think about what MAPQ stands for. How it's calculated. I mean if one-end maps correctly with a probability p; while each of the both-mapping ends maps correctly with the same probability. Then will the MAPQ be the same? Or the both-mapping MAPQ twice the one-end?

ADD REPLYlink written 7.9 years ago by Bioscientist1.7k

Also I look at the samtools/view man page, and seems -f is required flag; while -F is filtering flag.And there's a page for explaining SAM flags:http://picard.sourceforge.net/explain-flags.html.....Can anyone kindly explain a little about required vs filtering flag? thx

ADD REPLYlink modified 19 days ago by RamRS24k • written 7.9 years ago by Bioscientist1.7k

Hi I edited my response to explain the -f and -F.

ADD REPLYlink modified 19 days ago by RamRS24k • written 7.9 years ago by Damian Kao15k
6
gravatar for Damian Kao
7.9 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

Samtools allows you to output entries with certain flag with the -f and -F options. SamTools manual: http://samtools.sourceforge.net/samtools.shtml

So for all pair-end reads where one end is mapped and the other end isn't mapped, this should output all the mapped end entries:

samtools view -u -f 8 -F 260 map.bam  > oneEndMapped.bam

and this will output all the unmapped end entries:

samtools view -u  -f 4 -F 264 map.bam  > oneEndUnmapped.bam

edit*

Use this tool to figure out what the flags mean: http://picard.sourceforge.net/explain-flags.html

In the first command to get one end mapped, the -f option will make Samtools take any entries that has 8, meaning take any entries that has mate unmapped. The -F option will make Samtools skip any alignments that has 260, meaning skip any where the read itself is unmapped and where it is not a primary alignment.

In the second command, -f of 4 will take any read that is unmapped. -F of 264 will skip any entry where the mate is unmapped (making both unmapped) and where it is not a primary alignment.

ADD COMMENTlink modified 7.9 years ago • written 7.9 years ago by Damian Kao15k
1

samtools view -u -f 12 -F 256 map.bam > bothEndsUnmapped.bam

ADD REPLYlink written 7.9 years ago by Rm7.9k

Why do we need -F 264 when we have specified -f 4? -f 4 will only filter-in unmapped reads so there won't be any primary alignments left, right?

ADD REPLYlink written 5.8 years ago by komal.rathi3.4k

Hi Damian Kao, oneEndUnmapped.bam appears to be truncated after running above commands. Is there a way to extract reads from this truncated BAM file?

ADD REPLYlink written 5.2 years ago by Prakki Rama2.3k

Hello, I wanted to ask if sb can confirm me:

(sorry if this not the appropriate place ,but it's the  discussion exactly to what I need to know! )

In case we want to keep the paired reads that are  both mapped, is it enough to run:

               view -b -F 12 reads.bam > pairedMapped.bam  ??

Thanks in advance

ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by mariakondili1310
2
gravatar for Rlong
7.9 years ago by
Rlong340
US
Rlong340 wrote:

Pindel now supports reading bams directly. Pindel looks for pairs where one mate has mapped and the other remains unmapped. Since pindel is reading these in directly from the bam, you should not need to worry about dumping them into some sort of intermediary file.

ADD COMMENTlink written 7.9 years ago by Rlong340

oh, i see...thx

ADD REPLYlink written 7.9 years ago by Bioscientist1.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1959 users visited in the last hour