Question: What does the "proper pair" bitwise flag mean in a SAM file?
Asked 9.8 years ago by Panos (Geneva, Switzerland):

I want to estimate how many reads in a SAM file are mapped to the reference I'm using. I used

samtools view -S -F 0x0004 input.sam > output.sam

to filter out all unmapped reads, but then I also saw the "proper pair" flag (0x0002) and got confused. Should I use that flag instead (i.e. keep the reads that have it) rather than filtering out the reads that have 0x0004?
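To see how different the two filters are on your own data, it helps to count both side by side. With samtools that would be `samtools view -c -F 0x4` versus `samtools view -c -f 0x2` (`-c` prints a count instead of the records; older samtools versions also need `-S` for SAM input). The sketch below does the same thing in plain Python on SAM text; the function name and the toy records are made up for illustration. Keep in mind that both numbers count alignment records, not distinct reads, so secondary (0x100) and supplementary (0x800) records are included unless you exclude them.

```python
# Minimal sketch: count SAM records that are mapped (FLAG 0x4 unset, like
# `samtools view -F 0x4`) versus records flagged as a proper pair
# (FLAG 0x2 set, like `samtools view -f 0x2`).
UNMAPPED = 0x4
PROPER_PAIR = 0x2


def count_records(sam_lines):
    """Return (mapped, proper_pair) counts, skipping '@' header lines."""
    mapped = proper = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd tab-separated field
        if not flag & UNMAPPED:
            mapped += 1
        if flag & PROPER_PAIR:
            proper += 1
    return mapped, proper


# Toy records (made-up data): a proper pair (r1), plus a mapped read whose
# mate is unmapped (r2, FLAG 73) and that unmapped mate (FLAG 133).
example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "r2\t73\tchr1\t500\t60\t50M\t=\t500\t0\t*\t*",
    "r2\t133\tchr1\t500\t0\t*\t=\t500\t0\t*\t*",
]
print(count_records(example))  # (3, 2)
```

Here the mapped read from r2 is counted by the 0x4 filter but not by the 0x2 filter, which is exactly the gap between the two approaches.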

What exactly is a "proper pair"? Does it mean that the read itself as well as its mate are both mapped?
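The two flags answer different questions because a FLAG value is a set of independent bits. A small decoder makes this visible. The bit meanings below follow the SAM specification, which defines 0x2 only as "each segment properly aligned according to the aligner", so the exact criteria are left to the aligner. The short names are labels chosen for this sketch, not official identifiers (recent samtools versions can decode flags with `samtools flags`).

```python
# Minimal sketch: decode a SAM FLAG into labels for the bits that are set.
# Bit meanings follow the SAM specification; the short names are our own.
FLAG_BITS = [
    (0x1, "PAIRED"),           # template has multiple segments (paired-end)
    (0x2, "PROPER_PAIR"),      # each segment properly aligned, per the aligner
    (0x4, "UNMAPPED"),         # this read is unmapped
    (0x8, "MATE_UNMAPPED"),    # the mate (next segment) is unmapped
    (0x10, "REVERSE"),         # this read maps to the reverse strand
    (0x20, "MATE_REVERSE"),    # the mate maps to the reverse strand
    (0x40, "READ1"),           # first segment in the template
    (0x80, "READ2"),           # last segment in the template
    (0x100, "SECONDARY"),      # secondary alignment
    (0x200, "QC_FAIL"),        # not passing quality controls
    (0x400, "DUPLICATE"),      # PCR or optical duplicate
    (0x800, "SUPPLEMENTARY"),  # supplementary alignment
]


def decode_flag(flag):
    """Return the labels of all bits set in an integer SAM FLAG."""
    return [name for bit, name in FLAG_BITS if flag & bit]


print(decode_flag(99))   # ['PAIRED', 'PROPER_PAIR', 'MATE_REVERSE', 'READ1']
print(decode_flag(133))  # ['PAIRED', 'UNMAPPED', 'READ2']
```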

Answer by brentp (Salt Lake City, UT), 9.8 years ago:

This depends on what you want to do. If you are using paired-end reads, the 0x2 flag means that both reads of the pair (both mates) were mapped, and that they mapped within a reasonable distance of each other given the expected insert size (and probably the standard deviation) that you gave the alignment software. So, "yes" to your final question.
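As a rough illustration of that kind of rule, here is a hypothetical proper-pair check based on insert size. It is not BWA's or any other aligner's actual algorithm: `mean_insert`, `sd_insert` and the cutoff `k` are assumptions standing in for the values you would give the aligner, and the function name is made up.

```python
# Hypothetical illustration only: real aligners each use their own
# proper-pair criteria. Here a pair is "proper" when both mates are mapped,
# lie on the same reference, and the template length (TLEN) is within
# mean_insert +/- k * sd_insert.
def looks_like_proper_pair(flag, rname, rnext, tlen, mean_insert, sd_insert, k=3.0):
    """Check one SAM record using its FLAG, RNAME, RNEXT and TLEN fields."""
    if not flag & 0x1 or flag & (0x4 | 0x8):
        return False  # not paired, or this read / its mate is unmapped
    if rnext not in ("=", rname):
        return False  # mates on different reference sequences
    return abs(abs(tlen) - mean_insert) <= k * sd_insert


print(looks_like_proper_pair(99, "chr1", "=", 250, 300, 50))   # True
print(looks_like_proper_pair(99, "chr1", "=", 2000, 300, 50))  # False
print(looks_like_proper_pair(97, "chr1", "chr2", 0, 300, 50))  # False
```

With an expected insert of 300 ± 50 bp, a TLEN of 250 passes, while a TLEN of 2000 or a mate on another chromosome does not.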

Often people do an initial analysis with only uniquely mapped, correctly paired reads, as those are the ones whose mapping they are most certain of. Later analyses can add reads where one end of the pair mapped and the other did not, increasing coverage at the expense of mapping certainty.
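One way to express that two-stage approach is to assign each record to a tier. This is a sketch under assumptions: MAPQ is used as a rough proxy for "uniquely mapped" (aligners compute MAPQ differently), the threshold of 20 is an arbitrary example rather than a standard, and `tier` is a made-up name.

```python
# Hypothetical tiers. MAPQ >= MIN_MAPQ is a rough proxy for "uniquely
# mapped"; the value 20 is an arbitrary example, not a standard.
MIN_MAPQ = 20
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def tier(flag, mapq, min_mapq=MIN_MAPQ):
    """Return 1, 2, or None for a record, given its FLAG and MAPQ."""
    if flag & 0x4 or flag & SECONDARY_OR_SUPPLEMENTARY:
        return None  # unmapped, or not the primary alignment
    if flag & 0x2 and mapq >= min_mapq:
        return 1  # confidently placed and correctly paired
    if flag & 0x1 and flag & 0x8:
        return 2  # mapped, but its mate did not: more coverage, less certainty
    return None


print(tier(99, 60))  # 1
print(tier(73, 60))  # 2
print(tier(99, 5))   # None
```

A first pass would keep only tier 1; a later pass could add tier 2.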

Answer by Mitch Bekritsky (London, England), 9.8 years ago:

That is not always the whole story. "Proper pair" can also mean that the mates are correctly oriented relative to each other, i.e. one mate maps to the forward strand and the other to the reverse strand. If the mates are not in a proper pair, it may be because both reads map to the forward strand, or both to the reverse strand.
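The strand part of that definition can be read straight from the FLAG: 0x10 means this read is on the reverse strand and 0x20 means its mate is. A minimal sketch, assuming you only care about strands; it does not check which mate lies upstream, which a full forward/reverse (FR) orientation test would also need. The function name is made up.

```python
def opposite_strands(flag):
    """True if both mates are mapped and lie on opposite strands."""
    if flag & (0x4 | 0x8):
        return False  # orientation is undefined if either mate is unmapped
    # 0x10: this read is reverse-complemented; 0x20: its mate is.
    return bool(flag & 0x10) != bool(flag & 0x20)


for flag in (99, 147, 65, 113):
    print(flag, opposite_strands(flag))
# 99 True    (mate reverse)
# 147 True   (this read reverse)
# 65 False   (both forward)
# 113 False  (both reverse)
```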

I know this holds for MAQ and Stampy, but it doesn't seem to hold for BWA. After a quick look at some of my BWA output, the reads that aren't in a proper pair have mates that map to different chromosomes, similar to what brentp suggested.

If you want to check this for whichever aligner you're using, parse the flags of some reads that do not have the 0x0002 flag set. If their 0x0010 and 0x0020 bits are both set or both unset (i.e. both mates are on the same strand), that would explain why they aren't in a "proper pair".
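That check can be extended to the different-chromosome case seen with BWA: look at paired, primary records that lack 0x2 and tally the most likely reason. This is a sketch; the reason labels, the order of the checks and the function name are illustrative choices, and the "other" bucket will mostly hold pairs whose insert size the aligner considered unusual.

```python
from collections import Counter


def improper_pair_reasons(sam_lines):
    """Tally likely reasons why paired, primary records lack FLAG 0x2."""
    reasons = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, rnext = int(fields[1]), fields[2], fields[6]
        if not flag & 0x1 or flag & 0x2 or flag & 0x900:
            continue  # unpaired, already proper, or secondary/supplementary
        if flag & (0x4 | 0x8):
            reasons["read or mate unmapped"] += 1
        elif rnext not in ("=", rname):
            reasons["mates on different references"] += 1
        elif bool(flag & 0x10) == bool(flag & 0x20):
            reasons["mates on the same strand"] += 1
        else:
            reasons["other (e.g. unexpected insert size)"] += 1
    return reasons


# Made-up records: one proper pair (skipped) and four improper cases.
records = [
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r3\t97\tchr1\t100\t60\t50M\tchr2\t800\t0\t*\t*",
    "r4\t65\tchr1\t100\t60\t50M\t=\t400\t350\t*\t*",
    "r5\t73\tchr1\t500\t60\t50M\t=\t500\t0\t*\t*",
    "r6\t97\tchr1\t100\t60\t50M\t=\t90000\t89950\t*\t*",
]
for reason, n in improper_pair_reasons(records).items():
    print(n, reason)
```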

Bottom line: if a read maps as part of an improper pair, I'd be suspicious of whether it mapped correctly and would probably exclude it.

There are some slides here from a MAQ presentation that describe proper pairs, at least for that aligner.
