What Does The "Proper Pair" Bitwise Flag Mean In A Sam File?
11.7 years ago
Panos ★ 1.8k

I want to have an estimate of how many reads in a SAM file are mapped to the reference I'm using. I used

samtools view -S -F 0x0004 input.sam > output.sam


to filter out all reads that are not mapped but then I also saw this "proper pair" flag (0x0002) and got confused. Should I use this one (i.e. keep the reads containing it) instead of filtering out those reads containing 0x0004?

What exactly is a "proper pair"? Does it mean that the read itself as well as its mate are both mapped?

11.7 years ago
brentp 24k

This depends on what you want to do. If you are using paired-end reads, the 0x2 flag means that both ends of the read were mapped and they were mapped within a reasonable distance given the expected distance (and probably standard deviation) that you gave the alignment software. (So, "Yes" to your final question).
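To see which bits a given FLAG value actually carries, a small decoder can help. This is only a minimal sketch: the bit values come from the SAM specification, but `decode_flag` and the label strings are hypothetical names chosen for illustration, not part of samtools.

```python
# Bit values as defined in the SAM format specification (FLAG field).
# The label strings are illustrative, not official names.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag):
    """Return labels for the bits set in a SAM FLAG (hypothetical helper)."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# 99 = 0x1 + 0x2 + 0x20 + 0x40: mapped read in a proper pair
print(decode_flag(99))
# 73 = 0x1 + 0x8 + 0x40: read is mapped, but its mate is not
print(decode_flag(73))
```

Note that a read with flag 73 passes the `-F 0x0004` filter (it is mapped) but does not have 0x2 set, which is the difference between the two filters in the question.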

Often people will do some analysis with only uniquely mapped, correctly paired reads, as those are the ones where they are certain of the mapping. Later analysis can add reads where a single end of the pair mapped and the other did not, which increases coverage at the expense of mapping certainty.
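To get a feel for how much the two choices differ on your own data, you could tally the records by category. The following is a rough sketch that reads SAM text lines directly; `tally_sam` and the example records are made up for illustration. It counts every alignment line, so secondary or supplementary alignments (if your aligner writes them) would be counted more than once per read.

```python
from collections import Counter

UNMAPPED = 0x4     # read itself is unmapped
PROPER_PAIR = 0x2  # aligner considers the pair "proper"


def tally_sam(lines):
    """Count SAM records by mapping category (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif flag & PROPER_PAIR:
            counts["proper_pair"] += 1
        else:
            counts["mapped_not_proper"] += 1
    return counts


# Made-up records, for illustration only.
example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "r1\t147\tchr1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
    "r2\t73\tchr1\t500\t60\t4M\t=\t500\t0\tACGT\tIIII",
    "r2\t133\tchr1\t500\t0\t*\t=\t500\t0\tACGT\tIIII",
]
print(tally_sam(example))
```

On a real file you would pass an open file handle instead of the list, e.g. `tally_sam(open("input.sam"))`.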

11.7 years ago
Mitch Bekritsky ★ 1.3k

That is not always the whole story: "proper pair" can also mean that the reads are correctly oriented with respect to one another, i.e. that one mate maps to the forward strand and the other maps to the reverse strand. If the mates don't map in a proper pair, that may mean that both reads map to the forward strand or both map to the reverse strand.

I know this holds in MAQ and Stampy, but it doesn't seem to hold for BWA. In BWA, after a quick peek at some of my data, the reads that I have that aren't in a proper pair have mates that map to different chromosomes, similar to what brentp suggested.

If you want to check for whatever aligner you're using, you can parse the flags of some reads that do not have the 0x0002 flag set. If 0x0010 and 0x0020 are both set or both unset, the two mates map to the same strand, and that would be why they aren't in a "proper pair".
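The strand check described above could be sketched like this. The function name `mates_on_same_strand` is hypothetical, and the comparison is only meaningful when both the read and its mate are mapped (i.e. neither 0x4 nor 0x8 is set).

```python
PROPER_PAIR = 0x2
REVERSE = 0x10        # read maps to the reverse strand
MATE_REVERSE = 0x20   # mate maps to the reverse strand


def mates_on_same_strand(flag):
    """True if 0x10 and 0x20 are both set or both unset (hypothetical helper)."""
    return bool(flag & REVERSE) == bool(flag & MATE_REVERSE)


# Three flags without 0x2 set:
#   65  = 0x1 + 0x40                -> both mates forward
#   113 = 0x1 + 0x10 + 0x20 + 0x40  -> both mates reverse
#   97  = 0x1 + 0x20 + 0x40         -> opposite strands
for flag in (65, 113, 97):
    print(flag, bool(flag & PROPER_PAIR), mates_on_same_strand(flag))
```

A flag like 97 has the expected opposite-strand orientation but is still not a proper pair, so for that read the cause would have to be something else, such as the insert size or the mates mapping to different chromosomes.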

Bottom line: if something is mapping in an improper pair, I'd be suspicious about it having mapped correctly and probably exclude it.

There are some slides here from a MAQ presentation that describe proper pairs, at least for that aligner.