Identifying mate pair reads from SAM file without using QNAME
8.2 years ago
StarCute ▴ 110

I am writing a C++ program that needs to process mate pair reads. I would like to identify mates using only the FLAG and PNEXT fields, without relying on the QNAME field. I'd prefer not using the QNAME because the format seems to change depending on how the data was processed. Is it sufficient to simply check the FLAG (first in pair, second in pair) and PNEXT fields? Basically, I want a primary key that doesn't involve QNAME. I realize similar questions have been asked, but their answers did not satisfy me.

alignment • 1.9k views

I'd prefer not using the QNAME because the format seems to change depending on how the data was processed

Please give us an example of such a case.

QNAME+FLAG is used by most (all?) tools (Picard, samtools, ...).


I'm worried about one SAM file using the 0/1, 0/2 suffix format for the two mates and another using an identical QNAME for both mates.

QNAME_0/1
QNAME_0/2

OR

QNAME
QNAME
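One way to make the two formats above comparable is to normalize the name before using it. The following is only a minimal sketch, assuming the mate suffix is always a literal "/1" or "/2" at the very end of the QNAME; `stripMateSuffix` is a hypothetical helper name, not part of any library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: drop a trailing "/1" or "/2" mate suffix from a QNAME,
// so that "QNAME_0/1" and "QNAME_0/2" both normalize to "QNAME_0".
// Names without such a suffix are returned unchanged.
std::string stripMateSuffix(const std::string& qname) {
    const std::size_t n = qname.size();
    if (n >= 2 && qname[n - 2] == '/' &&
        (qname[n - 1] == '1' || qname[n - 1] == '2')) {
        return qname.substr(0, n - 2);
    }
    return qname;
}
```

With this, both files would yield the same normalized name for the two mates of a pair, and the FLAG bits would still be needed to tell first from second.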
8.2 years ago

You'll need to combine the flags with PNEXT/RNEXT and additionally try to use the read names to break ties (you don't need to look for exact matches, just the longest exact substring match). On the plus side, only a few random versions of the most common aligners don't strip things like "/1" off the read names.
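As a rough illustration of that approach, the sketch below checks whether two records point at each other through FLAG, RNAME/POS and RNEXT/PNEXT, and scores name similarity for tie-breaking. The `SamRecord` struct and all function names are hypothetical. The shared-prefix score is a simplification of the longest-substring idea, assuming that any mate marker sits at the end of the read name. Secondary and supplementary alignments are skipped here, which may or may not suit your data.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Minimal view of one SAM alignment line (hypothetical struct, subset of fields).
struct SamRecord {
    std::string qname;
    std::uint16_t flag;
    std::string rname;
    std::int64_t pos;
    std::string rnext;
    std::int64_t pnext;
};

// FLAG bits as defined in the SAM specification.
constexpr std::uint16_t kPaired = 0x1;
constexpr std::uint16_t kFirstInPair = 0x40;
constexpr std::uint16_t kSecondInPair = 0x80;
constexpr std::uint16_t kSecondary = 0x100;
constexpr std::uint16_t kSupplementary = 0x800;

// RNEXT is "=" when the mate maps to the same reference as the read itself.
const std::string& mateReference(const SamRecord& r) {
    return r.rnext == "=" ? r.rname : r.rnext;
}

// Paired read that is neither a secondary nor a supplementary alignment.
bool isPrimaryPaired(const SamRecord& r) {
    return (r.flag & kPaired) && !(r.flag & (kSecondary | kSupplementary));
}

// True when one record is first-in-pair, the other second-in-pair, and each
// record's RNEXT/PNEXT matches the other's RNAME/POS.
bool couldBeMates(const SamRecord& a, const SamRecord& b) {
    if (!isPrimaryPaired(a) || !isPrimaryPaired(b)) return false;
    const bool firstSecond = (a.flag & kFirstInPair) && (b.flag & kSecondInPair);
    const bool secondFirst = (a.flag & kSecondInPair) && (b.flag & kFirstInPair);
    if (!firstSecond && !secondFirst) return false;
    return mateReference(a) == b.rname && a.pnext == b.pos &&
           mateReference(b) == a.rname && b.pnext == a.pos;
}

// Tie-breaker score: number of leading characters the two QNAMEs share.
std::size_t commonPrefixLength(const std::string& x, const std::string& y) {
    const std::size_t n = std::min(x.size(), y.size());
    std::size_t i = 0;
    while (i < n && x[i] == y[i]) ++i;
    return i;
}
```

When several records pass `couldBeMates` for the same read (for example, multiple pairs starting at identical positions), the candidate with the highest `commonPrefixLength` against the read's QNAME would be chosen as its mate.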
