How To Represent Multi-Mapping Paired-End Reads In SAM Format?
12.6 years ago
Bio_X2Y ★ 4.4k

Imagine we have a read with two ends - A and B. An aligner finds that A maps uniquely to some location, and B can be mapped to two locations that satisfy the expected insert size.

Conceptually, should we:

  1. Treat the ends as basically independent, and store 3 alignment records in SAM (A>P1, B>P2, B>P3), where the PNEXT fields of both B alignments point to the A location, and the PNEXT field of the A alignment arbitrarily points to one of the two B locations? OR

  2. Treat the results as 2 consistent pairs, and write 4 alignment records (A>P1, B>P2 and A>P1, B>P3). In this scenario, the PNEXT of the first A alignment points to the first B location, and the PNEXT of the second A alignment points to the second B location.
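To make the two options concrete, here is a minimal sketch in Python. It is not taken from any aligner or from the SAM specification: the `Rec` tuple, the function names, the read name, and the coordinates are all hypothetical, and each record is reduced to the fields that matter for pairing (QNAME, which end, RNAME, POS, RNEXT, PNEXT).

```python
from collections import namedtuple

# Reduced SAM record (hypothetical helper): only the pairing-related fields.
Rec = namedtuple("Rec", "qname end rname pos rnext pnext")

def independent_ends(qname, chrom, a_pos, b_positions):
    """Option 1: one record for A, plus one record per B location."""
    # A's PNEXT arbitrarily points at the first B location.
    records = [Rec(qname, "A", chrom, a_pos, "=", b_positions[0])]
    for b_pos in b_positions:
        # Every B record points back at the single A location.
        records.append(Rec(qname, "B", chrom, b_pos, "=", a_pos))
    return records

def consistent_pairs(qname, chrom, a_pos, b_positions):
    """Option 2: one A record and one B record for each consistent pair."""
    records = []
    for b_pos in b_positions:
        records.append(Rec(qname, "A", chrom, a_pos, "=", b_pos))
        records.append(Rec(qname, "B", chrom, b_pos, "=", a_pos))
    return records

# Assumed example coordinates: A at 100, B at 300 or 340.
opt1 = independent_ends("read1", "chr1", 100, [300, 340])
opt2 = consistent_pairs("read1", "chr1", 100, [300, 340])

for label, recs in (("option 1", opt1), ("option 2", opt2)):
    print(label)
    for r in recs:
        print("  ", *r, sep="\t")
```

Option 1 yields 3 records and loses the second A-to-B link; option 2 yields 4 records and repeats the A alignment once per pair.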

The very existence of the PNEXT field implies to me that the authors intended to maintain pairing information, so the second interpretation might be the correct one. However, if this is the case, some ambiguity seems unavoidable. For example, imagine that A has two alignments that both start at the same position (e.g. one of them spliced), and B also has two alignments that start at the same position. Since PNEXT records only the mate's start position, it seems the original pairings produced by an aligner cannot be represented unambiguously in SAM.
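The ambiguity can be checked with a small sketch. The positions and CIGAR strings below are made up for illustration, and the sketch looks only at POS, CIGAR, and PNEXT; it deliberately ignores record order and any optional tags an aligner might add, which is an assumption on my part.

```python
def pair_records(pairing):
    """Emit (end, pos, cigar, pnext) rows for a list of (A, B) alignment pairs."""
    rows = []
    for (a_pos, a_cigar), (b_pos, b_cigar) in pairing:
        rows.append(("A", a_pos, a_cigar, b_pos))
        rows.append(("B", b_pos, b_cigar, a_pos))
    # Sort so the comparison does not depend on output order.
    return sorted(rows)

# Hypothetical alignments: same start position, different CIGAR (one spliced).
a1, a2 = (100, "50M"), (100, "20M1000N30M")
b1, b2 = (400, "50M"), (400, "25M800N25M")

pairing_x = [(a1, b1), (a2, b2)]  # aligner paired a1 with b1, a2 with b2
pairing_y = [(a1, b2), (a2, b1)]  # aligner paired a1 with b2, a2 with b1

same = pair_records(pairing_x) == pair_records(pairing_y)
print("indistinguishable:", same)
```

Both pairings produce the same four rows, because every A record carries PNEXT=400 and every B record carries PNEXT=100 regardless of which mate it was paired with.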

Thanks.

sam rna paired multiple • 3.7k views

Close as a duplicate of the post on the samtools mailing list? http://sourceforge.net/mailarchive/message.php?msg_id=28077133


I posted here to BioStar first, and cross-referenced this post on the mailing list. I would be tempted to leave it open here, and I'll update it with any relevant replies on the mailing list.
