Question

How To Represent Multi-Mapping Paired-End Reads In SAM Format?

12.6 years ago

Bio_X2Y ★ 4.4k

Imagine we have a read with two ends - A and B. An aligner finds that A maps uniquely to some location, and B can be mapped to two locations that satisfy the expected insert size.

Conceptually, should we:

(a) Treat the ends as basically independent and store 3 alignment records in SAM (A>P1, B>P2, B>P3), where the PNEXT fields of both B alignments point to the A location, and the PNEXT field of the A alignment arbitrarily points to one of the two B locations? OR
(b) Treat the results as 2 consistent pairs and write 4 alignment records (A>P1, B>P2 and A>P1, B>P3)? In this scenario, the PNEXT of the first A alignment points to the first B location, and the PNEXT of the second A alignment points to the second B location.
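
To make the two options concrete, here is a minimal Python sketch of the records each one would produce. The read name `read1`, the reference `chr1`, the `50M` CIGAR, and the helper names `sam_record`, `option_a`, and `option_b` are all made up for illustration. Marking the extra alignments with the secondary flag (0x100) is my assumption about how an aligner might flag them, and strand and proper-pair flags are left out to keep it short.

```python
# SAM FLAG bits used in this sketch (values from the SAM specification).
PAIRED, FIRST, LAST, SECONDARY = 0x1, 0x40, 0x80, 0x100


def sam_record(qname, flag, pos, pnext):
    """Build one tab-separated SAM line with placeholder values.

    MAPQ 255, TLEN 0, and SEQ/QUAL '*' are the 'unavailable' values;
    'chr1' and '50M' are hypothetical.
    """
    fields = [qname, flag, "chr1", pos, 255, "50M", "=", pnext, 0, "*", "*"]
    return "\t".join(str(f) for f in fields)


def option_a(p1, p2, p3):
    """Three records: A once, B twice; A's PNEXT arbitrarily picks P2."""
    return [
        sam_record("read1", PAIRED | FIRST, p1, p2),
        sam_record("read1", PAIRED | LAST, p2, p1),
        sam_record("read1", PAIRED | LAST | SECONDARY, p3, p1),
    ]


def option_b(p1, p2, p3):
    """Four records: two consistent pairs, A repeated once per pair."""
    return [
        sam_record("read1", PAIRED | FIRST, p1, p2),
        sam_record("read1", PAIRED | LAST, p2, p1),
        sam_record("read1", PAIRED | FIRST | SECONDARY, p1, p3),
        sam_record("read1", PAIRED | LAST | SECONDARY, p3, p1),
    ]


if __name__ == "__main__":
    for name, records in (("(a)", option_a(1000, 1200, 1350)),
                          ("(b)", option_b(1000, 1200, 1350))):
        print(name)
        for line in records:
            print(line)
```

In option (b) every record's PNEXT names the start of its own partner, whereas in option (a) the A record can only name one of the two B locations.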

The very existence of the PNEXT field implies to me that the authors intended pairing information to be preserved, so interpretation (b) might be the correct one. However, if this is the case, some ambiguity seems unavoidable. For example, imagine that A has two alignments that both start at the same position (e.g. one of them spliced), and B also has two alignments that start at the same position. In this case, it seems the original pairings produced by an aligner cannot be represented unambiguously in SAM.
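
The ambiguous case can be sketched in a few lines of Python. This assumes mates are matched only by comparing POS against PNEXT on the same reference, which is how I read the core fields. The positions, the CIGAR strings, and the names `Rec` and `mate_candidates` are hypothetical.

```python
from typing import List, NamedTuple


class Rec(NamedTuple):
    end: str    # "A" or "B"
    pos: int    # 1-based leftmost position (POS)
    cigar: str  # CIGAR string; the spliced alignments contain an N operation
    pnext: int  # start position of the mate (PNEXT)


def mate_candidates(rec: Rec, records: List[Rec]) -> List[Rec]:
    """Records of the other end that rec's PNEXT could refer to.

    A candidate starts at rec.pnext and points back at rec.pos.
    """
    return [
        m for m in records
        if m.end != rec.end and m.pos == rec.pnext and m.pnext == rec.pos
    ]


# Two alignments of A starting at 100 and two of B starting at 300.
records = [
    Rec("A", 100, "50M", 300),
    Rec("A", 100, "20M1000N30M", 300),
    Rec("B", 300, "50M", 100),
    Rec("B", 300, "10M500N40M", 100),
]

candidate_counts = [len(mate_candidates(r, records)) for r in records]
print(candidate_counts)  # [2, 2, 2, 2]
```

Every record has two equally valid mates, so whichever pairing the aligner originally reported cannot be recovered from POS and PNEXT alone.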

Thanks.

sam rna paired multiple • 3.7k views

Close as a duplicate of the thread on the samtools mailing list? http://sourceforge.net/mailarchive/message.php?msg_id=28077133

12.6 years ago by Peter 6.0k

I posted here on BioStar first and cross-referenced this post on the mailing list. I would prefer to leave it open here, and I'll update it with any relevant replies from the mailing list.

12.6 years ago by Bio_X2Y ★ 4.4k