Question: How To Represent Multi-Mapping Paired-End Reads In Sam Format?
7.5 years ago by Bio_X2Y (3.6k) wrote:

Imagine we have a read with two ends - A and B. An aligner finds that A maps uniquely to some location, and B can be mapped to two locations that satisfy the expected insert size.

Conceptually, should we:

  1. Treat the ends as basically independent, and store 3 alignment records in SAM (A>P1, B>P2, B>P3), where the PNEXT fields of both B alignments point to the A location, and the PNEXT field of the A alignment arbitrarily points to one of the two B locations? OR

  2. Treat the results as 2 consistent pairs, and write 4 alignment records (A>P1, B>P2 and A>P1, B>P3). In this scenario, the PNEXT of the first A alignment points to the first B location, and the PNEXT of the second A alignment points to the second B location.
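To make the two options concrete, here is a minimal Python sketch. It is not real SAM output: `Rec` is a hypothetical three-field stand-in for an alignment record, the positions `P1`, `P2`, `P3` are made up, and flags and all other SAM fields are ignored. It only shows which pairings a reader could recover from POS and PNEXT under each option.

```python
from typing import NamedTuple


class Rec(NamedTuple):
    """Hypothetical, heavily simplified alignment record (not a full SAM line)."""
    end: str    # "A" or "B": which end of the pair this record is for
    pos: int    # POS of this alignment
    pnext: int  # PNEXT: POS of the mate alignment this record points to


# Made-up positions, for illustration only.
P1, P2, P3 = 1000, 1200, 1250


def option_1():
    """Ends treated independently: 3 records."""
    return [
        Rec("A", P1, P2),  # A's PNEXT arbitrarily points at one B location
        Rec("B", P2, P1),
        Rec("B", P3, P1),
    ]


def option_2():
    """Two consistent pairs: 4 records."""
    return [
        Rec("A", P1, P2), Rec("B", P2, P1),  # pair 1
        Rec("A", P1, P3), Rec("B", P3, P1),  # pair 2
    ]


def recover_pairs(records):
    """Return the (A pos, B pos) pairs whose PNEXT fields point at each other."""
    a_recs = [r for r in records if r.end == "A"]
    b_recs = [r for r in records if r.end == "B"]
    return sorted({
        (a.pos, b.pos)
        for a in a_recs
        for b in b_recs
        if a.pnext == b.pos and b.pnext == a.pos
    })


pairs_1 = recover_pairs(option_1())
pairs_2 = recover_pairs(option_2())
```

In this toy model, option 1 yields only one reciprocal pair (A>P1 with B>P2), because the B>P3 record points at A but nothing points back at it. Option 2 yields both pairs, at the cost of writing the A alignment twice.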

The very existence of the PNEXT field implies to me that the authors intended to maintain pairing information, so the second interpretation might be correct. However, if this is the case, it seems that some ambiguity might be unavoidable (e.g. imagine a scenario where A has two alignments, both starting at the same position [e.g. one spliced], and B also has two alignments starting at the same position; in this case, it seems the original pairings produced by an aligner cannot be represented unambiguously in SAM).
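The ambiguous scenario can be sketched the same way. The following Python snippet is an illustration under stated assumptions: `Rec` is a hypothetical record holding only the end, POS, CIGAR and PNEXT, the positions and CIGAR strings are invented, and flags and optional tags are deliberately left out. It writes out two different pairings of the same four alignments and checks whether the resulting records differ.

```python
from typing import NamedTuple


class Rec(NamedTuple):
    """Hypothetical simplified record: end, POS, CIGAR, PNEXT only."""
    end: str
    pos: int
    cigar: str
    pnext: int


def to_records(pairs):
    """Write each (A alignment, B alignment) pair as two records linked by PNEXT."""
    recs = []
    for (a_pos, a_cigar), (b_pos, b_cigar) in pairs:
        recs.append(Rec("A", a_pos, a_cigar, b_pos))
        recs.append(Rec("B", b_pos, b_cigar, a_pos))
    return sorted(recs)  # sort so that record order does not matter


# Invented alignments: both A alignments start at 1000, both B alignments at 1200.
A_unspliced = (1000, "50M")
A_spliced = (1000, "20M300N30M")
B_unspliced = (1200, "50M")
B_spliced = (1200, "10M300N40M")

# Two different pairings an aligner might have intended.
pairing_x = [(A_unspliced, B_unspliced), (A_spliced, B_spliced)]
pairing_y = [(A_unspliced, B_spliced), (A_spliced, B_unspliced)]

# PNEXT stores only a position, so both pairings produce identical records.
same = to_records(pairing_x) == to_records(pairing_y)
```

Here `same` is true: once the records are written, nothing in POS, CIGAR or PNEXT says which A alignment went with which B alignment. Whether other SAM fields could carry that information is outside what this sketch models.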


paired rna sam multiple • 2.5k views

Close as a duplicate of the post on the samtools mailing list?

written 7.5 years ago by Peter (5.8k)

I posted here to BioStar first, and cross-referenced this post on the mailing list. I would be tempted to leave it open here, and I'll update it with any relevant replies on the mailing list.

written 7.5 years ago by Bio_X2Y (3.6k)