Get a single CIGAR character for every position from reference start to reference end for a read?
7.5 years ago
winter_li ▴ 60

Hi, suppose ref_start_position is 100 and ref_end_position is 175. Is there a function that returns the CIGAR character for every reference position from 100 to 175? For example, the CIGAR character at position 110 is 'D' and the CIGAR character at position 172 is 'I'. I want this for every position, not just the reference positions that match the reference. The result can be a list of tuples, like [(single cigar character, ref position start), ..., (single cigar character, ref position end)].

Best Wishes!

SNP sequencing alignment • 2.0k views
7.5 years ago

Given your phrasing I assume you're using pysam, so just write a function that takes the read's cigartuples and its start position and expands them accordingly.
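A minimal sketch of such a function, assuming pysam's conventions: cigartuples is a list of (operation, length) pairs with operation codes 0=M, 1=I, 2=D, 3=N, 4=S, 5=H, 6=P, 7==, 8=X, and the start position is 0-based as in reference_start. The function name cigar_per_ref_position is made up for illustration. The sketch takes plain Python values, so it runs without pysam installed.

```python
# Operation codes as used in pysam's cigartuples: index -> CIGAR character.
CIGAR_CHARS = "MIDNSHP=X"
# Operations that advance along the reference.
CONSUMES_REF = set("MDN=X")


def cigar_per_ref_position(cigartuples, ref_start):
    """Return [(cigar_char, ref_pos), ...] for each reference position covered.

    Hypothetical helper: operations that do not consume the reference
    (I, S, H, P) have no reference position and are skipped.
    """
    result = []
    pos = ref_start
    for op, length in cigartuples:
        char = CIGAR_CHARS[op]
        if char in CONSUMES_REF:
            for _ in range(length):
                result.append((char, pos))
                pos += 1
    return result


# 2M1I1D2M starting at reference position 100
expanded = cigar_per_ref_position([(0, 2), (1, 1), (2, 1), (0, 2)], 100)
print(expanded)
# [('M', 100), ('M', 101), ('D', 102), ('M', 103), ('M', 104)]
```

With a pysam read this would be called as cigar_per_ref_position(read.cigartuples, read.reference_start). Note that the inserted base in the example does not appear in the output.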

7.5 years ago
d-cameron ★ 2.9k

Unfortunately, such a mapping loses information, as there can be read sequence between aligned bases. Consider the following alignment against a poly-A reference sequence:

1S1M2I1D2I1M
-A--A--A <- reference sequence
SMIIDIIM

For this alignment, the one-to-one mapping function you are requesting would return [(M, 100), (D, 101), (M, 102)]. This loses information about the soft-clipped base as well as the two insertions, since these read bases have no corresponding reference base and thus no corresponding position.

If, for some reason, you want to consider these unaligned read bases to align to a particular reference base, you will need a convention as to whether they are left-assigned or right-assigned.
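One way such a convention could look, as a rough sketch rather than a standard: here "left" is taken to mean that inserted and soft-clipped bases are attached to the reference position immediately before them, and "right" to the position immediately after. That reading, the function name assign_cigar, and the choice to report one string of CIGAR characters per position are all assumptions made for illustration. Operation codes follow pysam's cigartuples numbering.

```python
from collections import defaultdict

CIGAR_CHARS = "MIDNSHP=X"    # pysam operation code -> CIGAR character
CONSUMES_REF = set("MDN=X")  # operations that advance along the reference
QUERY_ONLY = set("IS")       # read bases that have no reference position


def assign_cigar(cigartuples, ref_start, side="left"):
    """Map each reference position to the CIGAR characters assigned to it.

    side="left" attaches query-only bases to the reference position before
    them, side="right" to the one after them. Clipped read ends can therefore
    land on ref_start - 1 or on the position just past the alignment end.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    assigned = defaultdict(list)
    pos = ref_start
    for op, length in cigartuples:
        char = CIGAR_CHARS[op]
        if char in CONSUMES_REF:
            for _ in range(length):
                assigned[pos].append(char)
                pos += 1
        elif char in QUERY_ONLY:
            anchor = pos - 1 if side == "left" else pos
            assigned[anchor].extend(char * length)
    return sorted((p, "".join(chars)) for p, chars in assigned.items())


# The alignment from above: 1S1M2I1D2I1M, starting at reference position 100
aln = [(4, 1), (0, 1), (1, 2), (2, 1), (1, 2), (0, 1)]
print(assign_cigar(aln, 100, side="left"))
# [(99, 'S'), (100, 'MII'), (101, 'DII'), (102, 'M')]
print(assign_cigar(aln, 100, side="right"))
# [(100, 'SM'), (101, 'IID'), (102, 'IIM')]
```

Under the left convention, the leading soft clip ends up on position 99, outside the aligned span, which is one reason the choice of convention has to be made explicitly.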
