Question: Getting genomic locations from cigar string
Asked 4 days ago by KVC_bioinfo (WA, USA):

Hello all,

I want to get the genomic locations (start and end) of every region where a read aligned. I am trying to write a Python script for this, using the CIGAR string from the SAM file to get the lengths of the aligned segments and the starting position of the alignment.

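I have multiple lists of (i, j) tuples, where i is the CIGAR operation code (0 = M, 1 = I, 2 = D, 3 = N) and j is the length of that operation: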

[(0, 117), (3, 29773), (0, 253), (2, 1325), (0, 145)]

[(0, 116), (2, 1), (3, 3419), (0, 327), (3, 21529), (0, 286), (2, 1)]

[(0, 117), (3, 25275), (0, 180), (1, 1), (0, 1), (3, 5895), (0, 145)]
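To read these tuples as ordinary CIGAR strings, the operation code i can be used as an index into the letters `MIDNSHP=X` (codes 0 to 8 in the SAM specification and in pysam). Below is a small sketch with a hypothetical helper `cigar_to_string`; for a real read, pysam already provides the same text as `read.cigarstring`.

```python
# Letter at index i is the CIGAR operation for code i (SAM spec / pysam order)
CIGAR_OPS = "MIDNSHP=X"


def cigar_to_string(cigar):
    """Turn a list of (operation, length) tuples into a CIGAR string."""
    return "".join(f"{length}{CIGAR_OPS[op]}" for op, length in cigar)


print(cigar_to_string([(0, 117), (3, 29773), (0, 253), (2, 1325), (0, 145)]))
# prints 117M29773N253M1325D145M
```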

I also have a list of the alignment start positions:

[66905968, 66906104, 66905996]

Desired output:

For each number in my list, I want to add up the j values of the tuples where i is 0 or 2, with one condition: every time i is 3, it should stop adding and end the current interval, then add that j as well and use the result as the next starting point.
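A minimal sketch of that rule in plain Python, assuming each start position belongs to the CIGAR list in the same order (66905968 with the first list, and so on). The function name `blocks_from_cigar` is hypothetical. It keeps one running position per alignment: lengths of operations 0 and 2 move it forward, and an operation 3 closes the current block and jumps over the skipped region. Operation 1 (insertion) is ignored, which also matches SAM semantics because an insertion does not advance along the reference.

```python
# CIGAR operation codes as used in pysam's cigartuples
MATCH, INSERTION, DELETION, SKIP = 0, 1, 2, 3


def blocks_from_cigar(start, cigar):
    """Return (block_start, block_end) pairs for one alignment.

    M (0) and D (2) lengths extend the current block; an N (3) closes it
    and the next block starts after the skipped region. All other
    operations are ignored, following the rule in the question.
    """
    blocks = []
    block_start = pos = start
    for op, length in cigar:
        if op in (MATCH, DELETION):
            pos += length
        elif op == SKIP:
            if pos > block_start:  # do not emit empty blocks
                blocks.append((block_start, pos))
            pos += length
            block_start = pos
    if pos > block_start:
        blocks.append((block_start, pos))
    return blocks


cigars = [
    [(0, 117), (3, 29773), (0, 253), (2, 1325), (0, 145)],
    [(0, 116), (2, 1), (3, 3419), (0, 327), (3, 21529), (0, 286), (2, 1)],
    [(0, 117), (3, 25275), (0, 180), (1, 1), (0, 1), (3, 5895), (0, 145)],
]
starts = [66905968, 66906104, 66905996]

# zip pairs each start with its own CIGAR (assumes both lists share an order)
for start, cigar in zip(starts, cigars):
    print(start, blocks_from_cigar(start, cigar))
# first line: 66905968 [(66905968, 66906085), (66935858, 66937581)]
```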

For example for:

[(0, 117), (3, 29773), (0, 253), (2, 1325), (0, 145)]

and

66905968

I want:

66905968, 66905968+117

66905968+117+29773, 66905968+117+29773+253+1325+145
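The same example with the arithmetic carried out. If the starts come from pysam (0-based positions), an end computed as start + length is exclusive, so each pair is a half-open interval:

```latex
\begin{aligned}
\text{block}_1 &= (66905968,\; 66905968 + 117) = (66905968,\; 66906085) \\
\text{block}_2 &= (66905968 + 117 + 29773,\; 66905968 + 117 + 29773 + 253 + 1325 + 145) \\
               &= (66935858,\; 66937581)
\end{aligned}
```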

I have the following code so far:

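import pysam


pos = []
new = []
reffile = pysam.FastaFile("ref.fasta")  # reference FASTA (not used yet)
pure_bam = pysam.AlignmentFile("sample.bam", "rb")

for read in pure_bam:
    if read.is_unmapped:  # unmapped reads have no CIGAR to walk
        continue
    # 0-based leftmost reference position of this alignment
    pos.append(read.reference_start)
    for pp_sam in pos:
        # each CIGAR element is an (operation, length) tuple
        for i, j in read.cigartuples:
            if i == 0 or i == 2:  # 0 = M (match), 2 = D (deletion)
                new.append(j + pp_sam)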
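Applied to whole reads, the main changes relative to the loop above would be: use only the current read's own start instead of looping over every position collected in `pos`, and keep a running position that advances through the CIGAR instead of adding each length to the start separately (as `new.append(j + pp_sam)` does). The sketch below is a hypothetical `read_blocks` generator, assuming reads that expose pysam's `query_name`, `is_unmapped`, `reference_start` and `cigartuples` attributes. Unlike the rule in the question, it also counts operations 7 (=) and 8 (X) as reference-consuming, as the SAM specification does. A `SimpleNamespace` stands in for a real read so the sketch runs without a BAM file.

```python
from types import SimpleNamespace

# CIGAR codes that consume reference bases per the SAM spec:
# 0 = M, 2 = D, 3 = N, 7 = '=', 8 = X
CONSUMES_REFERENCE = {0, 2, 3, 7, 8}
SKIP = 3  # N: skipped reference region (e.g. an intron) that splits blocks


def read_blocks(reads):
    """Yield (query_name, [(start, end), ...]) for each mapped read.

    `reads` can be any iterable of objects with query_name, is_unmapped,
    reference_start and cigartuples attributes.
    """
    for read in reads:
        if read.is_unmapped or not read.cigartuples:
            continue
        blocks = []
        block_start = pos = read.reference_start  # 0-based, as in pysam
        for op, length in read.cigartuples:
            if op == SKIP:
                if pos > block_start:  # do not emit empty blocks
                    blocks.append((block_start, pos))
                pos += length
                block_start = pos
            elif op in CONSUMES_REFERENCE:
                pos += length
        if pos > block_start:
            blocks.append((block_start, pos))
        yield read.query_name, blocks


# Stand-in for a pysam read, so the sketch runs without a BAM file
demo = SimpleNamespace(
    query_name="read1",
    is_unmapped=False,
    reference_start=66905968,
    cigartuples=[(0, 117), (3, 29773), (0, 253), (2, 1325), (0, 145)],
)
for name, blocks in read_blocks([demo]):
    print(name, blocks)
```

With pysam, an open `pysam.AlignmentFile("sample.bam", "rb")` could be passed directly as `reads`, since iterating over it yields `AlignedSegment` objects with these attributes. pysam's own `AlignedSegment.get_blocks()` also returns aligned (start, end) blocks, but as far as I know it splits at deletions as well as at N, so its output would differ from the rule in the question wherever a D appears.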

This is definitely not giving the desired output. Could someone help me? Thank you very much in advance.

Tags: python, cigar

Unless this is a learning exercise, others have already done this:
Going From Cigar String In Sam To Genomic Coordinates?
Python Cigar String - Finding Indels Break Points Positions
and possibly others.

(comment by genomax, 4 days ago)

Could anyone tell me what I am doing wrong here?

(reply by KVC_bioinfo, 3 days ago)