Question: iterate over reads in .SAM file with pysam - IndexError: list index out of range
0
gravatar for Florian BERNARD
3.0 years ago by
France/Bordeaux
Florian BERNARD20 wrote:

Hi everyone,

I have a bunch of cDNA reads generated by minion sequencing and mapped onto the genome with minimap2. Now I'm trying to find the genomic positions of the exon junctions in order to class my reads by exon usage. I'm using pysam to get the coordinates of every base from the CIGAR string and then I compare one position with the previous one and, if the distance is greater than 90 (for example), I consider it as an intron and so I print the coordinates.

Here is the small script I have written so far:

import pysam

samfile = pysam.AlignmentFile("file.sam", "r")

for read in samfile:
    pos = pysam.AlignedSegment.get_reference_positions(read, full_length=False)
    for i in range(len(pos)):
        if (pos[i+1]-pos[i]) > 90:
            print(read.query_name+" "+ str(pos[i]) + " : " + str(pos[i+1]))

samfile.close()

However, it seems to work for the first read in the SAM file but not for the next ones and all I get is :

04f7c7b3-2959-4e73-b025-c659677e26b0 14585254 : 14622303
04f7c7b3-2959-4e73-b025-c659677e26b0 14622510 : 14622827
04f7c7b3-2959-4e73-b025-c659677e26b0 14622896 : 14623009
04f7c7b3-2959-4e73-b025-c659677e26b0 14623147 : 14623827
04f7c7b3-2959-4e73-b025-c659677e26b0 14623897 : 14624302
04f7c7b3-2959-4e73-b025-c659677e26b0 14624419 : 14624955
04f7c7b3-2959-4e73-b025-c659677e26b0 14625088 : 14625278
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-19-bf9e6dc8c1a4> in <module>()
      6     pos = pysam.AlignedSegment.get_reference_positions(read, full_length=False)
      7     for i in range(len(pos)):
----> 8         if (pos[i+1]-pos[i]) > 90:
      9             print(read.query_name+" "+ str(pos[i]) + " : " + str(pos[i+1]))
     10 

IndexError: list index out of range

To be fair, I don't understand why it raises this error and how I can fix it in order to get all coordinates for every junctions of each reads.

If anyone has an idea I would be happy to hear about it ! Thanks in advance to anyone reading me. Florian.

sam pysam python • 3.0k views
ADD COMMENTlink modified 3.0 years ago by Pierre Lindenbaum134k • written 3.0 years ago by Florian BERNARD20
1
gravatar for Pierre Lindenbaum
3.0 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum134k wrote:
pos[i+1]

you cannot get the 0-based (i+1)-th position in the list if there are only 'i' items in that list.

try

for i in range(len(pos)-1):
ADD COMMENTlink modified 3.0 years ago by GenoMax96k • written 3.0 years ago by Pierre Lindenbaum134k

that's perfect ! Thanks a lot.

ADD REPLYlink written 3.0 years ago by Florian BERNARD20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1555 users visited in the last hour
_