iterate over reads in .SAM file with pysam - IndexError: list index out of range
1
0
Entering edit mode
6.1 years ago

Hi everyone,

I have a bunch of cDNA reads generated by minion sequencing and mapped onto the genome with minimap2. Now I'm trying to find the genomic positions of the exon junctions in order to class my reads by exon usage. I'm using pysam to get the coordinates of every base from the CIGAR string and then I compare one position with the previous one and, if the distance is greater than 90 (for example), I consider it as an intron and so I print the coordinates.

Here is the small script I have written so far:

import pysam

samfile = pysam.AlignmentFile("file.sam", "r")

for read in samfile:
    pos = pysam.AlignedSegment.get_reference_positions(read, full_length=False)
    for i in range(len(pos)):
        if (pos[i+1]-pos[i]) > 90:
            print(read.query_name+" "+ str(pos[i]) + " : " + str(pos[i+1]))

samfile.close()

However, it seems to work for the first read in the SAM file but not for the next ones and all I get is :

04f7c7b3-2959-4e73-b025-c659677e26b0 14585254 : 14622303
04f7c7b3-2959-4e73-b025-c659677e26b0 14622510 : 14622827
04f7c7b3-2959-4e73-b025-c659677e26b0 14622896 : 14623009
04f7c7b3-2959-4e73-b025-c659677e26b0 14623147 : 14623827
04f7c7b3-2959-4e73-b025-c659677e26b0 14623897 : 14624302
04f7c7b3-2959-4e73-b025-c659677e26b0 14624419 : 14624955
04f7c7b3-2959-4e73-b025-c659677e26b0 14625088 : 14625278
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-19-bf9e6dc8c1a4> in <module>()
      6     pos = pysam.AlignedSegment.get_reference_positions(read, full_length=False)
      7     for i in range(len(pos)):
----> 8         if (pos[i+1]-pos[i]) > 90:
      9             print(read.query_name+" "+ str(pos[i]) + " : " + str(pos[i+1]))
     10 

IndexError: list index out of range

To be fair, I don't understand why it raises this error and how I can fix it in order to get all coordinates for every junctions of each reads.

If anyone has an idea I would be happy to hear about it ! Thanks in advance to anyone reading me. Florian.

pysam SAM python • 4.5k views
ADD COMMENT
1
Entering edit mode
6.1 years ago
pos[i+1]

you cannot get the 0-based (i+1)-th position in the list if there are only 'i' items in that list.

try

for i in range(len(pos)-1):
ADD COMMENT
0
Entering edit mode

that's perfect ! Thanks a lot.

ADD REPLY

Login before adding your answer.

Traffic: 2743 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6