Question: understanding pysam.AlignmentFile output
0
gravatar for bitpir
10 months ago by
bitpir190
bitpir190 wrote:

I am new to pysam and I'm trying to figure out the output of each column of pysam.AlignmentFile. My code is as follow:

for b in pysam.AlignmentFile(bam,'rb'):
    print (b)

The output I see is this:

HWI-ST1285:282:C8MPEACXX:6:2211:11377:16289 163 0   10281   12  31S29M40S   0   10369   29  CCCTCACCCTCACCCTCACCCTCACCCTCACCCCTACCCCCAACCCCAACCCCAACCCCACACCCCACCCCAAACCCACCACCCACACCTCCCCCCACCC    array('B', [34, 34, 31, 37, 37, 37, 35, 37, 30, 35, 39, 39, 35, 40, 38, 31, 37, 34, 38, 38, 40, 36, 38, 35, 37, 39, 40, 40, 8, 34, 38, 33, 35, 31, 31, 30, 8, 23, 33, 37, 39, 33, 33, 37, 14, 20, 31, 26, 31, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]) [('NM', 1), ('MD', '5A23'), ('AS', 24), ('XS', 35), ('RG', '6_AD016')]
HWI-ST1285:282:C8MPEACXX:6:2211:11377:16289 83  0   10369   25  58M1I12M1D29M   0   10281   100 CACCCTCACCCTAACCCTAACCCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA    array('B', [2, 2, 33, 35, 32, 27, 27, 18, 33, 34, 30, 32, 23, 7, 33, 35, 33, 33, 31, 28, 33, 32, 32, 27, 24, 18, 6, 25, 33, 34, 32, 26, 29, 21, 32, 33, 30, 30, 26, 13, 36, 39, 34, 35, 28, 22, 32, 40, 38, 34, 40, 37, 40, 38, 37, 28, 30, 21, 41, 39, 38, 35, 39, 37, 33, 38, 39, 39, 35, 37, 36, 34, 38, 39, 36, 34, 34, 39, 37, 38, 36, 36, 36, 40, 40, 38, 38, 32, 35, 39, 37, 37, 26, 33, 32, 37, 35, 27, 30, 31])    [('NM', 5), ('MD', '0A5A63^C28C0'), ('AS', 78), ('XS', 78), ('RG', '6_AD016')]

However, when I use samtools view to view bam, I got this:

 HWI-ST1285:282:C8MPEACXX:6:2211:11377:16289    163 chr1    10282   12  31S29M40S   =   10370   188 CCCTCACCCTCACCCTCACCCTCACCCTCACCCCTACCCCCAACCCCAACCCCAACCCCACACCCCACCCCAAACCCACCACCCACACCTCCCCCCACCC    CC@FFFDF?DHHDIG@FCGGIEGDFHII)CGBD@@?)8BFHBBF/5@;@###################################################    NM:i:1  MD:Z:5A23   AS:i:24 XS:i:35 RG:Z:6_AD016

HWI-ST1285:282:C8MPEACXX:6:2211:11377:16289 83  chr1    10370   25  58M1I12M1D29M   =   10282   -188    CACCCTCACCCTAACCCTAACCCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA    ##BDA<<3BC?A8(BDBB@=BAA<93':BCA;>6AB??;.EHCD=7AIGCIFIGF=?6JHGDHFBGHHDFECGHECCHFGEEEIIGGADHFF;BAFD<?@    NM:i:5  MD:Z:0A5A63^C28C0   AS:i:78 XS:i:78 RG:Z:6_AD016

I notice that the 9th column is different (29 and 100 from pysam and 188 and -188 from samtools view). Why is this so? Also, from the samtools view output, I can see that the difference between 10370 - 10282 is 88, but why is 100 being added to the 9th column (template length) to yield 188?

Thanks for your help!

sam pysam bam • 385 views
ADD COMMENTlink modified 10 months ago by onestop_data250 • written 10 months ago by bitpir190
0
gravatar for onestop_data
10 months ago by
onestop_data250
onestop_data250 wrote:

Hi I debugged it on my side, and I concluded that the 9th column on pysam will be the alignment length (try b.alen in your for loop), but if you read the SAM manual it says that the 9th column is the template length (which you can get from on the pysam object with b.template_length)

The bottom line, you are good to go it is just the order or which column is shown. Install some IDE such as Pycharm and it should be easy for you to debug things like this.

Let me know if it is clear.

ADD COMMENTlink written 10 months ago by onestop_data250
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1873 users visited in the last hour