Question: Finding bases at specific sites using pysam pileup
0
gravatar for kris_A
4.0 years ago by
kris_A40
New Zealand
kris_A40 wrote:

Hi, 

I am trying to write a python script that finds bases at specific sites in each read (aligning to the site). I am using pysam pileup, but cannot figure out how to use it properly. Could someone help me out with this ?

the input data is a list of coordinates and a samfile = pysam.AlignmentFile("bam", "rb")

I just want to iterate through each read in a bam aligning to a single base site, and extract  bases at each of the sites.  

I tried doing this, but it's outputting each base multiple times. 

for pileupcolumn in samfile.pileup('1', 14693, 14694):
    for pileupread in pileupcolumn.pileups:
        print pileupread.alignment.query_sequence[pileupread.query_position]

 

thank you!

Kristina

sequence • 6.1k views
ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by kris_A40
4
gravatar for kris_A
4.0 years ago by
kris_A40
New Zealand
kris_A40 wrote:

I found a solution. Ended up reading-in bam file using pysam.Samfile, and then just extracted sequence fragment at pileupcolumn.pos. 

import pysam
samfile = pysam.Samfile(bam, "rb")
for pileupcolumn in samfile.pileup( '1', 14693, 14694):
    for pileupread in pileupcolumn.pileups:
        if pileupcolumn.pos == 14693:
            base = pileupread.alignment.query_sequence[pileupread.query_position]
            print base
samfile.close()
ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by kris_A40
1
gravatar for geek_y
4.0 years ago by
geek_y9.4k
Barcelona/CRG/London/Imperial
geek_y9.4k wrote:

I am assuming that you would like to see the base present at 14694 position from all the reads.

 

Something like this would do:

bamfile=pysam.Samfile("input_bam.bam")

for read in bamfile.pileup('1', 14693, 14694):
    print read.seq[14694-read.pos]

  Not tested. You need to take care of  forward, reverse reads POS and 0/1 based coordinates and Mainly insertions/deletions.

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by geek_y9.4k
0
gravatar for kris_A
4.0 years ago by
kris_A40
New Zealand
kris_A40 wrote:

Hi, thanks, I tried, and I'm getting an error (the same applies when I  use "query_sequence" instead of "seq"

 AttributeError: 'pysam.calignmentfile.PileupColumn' object has no attribute 'seq'

As much as I understand, I have to iterate over the reads (pileups) in the pileupcolumn, but then I get the extracted bases repeated many times .. any other ideas ?thank you .

 

 

ADD COMMENTlink written 4.0 years ago by kris_A40

can you post entire script  ?

ADD REPLYlink written 4.0 years ago by geek_y9.4k

my initial attempt is posted in my first message, and now I just directly plugged in what you posted earlier. when I got an "attribute is not present" I changed it to 

for read in bamfile.pileup('1', 14693, 14694):
    print read.query_sequence[14694-read.query_position]

but that was a useless attempt

ADD REPLYlink written 4.0 years ago by kris_A40

The reason I asked for entire script is to see how you are reading the bam file. 

pysam.AlignmentFile() or pysam.Samfile() ? If you are using pysam.AlignmentFile(), then try something like this:

for read in bamfile.fetch('1', 14693, 14694):
    print read.seq[14694-read.pos]
ADD REPLYlink written 4.0 years ago by geek_y9.4k

So i am using pysam.AlignmentFile(), and tried what you suggest, but now I get: 

IndexError: string index out of range

ADD REPLYlink written 4.0 years ago by kris_A40

Hi kris_A, Did you solve the problem? As far as I know, if I want the retrieve the base at a specific position of the read by something like the following code. The read.pos is supposed to be the order of the specific read instead of the position of reference.

for read in bamfile.fetch('1', 14693, 14694):
    print read.seq[14694-read.pos]
ADD REPLYlink written 18 months ago by chengju3900
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1361 users visited in the last hour