Question: Pysam get query sequence range without loop
0
gravatar for yryan
11 months ago by
yryan0
yryan0 wrote:

Hi folks,

I'm trying to optimise my code here. Esentially I'm trying to use pysam to get a list of bases of reads around a base of interest as shown in the code below. However my code is quite inefficient as the only way I've managed to get it working is by looping through each column in the pileup instead of just going directly to my location of interest. Could anyone help me optimise this?

def location_extractor(input_file, location, save_name):
samfile = pysam.AlignmentFile(input_file, "rb", threads=8)
read_names = []
base_0 = []
full_sequence = []
for pileupcolumn in samfile.pileup("ref", location, (location+1)):
    # if lower error rate between control and exposed v different, higher = loss
    pileupcolumn.set_min_base_quality(4)
    for pileupread in pileupcolumn.pileups:  # very inefficient, getting to location first would speed up vs looping
        if pileupcolumn.pos != location:
            continue
        else:
            if not pileupread.is_del and not pileupread.is_refskip:  # skips deletions and skips?
                read_names.append(str(pileupread.alignment.query_name))
                base_0.append(str(pileupread.alignment.query_sequence[pileupread.query_position-10:pileupread.query_position+11])) # where actual reads are extracted

location_extractor("exposed_5h_sorted.sam", 4604, "exposed_5.fasta")

Cheers!

pysam python • 420 views
ADD COMMENTlink written 11 months ago by yryan0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2340 users visited in the last hour
_