Question: memory leak using pysam fetch
4.7 years ago, always_learning (Doha, Qatar) wrote:

Hi all, 

I am using the pysam module in one of my scripts, which works on fairly large VCF files, but the job fails to complete every time and reports a memory problem. I have also tried running it on a larger, faster machine. Has anyone faced a similar issue with pysam on large files?

 

This is my python script:

import sys
import os
import pysam

freq_dir_file = sys.argv[1]
vcf_dir_file = sys.argv[2]
snp_pos = []

# Print the VCF header lines (those starting with '#') found in the first 5000 lines
os.environ['vcf_file'] = vcf_dir_file
os.system("zcat $vcf_file | head -5000 | parallel --pipe grep '^#'")

data = open(freq_dir_file)
for line in data:
        fields = line.strip().split("\t")
        # Reset the flag on every line; otherwise it is undefined on the header
        # line and stays set for every line after the first hit
        check = 0
        if not line.startswith("CHROM") and not fields[0] == "NA":
                for i in fields[4:]:
                        # Each entry looks like ALLELE:FREQ; keep the frequency
                        val = i.strip().split(":")[1]
                        num = float(val)
                        # Use this condition instead for low-frequency variants:
                        # if num > 0.005 and num < 0.050:
                        # Comment out the next condition for the coding region
                        if num < 0.005 and not num == 0:
                                check = 1
        if check == 1:
                chmpos = fields[0] + " " + fields[1]
                snp_pos.append(chmpos)
data.close()

tabixfile = pysam.Tabixfile(vcf_dir_file)
for i in snp_pos:
        chrom, snp = i.split(" ")
        # tabix fetch takes 0-based, half-open coordinates
        val = int(snp) - 1
        for vcf in tabixfile.fetch(str(chrom), val, int(snp)):
                print(vcf)
Tags: pysam, python, vcf

Are you sure this is a memory leak in pysam? Python itself isn't exactly the best with memory management, so if freq_dir_file is large then I could see snp_pos blowing up the available memory. Having said that, I've never looked at the underlying tabix C code, so perhaps there's an issue there.

written 4.7 years ago by Devon Ryan
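
One way to rule out the snp_pos list as the source of memory growth is to avoid building it at all and to query the index as each qualifying row is read. The following is only a minimal sketch, not the poster's script: `iter_rare_positions` and `fetch_rare_records` are hypothetical helper names, and it assumes the frequency file has the tab-separated layout the script implies (CHROM, POS, two further columns, then ALLELE:FREQ entries). It uses `pysam.TabixFile`, which as far as I know is the current spelling of the `Tabixfile` class used in the question.

```python
def iter_rare_positions(lines, max_freq=0.005):
    """Yield (chrom, pos) for rows where any allele frequency lies in (0, max_freq).

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumed layout (hypothetical): CHROM, POS, two more columns, then ALLELE:FREQ entries.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header, rows marked NA, and rows without frequency columns
        if line.startswith("CHROM") or fields[0] == "NA" or len(fields) < 5:
            continue
        for entry in fields[4:]:
            freq = float(entry.split(":")[1])
            if 0 < freq < max_freq:
                yield fields[0], int(fields[1])
                break  # one qualifying allele is enough for this row


def fetch_rare_records(freq_path, vcf_path, max_freq=0.005):
    """Yield VCF lines at the rare positions, one position at a time."""
    import pysam  # imported here so the parser above works without pysam installed

    tabixfile = pysam.TabixFile(vcf_path)
    try:
        with open(freq_path) as handle:
            for chrom, pos in iter_rare_positions(handle, max_freq):
                # tabix fetch takes 0-based, half-open coordinates
                for record in tabixfile.fetch(chrom, pos - 1, pos):
                    yield record
    finally:
        tabixfile.close()
```

Because both functions are generators, only the current row and the current record are held in Python at any moment. If memory still grows with this version, the list is unlikely to be the cause.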

Since I am working with 32 GB of RAM, it is highly unlikely that snp_pos alone would exhaust the whole system memory.

written 4.7 years ago by always_learning

Anyone?

written 4.7 years ago by always_learning

Do you know at which line in the code the memory leak occurs? And how many items do you expect in snp_pos? Also, can you give us the log of the error?

modified 4.7 years ago • written 4.7 years ago by Coryza
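
To narrow down where the memory goes, one option is to log the process's peak memory at a few checkpoints, for example after snp_pos is built and then every few thousand fetch calls. The helper names `peak_rss_mb` and `report` below are hypothetical, and the sketch assumes a Unix-like system, since it relies on the standard-library `resource` module.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


def report(label, stream=sys.stderr):
    """Write a one-line memory checkpoint to the given stream."""
    stream.write("%s: peak RSS %.1f MB\n" % (label, peak_rss_mb()))
```

Calling `report("after building snp_pos (%d items)" % len(snp_pos))` before the fetch loop, and again inside it at intervals, would show whether memory is already high after reading the frequency file or only climbs during the tabix queries.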

Try filing an issue.

written 4.7 years ago by Devon Ryan