always_learning, 11.0 years ago
Hi all,
I am using the pysam module in one of my scripts, which works on fairly large VCF files, but the job keeps failing to complete and reports a memory issue, even when I run it on a larger, faster machine. Has anyone faced a similar issue with pysam and large files before?
This is my Python script:
import sys
import os
import pysam
freq_dir_file=sys.argv[1]
vcf_dir_file = sys.argv[2]
snp_pos=[]
os.environ['vcf_file'] = vcf_dir_file
os.system("zcat $vcf_file | head -5000 | parallel --pipe grep '^#'")
data = open(freq_dir_file)
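for line in data:
        check = 0  # reset per line; otherwise one match flags every later line, and check is undefined on the header row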
        if not line.startswith("CHROM") and not line.strip().split("\t")[0] == "NA":
                col = line.strip().split("\t")[4:]
                for i in col:
                        val = i.strip().split(":")[1]
                        num = float(val)
                #Comment this if its for Low frequency variants
                #if num > 0.005 and num < 0.050:
                #Comment this if its for coding region
                        if num < 0.005 and not num == 0:
                                check = 1
                        else:
                                pass
        if check == 1:
                chmpos = line.strip().split("\t")[0] +" "+ line.strip().split("\t")[1]
                snp_pos.append(chmpos)
tabixfile = pysam.TabixFile(vcf_dir_file)
for i in snp_pos:
        chrom, snp = i.split(" ")
        pos = int(snp)
        # fetch() takes 0-based, half-open coordinates, so (pos - 1, pos) covers exactly one site
        for vcf in tabixfile.fetch(chrom, pos - 1, pos):
                print(vcf)
Are you sure this is a memory leak in pysam? Python itself isn't exactly the best with memory management, so if freq_dir_file is large then I could see snp_pos blowing up the available memory. Having said that, I've never looked at the underlying tabix C code, so perhaps there's an issue there.
Since I am working with 32 GB of RAM, it is highly unlikely that snp_pos alone is using up the whole system memory.
Anyone?
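One way to rule snp_pos out entirely is to not build the list at all and stream positions from the frequency file instead. The following is only a minimal sketch: the function name iter_rare_positions and the max_freq parameter are made up for illustration, and it assumes the layout the script above implies (tab-separated rows, a header starting with CHROM, and ALLELE:FREQ entries from the fifth column onward).

```python
def iter_rare_positions(lines, max_freq=0.005):
    """Yield (chrom, pos) for rows with at least one allele frequency in (0, max_freq).

    Hypothetical helper: the column layout is an assumption taken from the
    script in the question, not a documented file format.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header row, rows with a missing chromosome, and short rows.
        if line.startswith("CHROM") or fields[0] == "NA" or len(fields) < 5:
            continue
        # Columns from the fifth onward are assumed to look like "A:0.998".
        freqs = (float(entry.split(":")[1]) for entry in fields[4:])
        if any(0 < f < max_freq for f in freqs):
            yield fields[0], int(fields[1])


# Small made-up sample, used only to show the behaviour.
sample = [
    "CHROM\tPOS\tN_ALLELES\tN_CHR\t{ALLELE:FREQ}\n",
    "1\t100\t2\t200\tA:0.998\tG:0.002\n",
    "1\t200\t2\t200\tC:0.5\tT:0.5\n",
    "NA\t300\t2\t200\tC:0.999\tT:0.001\n",
    "2\t400\t2\t200\tC:1\tT:0\n",
]
rare = list(iter_rare_positions(sample))
print(rare)
```

In the real script the generator would be fed the open frequency file, and each (chrom, pos) pair would go straight into tabixfile.fetch(chrom, pos - 1, pos), so only one position is held in memory at a time. If the job still runs out of memory after that change, the list was probably not the cause.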
Do you know at which line in the code the memory leak occurs? And how many items do you expect in snp_pos? Oh, and can you give us the log of the error?
Try filing an issue.
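To find out which line allocates the most memory, the standard-library tracemalloc module (Python 3) can help. This is a rough sketch, not a diagnosis of the script above: top_allocations and build_positions are hypothetical names, and build_positions is only a stand-in for the loop that fills snp_pos. Note that tracemalloc only sees allocations made through Python's own allocator, so memory allocated directly by the C code underneath pysam would not show up here.

```python
import tracemalloc


def top_allocations(func, limit=3):
    """Run func() under tracemalloc; return its result and the largest allocation sites."""
    tracemalloc.start()
    try:
        result = func()
        # Take the snapshot while the result is still alive so it is counted.
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Statistics grouped by source line, largest first.
    return result, snapshot.statistics("lineno")[:limit]


def build_positions(n=50000):
    # Hypothetical stand-in for the loop that appends "chrom pos" strings to snp_pos.
    return ["1 %d" % i for i in range(n)]


positions, stats = top_allocations(build_positions)
for stat in stats:
    print(stat)
```

Each printed line names a file and line number together with the total size and block count allocated there. If the Python-side totals stay small while the process keeps growing, that points towards the C layer instead, which would be worth including in an issue report.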