Hi guys, I have 32 GB of RAM in my PC and I am trying to count all the amino acids (aac) in a compressed FASTA file of 67 GB. I have tried all the tools I know (Biopython, generators, etc.), but the results are still inefficient: my PC almost stops, and I don't have access to any kind of cloud environment. I am working on this task independently. I haven't tried any Linux tools because I want to do this in Python. My approach was to build one long string from the file and then count the aac.
    import gzip
    from collections import Counter

    from Bio import SeqIO


    def element_count(filename):
        """Return a Counter of the aac that compose the sequences in a
        gzip FASTA file, with the aac as keys and their counts as values."""
        with gzip.open(filename, "rt") as handle:
            aac_count = Counter()
            # Joins every sequence into one long string before counting
            seq = ''.join(str(rec.seq) for rec in SeqIO.parse(handle, "fasta"))
            aac_count.update(seq)
            return aac_count
Is there any way to improve this, or is there any free tool I can use to make the task easier? Either way, I am very thankful for your support and help. I always learn a lot here and on Stack Overflow.
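One possible direction, as a minimal sketch rather than a tested solution for the 67 GB file: update the counter one line at a time so the whole file never has to sit in memory as a single string. The function name `count_residues` is hypothetical, and the sketch swaps `SeqIO.parse` for plain line reading with the standard library. It assumes an ordinary FASTA layout in which header lines start with `>` and every other line is sequence.

```python
import gzip
from collections import Counter


def count_residues(filename):
    """Count residue letters in a gzip-compressed FASTA file.

    Hypothetical helper: reads the file line by line, so only one
    line is held in memory at any time.
    """
    counts = Counter()
    with gzip.open(filename, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # header line, not sequence data
            # strip() drops the trailing newline before counting characters
            counts.update(line.strip())
    return counts
```

Calling `count_residues("proteins.fasta.gz")` (the file name is only a placeholder) would return a `Counter` with one key per letter seen. Memory use should depend on the longest line and the number of distinct letters, not on the size of the file.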