Question: Quicker way to know if an SNV is involved in an alignment mismatch
MarVi wrote:

Dear all,

I would like some advice. I have a collection of alignments, and I want to know whether the mismatches between the reads and the reference sequence (genome) are due to an SNV. I have all the SNVs stored per chromosome in Python dictionaries. However, loading the dictionary (with cPickle) for the chromosome currently being examined takes a long time. Do you have any suggestions on how to make this faster in Python, so that I can quickly check whether a given chromosome position has an SNV?

Thanks in advance! Hope everyone is fine!

JC wrote:

There are multiple options:

  • Convert your SNVs to VCF, then sort, compress with bgzip, and index with Tabix; you can query it from Python with PyVCF or similar packages
  • Store your data in a real database (PostgreSQL, MySQL, MongoDB, ...) and use a database connector in Python
  • Reorganize your data: one table per chromosome is not optimal, because many chromosomes are large and the whole table has to be loaded before a single position can be checked
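As a minimal sketch of the database option, the example below uses SQLite from Python's standard library. This is a swapped-in choice (not one of the servers listed above), picked only because it needs no server. The table and index names (`snv`, `snv_chrom_pos`) and the helper functions are hypothetical. A composite index on (chrom, pos) lets each lookup hit the index instead of loading a whole chromosome into memory.

```python
import sqlite3


def build_snv_db(snvs, path=":memory:"):
    """Store (chrom, pos) SNV records in an SQLite table with a composite index.

    `snvs` is an iterable of (chrom, pos) tuples; the schema is a hypothetical example.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snv (chrom TEXT NOT NULL, pos INTEGER NOT NULL)"
    )
    # The composite index lets SQLite answer a (chrom, pos) lookup without a full scan.
    conn.execute("CREATE INDEX IF NOT EXISTS snv_chrom_pos ON snv (chrom, pos)")
    conn.executemany("INSERT INTO snv (chrom, pos) VALUES (?, ?)", snvs)
    conn.commit()
    return conn


def snv_positions(conn, chrom, positions):
    """Return the subset of `positions` on `chrom` that coincide with a known SNV."""
    hits = set()
    for pos in positions:
        row = conn.execute(
            "SELECT 1 FROM snv WHERE chrom = ? AND pos = ? LIMIT 1", (chrom, pos)
        ).fetchone()
        if row is not None:
            hits.add(pos)
    return hits


if __name__ == "__main__":
    conn = build_snv_db([("chr1", 1000), ("chr1", 2500), ("chr2", 1000)])
    # Mismatch positions from one read on chr1 (made-up values for illustration).
    print(sorted(snv_positions(conn, "chr1", [1000, 1001, 2500])))  # [1000, 2500]
```

Passing a file path instead of `":memory:"` keeps the database on disk between runs, so the table is built once and nothing has to be loaded up front when checking a read.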
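For the reorganize option, one possible sketch (an assumption, not a format the answer prescribes) replaces each pickled dictionary with a sorted array of positions per chromosome, written to disk as raw bytes. Reading raw integers back typically avoids the per-object cost of unpickling a large dictionary, and `bisect` on the sorted array answers both single-position and read-span queries. The helper names (`build_index`, `save_chrom`, `load_chrom`, `has_snv`, `snvs_in_range`) are hypothetical.

```python
import bisect
import os
import tempfile
from array import array


def build_index(snvs):
    """Group (chrom, pos) SNVs into one sorted, de-duplicated int64 array per chromosome."""
    by_chrom = {}
    for chrom, pos in snvs:
        by_chrom.setdefault(chrom, []).append(pos)
    # array('q') stores signed 64-bit integers contiguously, unlike a dict of Python ints.
    return {chrom: array("q", sorted(set(ps))) for chrom, ps in by_chrom.items()}


def save_chrom(positions, path):
    """Write one chromosome's positions as raw bytes (native byte order, not portable)."""
    with open(path, "wb") as fh:
        positions.tofile(fh)


def load_chrom(path):
    """Read raw bytes written by save_chrom back into an int64 array."""
    positions = array("q")
    with open(path, "rb") as fh:
        positions.frombytes(fh.read())
    return positions


def has_snv(sorted_positions, pos):
    """Binary search: True if `pos` is present in the sorted array."""
    i = bisect.bisect_left(sorted_positions, pos)
    return i < len(sorted_positions) and sorted_positions[i] == pos


def snvs_in_range(sorted_positions, start, end):
    """All SNV positions in the half-open interval [start, end), e.g. a read's span."""
    lo = bisect.bisect_left(sorted_positions, start)
    hi = bisect.bisect_left(sorted_positions, end)
    return list(sorted_positions[lo:hi])


if __name__ == "__main__":
    index = build_index([("chr1", 2500), ("chr1", 1000), ("chr2", 1000)])
    path = os.path.join(tempfile.mkdtemp(), "chr1.bin")
    save_chrom(index["chr1"], path)
    chr1 = load_chrom(path)
    print(has_snv(chr1, 1000), has_snv(chr1, 1001))  # True False
    print(snvs_in_range(chr1, 900, 3000))  # [1000, 2500]
```

Whichever storage is used, make sure the mismatch positions and the SNV positions share the same coordinate convention: VCF positions are 1-based, whereas many alignment libraries (pysam, for example) report 0-based reference positions.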
