Recently I have been working with a set of genes for which I need to design appropriate primers.
However, it is still hard for me to obtain homology information for the candidate primers.
For example, I need to design a primer pair for one exon of a gene.
I first enumerate all possible primers of a predefined length, e.g. 18-30 bp, and then use
blastall -p blastn (or
-e 1 -W 8 to determine whether the primers have homologous sequences. However, for more than 10,000 primers the blast output file was larger than 200 MB, which takes a long time to parse with the Bio::SearchIO module and sometimes even exhausts the memory. Moreover, blasting primer sequences of only 18-30 bp is risky, because such short sequences sometimes fail for unknown reasons.
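To make the first approach concrete, here is a minimal Python sketch of the two steps: enumerating every candidate primer window, and counting hits per primer while reading the blast output one line at a time so the whole file never sits in memory. It assumes blastall was run with -m 8 (tab-delimited output, query ID in the first column), which is not what I used above; the function names are my own placeholders.

```python
from collections import Counter


def enumerate_primers(exon, min_len=18, max_len=30):
    """Yield (start, length, sequence) for every window of min_len..max_len bases.

    start is a 0-based offset into the exon string.
    """
    for length in range(min_len, max_len + 1):
        for start in range(len(exon) - length + 1):
            yield start, length, exon[start:start + length]


def count_hits(tabular_lines):
    """Count hits per query ID from tab-delimited blast output.

    Assumption: each non-comment line is one hit and its first
    tab-separated field is the query (primer) ID, as in blastall -m 8.
    Lines are consumed one at a time, so a file handle can be passed in.
    """
    hits = Counter()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and -m 9 style comment lines
        query_id = line.split("\t", 1)[0]
        hits[query_id] += 1
    return hits
```

Passing an open file handle to count_hits keeps memory use proportional to the number of primers rather than to the size of the output file. Whether a given hit count means "homologous elsewhere" still depends on the database, for example on whether the gene itself is in it.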
Another method is to blast the whole exon region with the parameters
-e 0.1 -W 11. However, this also generates huge output, and it takes a long time to parse the blast file and to determine whether a primer region falls into a homologous part.
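The check in this second approach is an interval overlap test between the primer and the aligned query ranges. A rough Python sketch follows, again assuming -m 8 tabular output (subject ID in column 2, query start and end in columns 7 and 8, 1-based); skip_subject is a hypothetical argument for ignoring the exon's own hit against its source sequence.

```python
def query_intervals(tabular_lines, skip_subject=None):
    """Yield (q_start, q_end) for each hit in tab-delimited blast output.

    Assumption: 12-column blastall -m 8 layout, with 1-based query
    coordinates in columns 7 and 8.
    """
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if skip_subject is not None and fields[1] == skip_subject:
            continue  # ignore the hit to the sequence the exon came from
        q_start, q_end = sorted((int(fields[6]), int(fields[7])))
        yield q_start, q_end


def primer_overlaps_hsp(primer_start, primer_end, hsps):
    """Return True if the closed interval [primer_start, primer_end]
    overlaps any (q_start, q_end) interval; all coordinates are 1-based."""
    return any(primer_start <= q_end and q_start <= primer_end
               for q_start, q_end in hsps)
```

With the intervals collected once per exon, every candidate primer can be tested without re-reading the blast file.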
So far, I have not found a good way to solve this problem.
If anyone has experienced this issue, could you please tell me how you handled it?
Although we could first identify the nucleotides that belong to repeat regions using RepeatMasker,
and then use the -F parameter in blast to ignore these regions, the repeat regions sometimes do not share much homologous sequence.
This is the only method I have found so far, but it is not perfect.
I hope someone can offer suggestions to improve the results.
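A related way to use the repeat annotation is to drop candidate primers that touch a masked base before blasting at all. The Python sketch below assumes the exon was soft-masked, i.e. repeats are in lowercase (for example RepeatMasker run with -xsmall, which I have not described above); the function name is a placeholder.

```python
def unmasked_primers(masked_exon, min_len=18, max_len=30):
    """Yield (start, sequence) for windows that contain no lowercase base.

    Assumption: lowercase letters mark repeat regions (soft masking);
    start is a 0-based offset into masked_exon.
    """
    for length in range(min_len, max_len + 1):
        for start in range(len(masked_exon) - length + 1):
            window = masked_exon[start:start + length]
            if not any(base.islower() for base in window):
                yield start, window
```

This only reduces the number of primers sent to blast; it does not remove the need for a homology check on the remaining ones.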