Hi,
I'm a bit of a noob, but I've got a list of all polymorphisms (example below) between two closely related bacteria, exported from Mauve. Is there a quick way to identify which genes these polymorphisms occur in? I'm not interested in the polymorphisms that fall in intergenic regions, but it would be ideal to at least know which ones are intergenic.
SNP seq1 seq1pos seq1_genomepos seq2 seq2pos seq2_genomepos
ag Bacteria1 467 467 Bacteria2 467 467
ga Bacteria1 1057 1057 Bacteria2 1057 1057
ag Bacteria1 1977 1977 Bacteria2 1977 1977
ag Bacteria1 2347 2347 Bacteria2 2347 2347
For example, the first polymorphism 'ag' is an A > G substitution in gene_001. I've tried googling for scripts that do this, but can't find anything.
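To make it concrete, here is a rough sketch of the kind of lookup I have in mind. The gene coordinates in GENES are made up for illustration (in practice they would come from the annotation of Bacteria1), the function names locate and annotate are just placeholders, and it assumes the genes do not overlap and that the fourth column (seq1_genomepos) is the position to look up.

```python
import bisect

# Hypothetical gene coordinates (1-based, inclusive): (start, end, name).
# These are invented for illustration; real ones would come from the annotation.
GENES = [
    (100, 900, "gene_001"),
    (1000, 1500, "gene_002"),
    (2000, 2600, "gene_003"),
]


def locate(pos, genes):
    """Return the name of the gene containing pos, or 'intergenic'.

    Assumes genes is sorted by start and that genes do not overlap.
    """
    starts = [g[0] for g in genes]
    # Index of the last gene whose start is <= pos (or -1 if there is none).
    i = bisect.bisect_right(starts, pos) - 1
    if i >= 0 and genes[i][0] <= pos <= genes[i][1]:
        return genes[i][2]
    return "intergenic"


def annotate(snp_lines, genes):
    """Map each SNP line to (snp, position, gene name or 'intergenic')."""
    genes = sorted(genes)
    results = []
    for line in snp_lines:
        fields = line.split()
        # Skip blank or short lines and the header line.
        if len(fields) < 4 or fields[0] == "SNP":
            continue
        snp, pos = fields[0], int(fields[3])  # column 4 = seq1_genomepos
        results.append((snp, pos, locate(pos, genes)))
    return results


if __name__ == "__main__":
    example = [
        "SNP seq1 seq1pos seq1_genomepos seq2 seq2pos seq2_genomepos",
        "ag Bacteria1 467 467 Bacteria2 467 467",
        "ag Bacteria1 1977 1977 Bacteria2 1977 1977",
    ]
    for snp, pos, where in annotate(example, GENES):
        print(snp, pos, where)
```

With these made-up coordinates, position 467 lands in gene_001 and position 1977 is reported as intergenic, which is the sort of output I'm after.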
In addition, is there a way to identify regions where insertions/deletions have occurred? Are any Perl or Python scripts available?
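For the insertions/deletions, something along these lines is what I imagine, as a sketch only. It assumes I can get the two aligned sequences as equal-length strings with '-' marking gaps (I have not worked out how to pull these from the Mauve output), and the names gap_runs and indels are placeholders.

```python
import re


def gap_runs(aligned):
    """Return (start, end) for each run of '-' in an aligned sequence.

    Coordinates are 0-based, half-open alignment columns.
    """
    return [m.span() for m in re.finditer(r"-+", aligned)]


def indels(aln1, aln2):
    """List gap runs in either sequence, sorted by alignment column.

    A gap in one sequence means the other sequence has extra bases there;
    whether that is an insertion or a deletion depends on the reference.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have the same length")
    events = [("gap_in_seq1", s, e) for s, e in gap_runs(aln1)]
    events += [("gap_in_seq2", s, e) for s, e in gap_runs(aln2)]
    return sorted(events, key=lambda ev: ev[1])


if __name__ == "__main__":
    for event in indels("ACG--TAC", "ACGTTT-C"):
        print(event)
```

The coordinates here are alignment columns, so they would still need converting back to genome positions before checking which genes are affected.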