I ran a pan-genome analysis, from which I obtained the core, accessory, and unique gene sequences. Now I need to know which strains share the most genes with each other within the accessory gene clusters. My strategy was to first extract all the gene sequences for each strain from the accessory gene clusters and save them in a single FASTA file. Then I ran an ANI analysis. Based on the ANI values, can I assume that the pairs with the highest ANI are the ones that share the most genes? Or should I use blastn instead?
I would also like to know: what is the difference between ANI and blastn?
Why don't you run a cluster analysis on the accessory gene cluster frequency table (a binary 1/0 matrix, i.e. presence/absence) to find which strains share a similar accessory pan-genome?
@andres.firrincieli I used the BPGA pipeline for my analysis, and its output does not include the following file: the accessory gene cluster frequency table (binary 1/0 matrix, i.e. presence/absence). In BPGA I can obtain the core sequences, accessory sequences, and unique sequences as three individual files. The sequences from all strains are clustered together in a single file, and that is where I am facing this problem.
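For what it's worth, the presence/absence comparison suggested above can be sketched in a few lines once you know, for each strain, which accessory gene clusters it has a member in. This is only a minimal sketch: the `presence` dictionary, the strain names, and the cluster IDs are all made-up placeholders, and how you derive that mapping from the BPGA output is not shown here. It counts shared clusters for every pair of strains and also reports the Jaccard similarity (shared / union), so that strains with many accessory genes do not dominate the ranking.

```python
from itertools import combinations

# Hypothetical input: for each strain, the set of accessory gene cluster IDs
# that contain at least one gene from that strain (presence = 1).
presence = {
    "strainA": {"c1", "c2", "c3", "c4"},
    "strainB": {"c2", "c3", "c4", "c5"},
    "strainC": {"c1", "c6"},
}

def pairwise_sharing(presence):
    """Return (strain1, strain2, shared_count, jaccard) for every strain pair,
    sorted so that the pairs sharing the most clusters come first."""
    rows = []
    for a, b in combinations(sorted(presence), 2):
        shared = len(presence[a] & presence[b])   # clusters present in both
        union = len(presence[a] | presence[b])    # clusters present in either
        jaccard = shared / union if union else 0.0
        rows.append((a, b, shared, jaccard))
    rows.sort(key=lambda r: (-r[2], -r[3]))
    return rows

ranked = pairwise_sharing(presence)
for a, b, shared, jaccard in ranked:
    print(f"{a}\t{b}\tshared={shared}\tjaccard={jaccard:.2f}")
```

The top rows of `ranked` are the strain pairs with the most accessory clusters in common. The same pairwise values could then be fed to a hierarchical clustering tool if you want a dendrogram rather than a ranked list.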