I have some questions about pan-genome analysis and would be grateful if someone could explain, in theory, how it is done. I know that several software packages are available for this, but I am trying to implement it myself as a programming exercise.
After reading some papers, I learned that one approach is iterative: compare species A and B to get their shared core genome AB, then compare the next species C against AB to find the genes that are still shared, and so on, so that a core-genome curve can be plotted as genomes are added. Is it possible to compute a pan-genome in a similar way from pairwise BLAST results? I already have pairwise BLAST results for 20 species. Can these be used to construct a pan-genome, and if so, how?
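To make the question concrete, here is a minimal sketch of one way pairwise hits could be turned into a pan-genome. It assumes the BLAST results have already been filtered down to reciprocal best hits, and that each gene is represented as a `(genome, gene_id)` tuple. The function names `cluster_genes` and `pan_and_core` are hypothetical, made up for this illustration, and the single-linkage clustering is just the simplest option, not necessarily what published tools do.

```python
from collections import defaultdict

def cluster_genes(all_genes, rbh_pairs):
    """Group genes into clusters by single linkage (union-find).

    all_genes: iterable of (genome, gene_id) tuples; must include every
               gene that appears in rbh_pairs, plus genes with no hit.
    rbh_pairs: iterable of (gene_a, gene_b) reciprocal best hits
               (assumed to be pre-filtered from the BLAST output).
    """
    all_genes = list(all_genes)
    parent = {g: g for g in all_genes}

    def find(x):
        # Walk to the cluster representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in rbh_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two clusters

    clusters = defaultdict(set)
    for g in all_genes:
        clusters[find(g)].add(g)
    return list(clusters.values())

def pan_and_core(clusters, genomes):
    """Return (pan_size, core_size) for the chosen set of genomes."""
    genomes = set(genomes)
    # Pan-genome: clusters with a member in at least one chosen genome.
    pan = [c for c in clusters if {g for g, _ in c} & genomes]
    # Core genome: clusters with a member in every chosen genome.
    core = [c for c in pan if genomes <= {g for g, _ in c}]
    return len(pan), len(core)

genes = [("A", "a1"), ("A", "a2"), ("B", "b1"),
         ("B", "b2"), ("C", "c1"), ("C", "c2")]
pairs = [(("A", "a1"), ("B", "b1")),
         (("B", "b1"), ("C", "c1")),
         (("A", "a2"), ("B", "b2"))]
clusters = cluster_genes(genes, pairs)
print(pan_and_core(clusters, ["A", "B", "C"]))
```

Genes with no hit stay as singleton clusters, so they still count toward the pan-genome. One known weakness of single linkage is that a single spurious hit can chain unrelated families together, so this is only a starting point.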
Secondly, since I am comparing 20 species: if I take one species as the reference and find its orthologs in each of the other 19 species, does this give me the core genome with respect to that reference only, or can it also be called the core genome of all 20 species?
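The reference-based idea can be sketched as follows, assuming a hypothetical dictionary `hits` that maps each reference gene to the set of other genomes in which an ortholog was found (both the data structure and the function name `reference_core` are assumptions for illustration).

```python
def reference_core(hits, other_genomes):
    """Reference genes that have an ortholog in every other genome.

    hits: dict mapping a reference gene ID to the set of genome names
          in which an ortholog of that gene was detected.
    other_genomes: names of the non-reference genomes.
    """
    others = set(other_genomes)
    return {gene for gene, found_in in hits.items() if others <= found_in}

hits = {
    "r1": {"B", "C"},
    "r2": {"B"},
    "r3": {"B", "C", "D"},
}
print(sorted(reference_core(hits, ["B", "C"])))
```

If the ortholog assignments were perfectly consistent, any gene family present in all 20 species would necessarily contain a reference gene, so this set would match the all-species core. With real BLAST data, paralogs and missed hits may make the result depend on which reference is chosen, so repeating the calculation with a few different references would be one way to check.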
Thirdly, the order in which the genomes are added changes the result somewhat. Should I pick one order at random, or try many different permutations and then take the average?
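The permutation idea could look like the sketch below. It assumes a hypothetical presence/absence table `presence` that maps each genome name to the set of gene-cluster IDs it contains; the function name `accumulation_curves` and the default of 100 permutations are arbitrary choices for illustration.

```python
import random
from statistics import mean

def accumulation_curves(presence, n_perm=100, seed=0):
    """Average core and pan-genome sizes over random genome orders.

    presence: dict mapping genome name -> set of gene-cluster IDs.
    Returns (core_curve, pan_curve); element i is the mean size after
    adding i + 1 genomes.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    genomes = sorted(presence)
    core_sizes = [[] for _ in genomes]
    pan_sizes = [[] for _ in genomes]

    for _ in range(n_perm):
        order = genomes[:]
        rng.shuffle(order)
        core = set(presence[order[0]])
        pan = set(presence[order[0]])
        for i, g in enumerate(order):
            core &= presence[g]  # clusters shared by all genomes so far
            pan |= presence[g]   # clusters seen in any genome so far
            core_sizes[i].append(len(core))
            pan_sizes[i].append(len(pan))

    return [mean(c) for c in core_sizes], [mean(p) for p in pan_sizes]

presence = {"A": {1, 2, 3}, "B": {1, 2, 4}, "C": {1, 5}}
core_curve, pan_curve = accumulation_curves(presence, n_perm=50)
print(core_curve, pan_curve)
```

With a fixed presence/absence table like this, the last point of each curve is the same for every order; only the intermediate points vary, which is why averaging over permutations smooths the shape of the curve.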
Lastly, is it prudent for a novice programmer to do all of this alone, or should one use established software if the goal is a publication?