I have two phylogenies:
1) Species phylogeny (in black): Species B to D have published genomes, and I have assembled a genome for Species A. I constructed the phylogeny from a multiple sequence alignment of protein orthologs across Species A to D (OrthoMCL -> MUSCLE -> trimAl -> MrBayes).
2) Subspecies phylogeny (in red): I also have sequencing data for different subspecies and isolates of Species A. I mapped these reads onto the Species A genome, identified SNPs (using GATK) and built a SNP-based phylogeny.
My question now is: what is the best way to integrate these two phylogenies into one?
I do not want to assemble genomes for all the subspecies (tedious for 20 isolates), and I do not want to map the Species B-D reads onto Species A (they are very divergent, so I think inferring relationships through an MSA is best).
I could infer the nucleic acid/protein sequences of the subspecies' orthologs from the variant calls and add them to the multiple sequence alignment used for the species phylogeny. But I find the output of tools like vcf2fq and FastaAlternateReferenceMaker complicated (see the related post "New Fasta Sequence From Reference Fasta And Variant Calls File?"). In that case, how should I deal with SNPs in repetitive regions, which we usually exclude from analysis?
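To make the idea concrete, here is a minimal Python sketch of what I mean by inferring an isolate's sequence from variant calls. It is not what vcf2fq or FastaAlternateReferenceMaker actually do internally. The function names (`parse_vcf_snps`, `apply_snps`) and the toy data are hypothetical, and it assumes only homozygous-alternate, biallelic SNPs are applied, that GT is the first FORMAT field, and that repetitive regions are supplied as BED-style (0-based, half-open) intervals that get masked with N rather than substituted.

```python
def parse_vcf_snps(vcf_lines, sample_index=0):
    """Yield (chrom, 0-based position, alt base) for homozygous-alt biallelic SNPs."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Keep only single-base substitutions (no indels, no multi-allelic sites).
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue
        # Assumption: GT is the first colon-separated FORMAT field.
        genotype = fields[9 + sample_index].split(":")[0].replace("|", "/")
        if genotype == "1/1":
            yield chrom, pos - 1, alt  # VCF positions are 1-based


def apply_snps(reference, snps, masked_regions=()):
    """Return a copy of the reference with SNPs substituted and repeats set to N."""
    seqs = {name: list(seq) for name, seq in reference.items()}
    for chrom, pos0, alt in snps:
        seqs[chrom][pos0] = alt
    # Mask after substitution so any SNP inside a repeat ends up as N.
    for chrom, start, end in masked_regions:
        seq = seqs[chrom]
        for i in range(start, min(end, len(seq))):
            seq[i] = "N"
    return {name: "".join(bases) for name, bases in seqs.items()}


# Toy data (hypothetical): one contig, three calls, one repeat interval.
reference = {"chr1": "ACGTACGTAC"}
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tisolate1",
    "chr1\t2\t.\tC\tT\t50\tPASS\t.\tGT\t1/1",  # applied
    "chr1\t5\t.\tA\tG\t50\tPASS\t.\tGT\t0/1",  # heterozygous, skipped
    "chr1\t8\t.\tT\tA\t50\tPASS\t.\tGT\t1/1",  # falls inside the repeat, masked
]
repeats = [("chr1", 6, 9)]

consensus = apply_snps(reference, parse_vcf_snps(vcf), repeats)
print(consensus["chr1"])
```

Because only SNPs are applied, the sequence length does not change, so the Species A gene coordinates would still be valid for pulling out each isolate's orthologs. The masked N positions would then show up as missing data in the alignment.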
Is there any other way to achieve this?