I have 400 species and 57 marker genes, and my genomes have missing data. On average, each genome has only 75% of the marker genes (the missing data is distributed entirely at random), i.e. each marker gene is found in about 300 species on average.
I want to infer the phylogeny of these 400 species from all of these marker genes (i.e. with missing data) using two tools, RAxML and MrBayes. I built a separate multiple sequence alignment for each of the 57 markers and then concatenated these 57 alignments side by side. For each genome missing a gene, I simply filled that marker's columns with dashes for that genome during concatenation and continued.
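To make the concatenation step concrete, here is a minimal Python sketch of the idea, not my actual script. The function name `concatenate_alignments`, the toy gene and species names, and the in-memory dictionary layout are all made up for illustration; it assumes each per-gene alignment is already loaded as a mapping from species name to aligned sequence. It also records the 1-based column range of each gene in the concatenated alignment.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    alignments: List[Tuple[str, Dict[str, str]]],
    species: List[str],
    gap: str = "-",
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments, padding absent genes with gaps.

    Returns the concatenated sequence for every species and, for each gene,
    its 1-based inclusive column range in the concatenated alignment.
    """
    parts: Dict[str, List[str]] = {sp: [] for sp in species}
    ranges: List[Tuple[str, int, int]] = []
    start = 1
    for gene, aln in alignments:
        # All sequences within one gene alignment must have the same width.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        width = lengths.pop()
        for sp in species:
            # A species lacking this gene gets a run of gap characters.
            parts[sp].append(aln.get(sp, gap * width))
        ranges.append((gene, start, start + width - 1))
        start += width
    concatenated = {sp: "".join(chunks) for sp, chunks in parts.items()}
    return concatenated, ranges


if __name__ == "__main__":
    # Hypothetical toy data: two genes, three species, some genes missing.
    toy = [
        ("geneA", {"sp1": "MK-L", "sp2": "MKAL"}),
        ("geneB", {"sp2": "GT", "sp3": "GS"}),
    ]
    seqs, gene_ranges = concatenate_alignments(toy, ["sp1", "sp2", "sp3"])
    for name, seq in seqs.items():
        print(f">{name}\n{seq}")
    for gene, first, last in gene_ranges:
        print(f"{gene} = {first}-{last}")
```

The recorded column ranges are the only information about gene boundaries that survives concatenation, so I kept them alongside the alignment.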
My question is: how do I issue the RAxML command to infer a phylogeny from this concatenated multiple sequence alignment? From the documentation and some threads online, I gather that I need to use the “Partitioned models” option in RAxML to tell the program that these sequences are concatenated. I did not understand how to add this option. My understanding is that I also need to supply a file that defines my partitions, and that I need to tell RAxML to use a different evolutionary model for each partition? This is the part that is confusing me. For now, I just issued the following command to infer the phylogeny:
raxmlHPC -m PROTGAMMAAUTO -s pfamNCBIGeneSeqs_400_MSA_withMissingData.fasta -n pfamNCBIGeneSeqs_400_MSA_withMissingData_raxml.tree -T 50 -p 12345
Is there another parameter that I should add for this scenario? Also, what is the equivalent command in MrBayes to infer a phylogeny from the same data?
Thank you for your time!