Hello, I'd like to build a multi-gene tree of 14 fungal genomes. So far I have:
- Analyzed the protein sequences of all genomes for shared orthologs using OrthoMCL
- Filtered the resulting groups/clusters for those containing exactly one ortholog from every genome
- Generated 299 separate FASTA files, one per group, containing the sequences of the orthologs
- Run a MAFFT alignment on each of them separately
My files look like this one:
Grph|08566 ----------------RGAFEWAGNSVGGLFCQASNIVSPSHWRMVWDVVRFNYQSIASLRAFDRASEEQ----------------------------
Rany|05125 ----------------MRIAVIGSGVSGLAATWALNEAGNSVGGLFCQASNIVAGNSVGGLFCQASNIV-----------------------------------
Acai|269125 ----------------MRIEWAGNSVGGLFCQATWALNEAGNSVGGLFCQASNIVAGNSVGGLFCQASNIV----------------------------
The order of the species (e.g. Grph) varies from file to file.
My next step would be to run RAxML.
My question is: how do I combine those 299 separate alignments into one multi-gene alignment as input for RAxML?
Thank you for your help.
You could try the concat command from SeqKit.
That was a good tip, thanks!
You can merge the MSAs 'end to end' and then give RAxML a partition file (I think). The partition file is needed to mark where each gene's alignment starts and ends in the concatenated alignment.
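To make the 'end to end' idea concrete, here is a minimal Python sketch, not a tested pipeline. It assumes that each alignment is standard FASTA with ">" header lines, that the part of the ID before "|" is the species tag (e.g. Grph), and that every species occurs exactly once per file. The function names, the gene names and the "LG" model label are placeholders I made up for illustration; pick whatever protein model suits your data.

```python
def parse_fasta(text):
    """Return {species: aligned_sequence} for one FASTA alignment."""
    records = {}
    species = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: "Grph|08566" -> "Grph" is the species tag.
            species = line[1:].split("|")[0].strip()
            if species in records:
                raise ValueError(f"species {species!r} occurs twice")
            records[species] = ""
        elif species is None:
            raise ValueError("sequence data before the first header")
        else:
            records[species] += line
    return records


def concatenate(alignments):
    """Join alignments end to end, matching rows by species tag.

    alignments: list of (gene_name, {species: sequence}) pairs.
    Returns (supermatrix, partitions); partitions holds
    (gene_name, start, end) with 1-based inclusive coordinates.
    """
    species = sorted(alignments[0][1])
    pieces = {sp: [] for sp in species}
    partitions = []
    start = 1
    for gene, records in alignments:
        if sorted(records) != species:
            raise ValueError(f"{gene}: species set differs from first file")
        lengths = {len(seq) for seq in records.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: rows have unequal lengths")
        length = lengths.pop()
        # Looking rows up by species makes the file-internal order irrelevant.
        for sp in species:
            pieces[sp].append(records[sp])
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {sp: "".join(parts) for sp, parts in pieces.items()}
    return supermatrix, partitions


def partition_lines(partitions, model="LG"):
    """Format partitions in the 'MODEL, name = start-end' style."""
    return [f"{model}, {gene} = {s}-{e}" for gene, s, e in partitions]


# Toy input: two genes, species listed in a different order in each.
gene1 = ">Grph|1\nMK-A\n>Rany|2\nMKTA\n"
gene2 = ">Rany|9\n-LV\n>Grph|7\nGLV\n"
supermatrix, partitions = concatenate(
    [("gene1", parse_fasta(gene1)), ("gene2", parse_fasta(gene2))]
)
for sp, seq in supermatrix.items():
    print(f">{sp}\n{seq}")
print("\n".join(partition_lines(partitions)))
```

For the real data you would read the 299 files from disk instead of the toy strings, write the supermatrix to one FASTA file and the partition lines to a second file, and pass both to RAxML. The length check will reject rows of unequal length, which is worth knowing before RAxML complains about it.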
Another option might be to compute a consensus/species tree from all of the gene trees via ASTRAL or similar.