Question: Multigene-Phylogeny: How to merge several Multiple-Sequence-Alignments?
18 months ago
Tom.Ma wrote:

Hello, I'd like to build a multi-gene tree from 14 fungal genomes. So far I have:

  • Analyzed the protein sequences of all genomes for shared orthologs using OrthoMCL
  • Filtered the resulting groups/clusters for those containing exactly one ortholog from every genome
  • Generated 299 separate FASTA files, one per group, containing the ortholog sequences
  • Run a MAFFT alignment on each file separately
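
The filtering step in the list above could be sketched roughly as follows. This is only an illustration, not the script actually used: it assumes the OrthoMCL groups are available as lines of the form "GROUP_ID: Taxon|protein Taxon|protein ...", and the function name single_copy_groups and the group IDs in the example are made up.

```python
def single_copy_groups(lines, taxa):
    """Yield (group_id, members) for groups with exactly one member per taxon.

    Assumes each line looks like 'GROUP_ID: Taxon|protein Taxon|protein ...'.
    """
    wanted = set(taxa)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, rest = line.partition(":")
        members = rest.split()
        # The taxon code is the part before the first '|', e.g. 'Grph' in 'Grph|08566'.
        codes = [m.split("|", 1)[0] for m in members]
        # Same number of members as taxa, and every taxon present: single copy each.
        if len(codes) == len(wanted) and set(codes) == wanted:
            yield group_id.strip(), members


# Hypothetical example with three taxa instead of 14.
example = [
    "OG1: Grph|08566 Rany|05125 Acai|269125",  # one per taxon: kept
    "OG2: Grph|00001 Grph|00002 Rany|00003",   # paralog in Grph, Acai missing: dropped
    "OG3: Grph|00010 Rany|00011",              # Acai missing: dropped
]
kept = list(single_copy_groups(example, ["Grph", "Rany", "Acai"]))
print(kept)
```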

My files look like this one:

Grph|08566 ----------------RGAFEWAGNSVGGLFCQASNIVSPSHWRMVWDVVRFNYQSIASLRAFDRASEEQ----------------------------
Rany|05125 ----------------MRIAVIGSGVSGLAATWALNEAGNSVGGLFCQASNIVAGNSVGGLFCQASNIV-----------------------------------
Acai|269125 ----------------MRIEWAGNSVGGLFCQATWALNEAGNSVGGLFCQASNIVAGNSVGGLFCQASNIV----------------------------

The order of the species (e.g. Grph) varies from file to file.

My next step would be to run RAxML.

My question is: how do I combine those 299 separate alignments into one "multi-gene alignment" as input for RAxML?

Thank you for your help.

mafft msa raxml alignment fasta
written 18 months ago by Tom.Ma

You could try the concat command from seqkit.

written 18 months ago by liupfskygre

That was a good tip, thanks!

written 18 months ago by Tom.Ma

You can merge the MSAs 'end to end', and then with RAxML use a partition file (I think). The partition file is needed to demarcate where each alignment starts and ends.
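
A minimal Python sketch of the 'end to end' merge, under some assumptions: the per-gene files are standard FASTA with ">" headers of the form Species|geneID, every species occurs exactly once per file, and rows are matched by the species code before the "|" so the varying order within files does not matter. The function names, the gene names and the "WAG" model label are placeholders; the partition lines follow the "MODEL, name = start-end" layout that RAxML reads via its -q option, but check the RAxML manual for the model names your version accepts.

```python
def read_fasta(text):
    """Parse FASTA text into {first word of header: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def concatenate(alignments, model="WAG"):
    """Join alignments end to end, matching rows by species code.

    alignments: list of (gene_name, {header: aligned_sequence}).
    Returns (supermatrix, partition_lines).
    """
    supermatrix, partitions, start = {}, [], 1
    for gene, records in alignments:
        # 'Grph|08566' -> 'Grph'; the gene ID after '|' differs between files.
        by_species = {h.split("|", 1)[0]: s for h, s in records.items()}
        lengths = {len(s) for s in by_species.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: rows differ in length, not an alignment")
        if supermatrix and set(by_species) != set(supermatrix):
            raise ValueError(f"{gene}: species set differs from earlier genes")
        length = lengths.pop()
        for species, seq in by_species.items():
            supermatrix[species] = supermatrix.get(species, "") + seq
        # Partition coordinates are 1-based and inclusive.
        partitions.append(f"{model}, {gene} = {start}-{start + length - 1}")
        start += length
    return supermatrix, partitions


# Toy example: two tiny 'genes' with the species in different order.
gene_a = ">Grph|08566\n--MRGA\n>Rany|05125\nMRIAVI\n"
gene_b = ">Rany|07001\nAGN-\n>Grph|01234\nAGNS\n"
matrix, parts = concatenate(
    [("geneA", read_fasta(gene_a)), ("geneB", read_fasta(gene_b))]
)
for species, seq in matrix.items():
    print(f">{species}\n{seq}")
print("\n".join(parts))
```

For the real data, one would read each of the 299 MAFFT outputs from disk, write the supermatrix to one FASTA file and the partition lines to a second file.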

Another option might be to compute a consensus/species tree from all of the gene trees via ASTRAL or similar.

written 18 months ago by Joe
17 months ago
Brice Sarver wrote:

I provided this solution for another question, but it will work just as well for combining MSAs: A: merge two multifasta files

written 17 months ago by Brice Sarver
