Hi!
I'm trying to create a single phylogenetic tree from a concatenation of 10 000 individual gene alignments.
I used OMA to find 10 000 orthologous genes from 23 bacterial genomes. I then used MUSCLE to align the sequences in the multi-FASTA file of each orthologous group with the following command:
muscle -in ${protein_file}.fa -out ${protein_file}.fasta
where protein_file is the name of the multi-FASTA file for one orthologous group generated by OMA (the command is run once per group).
I then tried to input the alignments into R using the following:
library(apex)
x <- read.multiFASTA(dir(pattern=".fasta"))
And was greeted with an error reading:
Error in as.matrix.DNAbin(X[[i]], ...) : DNA sequences in list not of the same length.
I tried replacing "-" in the alignments with an X. I also tried trimming the sequences with trimal:
trimal -in ${protein_file}.aln -out ${protein_file}.fasta -fasta -automated1 -resoverlap 0.75 -seqoverlap 80
(The .aln files were created with MUSCLE and used as input for trimAl.)
I also tried variations of the trimming without the -resoverlap 0.75 or -seqoverlap 80 flags, but I was still stuck with the same error.
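Since the error is about sequences of unequal length, one thing worth ruling out is whether every file picked up by the pattern really holds sequences of a single length. Below is a minimal Python sketch of such a check. It is a diagnostic only, not part of the workflow above. The function names fasta_lengths and ragged_alignments are made up for illustration, and the sketch assumes plain FASTA files whose names end in .fasta.

```python
from pathlib import Path


def fasta_lengths(path):
    """Return a list of (header, sequence_length) pairs for one FASTA file."""
    records = []  # each entry is [header, running length]
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                records.append([line[1:], 0])
            elif records:
                # Sequence lines may be wrapped, so lengths are summed per record.
                records[-1][1] += len(line)
    return [(header, length) for header, length in records]


def ragged_alignments(directory=".", pattern="*.fasta"):
    """Map file name -> sorted distinct lengths, for files with unequal lengths.

    The default pattern is an assumption; adjust it to the real file names.
    """
    ragged = {}
    for path in sorted(Path(directory).glob(pattern)):
        distinct = sorted({length for _, length in fasta_lengths(path)})
        if len(distinct) > 1:
            ragged[path.name] = distinct
    return ragged
```

Calling ragged_alignments(".") in the alignment directory returns the files whose sequences differ in length, together with the lengths found. An empty result would suggest that the alignments themselves are rectangular and that the problem lies in how the files are selected or read.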
I then considered just using MUSCLE to create an individual tree for each ortholog and processing them with ASTRAL, but I couldn't figure out how to concatenate all the trees into a single .tree file (the required input for ASTRAL).
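For the concatenation step, ASTRAL reads a file with one Newick tree per line, so combining the gene trees comes down to writing each tree onto its own line of a single file. Below is a minimal Python sketch, assuming each gene tree sits in its own file containing exactly one Newick tree. The function name combine_trees and the *.nwk pattern are hypothetical and should be adapted to the real file names.

```python
from pathlib import Path


def combine_trees(tree_dir, out_file, pattern="*.nwk"):
    """Write every Newick tree found in tree_dir to out_file, one per line.

    Assumes one tree per input file and no whitespace inside taxon labels.
    Returns the number of trees written.
    """
    out_path = Path(out_file).resolve()
    count = 0
    with open(out_file, "w") as out:
        for path in sorted(Path(tree_dir).glob(pattern)):
            if path.resolve() == out_path:
                continue  # never read the output file back in
            # Remove line breaks and other whitespace so each tree is one line.
            newick = "".join(path.read_text().split())
            if newick:
                out.write(newick + "\n")
                count += 1
    return count
```

For example, combine_trees("gene_trees", "all_genes.tree") would collect every .nwk file in a (hypothetical) gene_trees directory into all_genes.tree. The return value makes it easy to confirm that the expected number of trees was written.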
I'm tearing my hair out over what I suspect to be a trivial mistake somewhere but can't figure out where I'm tripping up. Any help would be hugely appreciated!!
Thank you in advance :)