Concatenate multiple MUSCLE alignment results to build a phylogenetic tree
5.2 years ago
huiyus97 • 0

Hi,

I want to build a phylogenetic tree from single-copy orthologous proteins that I identified with OrthoMCL. I aligned each protein group separately with MUSCLE, and I now want to concatenate the alignments and build the tree with RAxML. Could someone please recommend tools for concatenating the aligned protein sequences, or can RAxML do this itself?

Thank you so much for your help!

alignment protein muscle • 4.9k views
5.2 years ago
Mensur Dlakic ★ 30k

This script works well. I suggest you trim the alignments before concatenation - see here and here.
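
For readers who want to see what a concatenation step does, here is a minimal Python sketch. It is not the script linked above; the function names `parse_fasta` and `concatenate` are hypothetical, and it assumes each gene alignment uses the same taxon label for the same species. A taxon missing from a gene is padded with gaps, and partition boundaries are recorded as 1-based ranges.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {label: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the taxon label.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {label: "".join(parts) for label, parts in records.items()}


def concatenate(alignments):
    """Concatenate aligned genes into one supermatrix.

    alignments: list of (gene_name, {taxon: aligned_sequence}).
    Returns ({taxon: concatenated_sequence}, [partition lines]).
    """
    taxa = sorted({taxon for _, aln in alignments for taxon in aln})
    matrix = {taxon: [] for taxon in taxa}
    partitions, start = [], 1
    for gene, aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences differ in length; is it aligned?")
        width = lengths.pop()
        for taxon in taxa:
            # Pad with gaps when a taxon is absent from this gene.
            matrix[taxon].append(aln.get(taxon, "-" * width))
        partitions.append(f"{gene} = {start}-{start + width - 1}")
        start += width
    return {taxon: "".join(parts) for taxon, parts in matrix.items()}, partitions
```

In practice each alignment would be loaded with something like `parse_fasta(open(path).read())` after trimming. Because sequences are matched by label, a header that differs between gene files would be treated as a separate taxon.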


Hi Mensur, sorry for this but I'm a little stuck.

I’m trying to build a species tree for the first time and I would like to clarify a few doubts regarding sequence labels and the workflow I’m following.

Data:

In my Single_Copy_Orthologue_Sequences folder I have files like:

ls
N0.HOG0000162.fa
N0.HOG0000271.fa

Example content of N0.HOG0000162.fa:

>AT3G02650.1|PACid_19663616
MLRSFLCRSQNASRNLAVTRISKKKTQTTHSLTSLSRFSYLESSGNASVRNIRFFSTSPPTEENPVSLPADEIPISSAAE...
>evm_27.model.AmTr_v1.0_scaffold00066.198
MWRYSLLRASSIRSQWLNRANPKTLASTSALSSCLEVYTNHRKNHGNPSFMSRESHSVAETSSYDGGNPSFSSNVSDGSS...

The first header corresponds to Arabidopsis. The second header corresponds to Amborella.
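
A relabeling step of the kind this mapping implies could be sketched as follows. This is only an illustration under stated assumptions: the prefix table `PREFIX_TO_SPECIES` and the function names are hypothetical, and the prefixes are guessed from the two example headers above, so they would need to be checked against every genome in the dataset.

```python
# Hypothetical mapping from sequence-ID prefixes to species labels,
# guessed from the two example headers; extend it for the other species.
PREFIX_TO_SPECIES = {
    "AT": "Arabidopsis",
    "evm_27.model.AmTr": "Amborella",
}


def species_for(header, mapping=PREFIX_TO_SPECIES):
    """Return the species label for one FASTA header (with or without '>')."""
    seq_id = header.lstrip(">").split()[0]
    # Try longer prefixes first so a short prefix cannot shadow a longer one.
    for prefix in sorted(mapping, key=len, reverse=True):
        if seq_id.startswith(prefix):
            return mapping[prefix]
    raise KeyError(f"no species known for {seq_id!r}")


def relabel_fasta(text, mapping=PREFIX_TO_SPECIES):
    """Rewrite every header line of FASTA text to '>Species'."""
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            out.append(">" + species_for(line, mapping))
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Applied to each single-copy orthogroup file before alignment or concatenation, this would give every gene file one identically named sequence per species, and those names would then appear as the taxon labels in the supermatrix.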

My workflow was:

Step 1: Orthofinder
orthofinder -f ./prot_longest -t 30 -o orthofinder


Step 2: mafft 
mafft --auto --thread 30 "$file" > "$output_file"


Step 3: trimAl
trimal -in "$file" -out "$output_file" -automated1 -fasta -htmlout "${output_file%.fa}.html"


Step 4: Concatenation
Tool: catfasta2phyml (https://github.com/nylander/catfasta2phyml)
$CATFASTA --concatenate ${ALIGN_DIR}/*_trim.fa > $SUPERMATRIX 2> $PARTITIONS

# Clean file names and prepare partitions for IQ-TREE/RAxML (protein)
sed -i -e "s#${ALIGN_DIR}/##" -e "s/_trim.fa//" -e "s/^/PROT, /" $PARTITIONS

Example output of the supermatrix (head supermatrix.phy):

10 893113
AT3G02650.1|PACid_19663616    MLRSFLCRSQNASRNL...
evm_27.model.AmTr_v1.0_scaffold00066.198  MWRYSLLRASSIR...

Partition file (partitions.txt):

PROT, N0.HOG0000162 = 1-560
PROT, N0.HOG0000271 = 561-1440
PROT, N0.HOG0000277 = 1441-2422

Questions:

I notice that I don’t have clear species labels in my headers, only sequence IDs.

In the resulting species tree, how will my species be labeled?

Will IQ-TREE use these IDs as taxon names?

Am I missing a step if I want readable species names (like “Amborella”) in the final tree?

I was thinking of running IQ-TREE like this:

iqtree -s supermatrix.phy -m MFP -bb 1000 -alrt 1000

Is this correct for a first species tree?

Should I consider using partitions (-spp partitions.txt) here?

Thank you very much for your help!
