Hey peeps!

I'm very new to phylogenetics and am currently getting started with the different software and tools while trying to reconstruct a phylogenetic tree. First, I would like to reconstruct the phylogenetic tree from 138 cyanobacterial strains with 8 loci per strain (as done by the original authors of the study). Second, I would like to add another 4 strains to the tree, for which I have all loci except the 16S-ITS.

Here's a quote from the authors' methods:

"The sequences of the 16S rDNA, 16S rDNA-ITS, PC-IGS, PSA-IGS, RNaseP, rbcLX-IGS, and rpoC were concatenated resulting in 2,697 bp. Ambiguous sites (n = 93) were removed from the sequence alignment when approximating a continuous gamma distribution (ncatG = 5): alpha (gamma, K = 5) = 0.01712, Average Ts/Tv = 2.5996. Phylogenetic trees were constructed using (i) maximum likelihood (ML), (ii) neighbour-joining (NJ) from the nucleotide sequences distance matrix (calculated using the F84 substitution model), and (iii) maximum parsimony (MP) from nucleotide sequences using the PHYLIP package. Statistical significance of the branches was estimated by bootstrap analysis generating 1000 replicates of the original data set using the PHYLIP package. Finally, consensus trees following the 50% majority rule were computed."

First, I would like to understand how (and with which tool) to remove the ambiguous sites as described above. How do I calculate ncatG, alpha, and Ts/Tv from a dataset myself, so that I can understand the authors' choices?
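
To make the first question concrete, here is a minimal Python sketch of how I currently read "removing ambiguous sites", plus a raw transition/transversion count. The assumptions are mine, not the authors': a site counts as ambiguous if any strain has something other than A, C, G or T in that column, and the function names and the toy alignment are made up. The raw count is only an observed pairwise tally, so I would not expect it to match the model-based Ts/Tv of 2.5996 from the paper, and it says nothing about alpha or ncatG.

```python
from itertools import combinations

# Assumption: only plain A, C, G, T count as unambiguous;
# IUPAC codes (R, Y, N, ...) and gaps make a column "ambiguous".
UNAMBIGUOUS = set("ACGT")
PURINES = set("AG")
PYRIMIDINES = set("CT")

def remove_ambiguous_sites(alignment):
    """alignment: dict of strain name -> aligned sequence (equal lengths).
    Returns the filtered alignment and the number of removed columns."""
    names = list(alignment)
    seqs = [alignment[n].upper() for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    keep = [i for i in range(length) if all(s[i] in UNAMBIGUOUS for s in seqs)]
    cleaned = {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}
    return cleaned, length - len(keep)

def observed_ts_tv(alignment):
    """Count transitions and transversions over all sequence pairs.
    This is a raw tally, not a model-based estimate."""
    ts = tv = 0
    for a, b in combinations(alignment.values(), 2):
        for x, y in zip(a, b):
            if x == y or x not in UNAMBIGUOUS or y not in UNAMBIGUOUS:
                continue
            if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
                ts += 1  # A<->G or C<->T
            else:
                tv += 1  # purine <-> pyrimidine
    return ts, tv

# Hypothetical toy alignment, not real data.
toy = {
    "strain_A": "ACGTNACGTA",
    "strain_B": "GCGTAACRTC",
    "strain_C": "ACGTAAAGT-",
}
cleaned, n_removed = remove_ambiguous_sites(toy)
ts, tv = observed_ts_tv(cleaned)
print(n_removed, cleaned, ts, tv)
```

If this reading is right, I would expect the filter to report something close to the 93 removed sites when run on the real concatenated alignment.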

Second, I would like to know whether I should change any of the above parameters, or the model used, when adding the 4 additional strains that lack the 16S-ITS locus. Will the additional 4 strains likely have little effect on the parameter estimates, and will the missing 16S-ITS locus be problematic? The strains all belong to a well-established monophyletic genus.
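
For the second question, this is roughly how I imagine adding the 4 strains: concatenate the per-locus alignments and pad the missing locus with a missing-data character. It is only a sketch. `concatenate_loci`, the toy strain names, and the use of `?` as the missing-data symbol are my assumptions and would need checking against whatever the tree program expects.

```python
def concatenate_loci(loci, locus_order, missing_char="?"):
    """loci: dict of locus name -> dict of strain name -> aligned sequence.
    Strains absent from a locus are padded with missing_char."""
    strains = sorted({s for aln in loci.values() for s in aln})
    parts = {s: [] for s in strains}
    for locus in locus_order:
        aln = loci[locus]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not aligned")
        width = lengths.pop()
        for s in strains:
            # Pad with missing data when the strain lacks this locus.
            parts[s].append(aln.get(s, missing_char * width))
    return {s: "".join(p) for s, p in parts.items()}

# Hypothetical toy data: the new strain has no sequence for the first locus.
toy_loci = {
    "16S-ITS": {"old_strain": "ACGT"},
    "rpoC": {"old_strain": "GGCC", "new_strain": "GGCA"},
}
supermatrix = concatenate_loci(toy_loci, ["16S-ITS", "rpoC"])
print(supermatrix["new_strain"])
```

One thing I notice: if ambiguous sites are removed with a strict "any non-ACGT character" rule after padding, every column of the padded locus would be dropped for all strains, so the padding and the filter probably cannot both be applied naively.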

Cheers

