I am a CS major with little biology background. I am running some experiments, and at one point I need to simulate DNA alignments along a phylogenetic tree (produced under a Yule process using the r8s program). I was able to produce "some" sequences from that tree using Seq-Gen (see the Seq-Gen manual) with the following parameters:
seq-gen -o p -m GTR -i 0.01 -f 0.3 0.2 0.2 0.3 -s 0.5 -z <some_seed> -l 2000
The problem is that when I use RAxML to infer a maximum likelihood tree from this alignment, the resulting tree is very different from the one the data were generated from: their Robinson-Foulds (RF) distance is close to the maximum possible. I suspect something is wrong with my Seq-Gen parameters. Perhaps they are not biologically sensible, or perhaps I need additional options, such as -r to specify the relative substitution rates? (In case it matters, I used the GTRGAMMA model in RAxML.)
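To make "close to the maximum possible" concrete, here is a minimal sketch of the RF distance between two trees on the same taxa. It assumes both trees are treated as unrooted (RAxML reports unrooted trees, so the root of the r8s tree is ignored), that they are binary, and that leaf labels are plain unquoted Newick names. For unrooted binary trees on n taxa the maximum RF distance is 2(n - 3). The function names are illustrative only, not part of any tool.

```python
def _clades(s, i, out):
    """Parse the Newick subtree starting at s[i].

    Appends the leaf set of every internal node to `out` and returns
    (leaf_names, next_index). Branch lengths and internal labels are skipped.
    """
    if s[i] == "(":
        leaves = set()
        i += 1
        while True:
            child, i = _clades(s, i, out)
            leaves |= child
            if s[i] == ",":
                i += 1
            elif s[i] == ")":
                i += 1
                break
            else:
                raise ValueError(f"unexpected {s[i]!r} at position {i}")
        while s[i] not in ",);":  # skip internal label and branch length
            i += 1
        out.append(frozenset(leaves))
        return leaves, i
    j = i
    while s[j] not in ":,();":  # read the leaf name
        j += 1
    name = s[i:j]
    while s[j] not in ",);":  # skip the leaf's branch length
        j += 1
    return {name}, j


def splits(newick):
    """Non-trivial bipartitions, each stored as the side without a fixed reference leaf."""
    s = "".join(newick.split())
    clades = []
    leaves, _ = _clades(s, 0, clades)
    ref = min(leaves)
    n = len(leaves)
    result = set()
    for c in clades:
        side = frozenset(leaves - c) if ref in c else c
        if 2 <= len(side) <= n - 2:  # drops the root clade and single-leaf splits
            result.add(side)
    return result, frozenset(leaves)


def rf_distance(t1, t2):
    """Return (RF distance, maximum 2(n-3) for unrooted binary trees)."""
    s1, leaves1 = splits(t1)
    s2, leaves2 = splits(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must have the same leaf set")
    return len(s1 ^ s2), 2 * (len(leaves1) - 3)


print(rf_distance("(((A,B),C),D,E);", "(((A,C),B),D,E);"))
```

Each tree contributes its n - 3 internal edges as bipartitions, and RF counts the bipartitions found in only one of the two trees, so a value near 2(n - 3) means almost no clade was recovered.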
I noticed that when I change the branch lengths of the tree, the problem is mitigated. So my main question is: how should I set the branch lengths of a (say, ultrametric) phylogeny so that they make biological sense? Is there a standard way of doing this, and is any software available for it?
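Seq-Gen interprets branch lengths (after multiplication by the -s factor) as expected substitutions per site, whereas an r8s chronogram is in time units, so the tree's overall height directly controls how divergent, and eventually how saturated, the simulated sequences are. A minimal sketch of one simple approach, assuming you only want to control that height: rescale an ultrametric Newick tree so its root-to-tip depth equals a chosen number of expected substitutions per site. The helper names are hypothetical, quoted labels are not supported, and the target value 0.5 below is an arbitrary placeholder, not a recommended setting.

```python
import re

# A Newick branch length such as ":0.25", ":3" or ":1e-05".
_LEN = re.compile(r":([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)")


def _depth(s, i):
    """Parse the subtree starting at s[i].

    Returns (below, length, next_index): `below` is the longest path from this
    node down to a tip, `length` is the node's own branch length.
    """
    below = 0.0
    if s[i] == "(":
        i += 1
        while True:
            child_below, child_len, i = _depth(s, i)
            below = max(below, child_below + child_len)
            if i >= len(s):
                raise ValueError("unbalanced parentheses")
            if s[i] == ",":
                i += 1
            elif s[i] == ")":
                i += 1
                break
            else:
                raise ValueError(f"unexpected {s[i]!r} at position {i}")
    # Skip an optional (unquoted) node label.
    while i < len(s) and s[i] not in ":,();":
        i += 1
    m = _LEN.match(s, i)
    if m:
        return below, float(m.group(1)), m.end()
    return below, 0.0, i


def tree_height(newick):
    """Longest root-to-tip path; in an ultrametric tree every path has this length."""
    s = "".join(newick.split())
    below, _, _ = _depth(s, 0)
    return below


def rescale(newick, target_height):
    """Multiply every branch length so that the tree height becomes target_height."""
    s = "".join(newick.split())
    factor = target_height / tree_height(s)
    return _LEN.sub(lambda m: ":" + format(float(m.group(1)) * factor, ".10g"), s)


# Hypothetical chronogram with branch lengths in time units.
tree = "((A:10,B:10):20,C:30);"
print(rescale(tree, 0.5))
```

Equivalently, you could leave the r8s tree unchanged and pass `target_height / tree_height(tree)` to Seq-Gen's -s option, since -s multiplies every branch length by that factor. Which height is sensible depends on the level of divergence you want your simulated data to mimic.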