Interpreting jModelTest2 results for transposon sequences
6.9 years ago
mcsimenc ▴ 20

Dear Biostars community,

I want to estimate the divergence between the two long terminal repeats (LTRs) of a single LTR retrotransposon, which are assumed to be identical at the time a new element is inserted. I have estimated substitutions per site using baseml, with a model of sequence evolution chosen more or less at random, but I want to be able to justify which model to use. My approach has been:

  1. Infer a tree from a set of presumably related LTR retrotransposons using protein coding domains (e.g. reverse transcriptase, integrase)

  2. Select a monophyletic group of LTR retrotransposons and make an alignment of the long terminal repeats

  3. Graft the long terminal repeats onto the terminal taxa in the tree inferred in step 1

  4. Run jModelTest2 using the alignment from step 2 and the tree from step 3
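For illustration, the grafting in step 3 could be sketched roughly as follows. This is only a minimal sketch under several assumptions: the tree is a Newick string without whitespace or quoted labels, each tip is named after one element, and the two LTRs of an element are labelled with the hypothetical suffixes `_LTR5` and `_LTR3`. The function name `graft_ltrs` is made up for this example, and the new LTR tips are given no branch lengths.

```python
import re

def graft_ltrs(newick, suffixes=("_LTR5", "_LTR3")):
    """Replace each tip label in a Newick string with a two-tip clade
    holding that element's two LTRs (hypothetical naming scheme)."""
    def to_clade(match):
        delimiter, name = match.group(1), match.group(2)
        pair = ",".join(name + suffix for suffix in suffixes)
        return f"{delimiter}({pair})"
    # A tip label directly follows "(" or ","; labels after ")" belong to
    # internal nodes and are left untouched, as are branch lengths after ":".
    return re.sub(r"([(,])([^(),:;]+)", to_clade, newick)

if __name__ == "__main__":
    tree = "((A:0.1,B:0.2):0.05,C:0.3);"  # hypothetical protein-domain tree
    print(graft_ltrs(tree))
```

The original branch length of each tip ends up on the branch leading to the new LTR pair, so the topology from step 1 is preserved and only the tips are expanded.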

I did this and got a model that was most highly supported, but when I ran jModelTest2 again using the tree from step 1 and an alignment of the protein coding domain used to infer that tree, a different model was most highly supported. My thought is that the model estimated as best from the tree and alignment of LTRs is the one I should use, but I am unsure whether there is something I'm missing. Maybe I'm going about this the wrong way. Any insights or comments are appreciated, thank you!
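To make "most highly supported" concrete: model-selection tools of this kind typically rank candidate models with information criteria such as AIC, AICc and BIC, which are computed from each model's maximized log-likelihood on the given alignment. The sketch below shows the standard formulas only. The log-likelihoods, the alignment length, and the function names are hypothetical, and branch-length parameters are left out of the parameter counts for simplicity.

```python
import math

def information_criteria(lnL, k, n):
    """Standard criteria for a model with log-likelihood lnL,
    k free parameters, and n alignment sites."""
    aic = 2 * k - 2 * lnL
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * lnL
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

def rank_models(candidates, n, criterion="AIC"):
    """Sort (name, score) pairs from best (lowest score) to worst."""
    scores = {
        name: information_criteria(lnL, k, n)[criterion]
        for name, (lnL, k) in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

if __name__ == "__main__":
    # Hypothetical values: model -> (lnL, substitution-model parameters).
    candidates = {
        "JC": (-1700.0, 0),
        "HKY": (-1684.0, 4),
        "GTR+G": (-1680.0, 9),
    }
    for name, score in rank_models(candidates, n=380):
        print(f"{name}\t{score:.2f}")
```

Because the log-likelihoods depend on the alignment, the ranking is specific to the data set being scored, so an LTR alignment and a protein-coding-domain alignment can reasonably favour different models.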

jmodeltest2 transposon long terminal repeats model • 1.3k views
3.8 years ago

Hi, I am new to using baseml to calculate substitutions per site, and I am having some problems interpreting the results. Could you give me some help? My control file (baseml.ctl) is:
  seqfile = seq.fas-gb
 treefile = enhancertree.txt

  outfile = mlb       * main result file
    noisy = 3   * 0,1,2,3: how much rubbish on the screen
  verbose = 0   * 1: detailed output, 0: concise output
  runmode = 0   * 0: user tree;  1: semi-automatic;  2: automatic
                * 3: StepwiseAddition; (4,5):PerturbationNNI 

   model = 6   * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85
                * 5:T92, 6:TN93, 7:REV, 8:UNREST, 9:REVu; 10:UNRESTu

    Mgene = 0   * 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff

    ndata = 1
    clock = 2   * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis; a rooted tree should be used under 1, 2, 3
fix_kappa = 0   * 0: estimate kappa; 1: fix kappa at value below; 2: kappa for branches
    kappa = 2.5   * initial or fixed kappa

fix_alpha = 0   * 0: estimate alpha; 1: fix alpha at value below
    alpha = 0.5   * initial or fixed alpha, 0:infinity (constant rate)
   Malpha = 0   * 1: different alpha's for genes, 0: one alpha
    ncatG = 8   * # of categories in the dG, AdG, or nparK models of rates
    nparK = 0   * rate-class models. 1:rK, 2:rK&fK, 3:rK&MK(1/K), 4:rK&MK 

    nhomo = 1   * 0 & 1: homogeneous, 2: kappa for branches, 3: N1, 4: N2
    getSE = 0   * 0: don't want them, 1: want S.E.s of estimates

RateAncestor = 1 * (0,1,2): rates (alpha>0) or ancestral states

Small_Diff = 7e-6
 cleandata = 1   * remove sites with ambiguity data (1:yes, 0:no)?
*    icode = 0   * (with RateAncestor=1. try "GC" in data,model=4,Mgene=4)
* fix_blength = 0   * 0: ignore, -1: random, 1: initial, 2: fixed, 3: proportional
    method = 0   * Optimization method 0: simultaneous; 1: one branch a time

My results are as follows:

(1) Homogeneity statistic: X2 = 0.18360 G = 0.18650

Average 0.30496 0.19828 0.26761 0.22915

constant sites: 105 (27.63%)

ln Lmax (unconstrained) = -1678.510665

Distances: TN93 (kappa) (alpha set at 0.50)

This matrix is not used in later m.l. analysis.

(2) Detailed output identifying parameters:

rates for branches: 1 0.04432

rate (kappa or abcde) under TN93: 3.10512 3.05480

Base frequencies: 0.24163 0.24431 0.24245 0.27161

alpha (gamma, K=8) = 3.31196

rate: 0.30768 0.51640 0.67190 0.82302 0.98635 1.18100 1.44935 2.06429

freq: 0.12500 0.12500 0.12500 0.12500 0.12500 0.12500 0.12500 0.12500
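One quick consistency check on output like this, offered as a sketch rather than a full validation: under a discrete-gamma model the category rates are scaled to average 1 when weighted by their frequencies, and the base frequencies should sum to 1. The numbers below are copied from the output above; the variable names are just for this example.

```python
# Values copied from the baseml output above.
rates = [0.30768, 0.51640, 0.67190, 0.82302,
         0.98635, 1.18100, 1.44935, 2.06429]
freqs = [0.12500] * 8
base_freqs = [0.24163, 0.24431, 0.24245, 0.27161]

# Frequency-weighted mean of the gamma category rates; it should be
# close to 1 because the rates are relative to the average rate.
mean_rate = sum(r * f for r, f in zip(rates, freqs))

# The four base frequencies should sum to 1 up to rounding.
base_total = sum(base_freqs)

if __name__ == "__main__":
    print(f"weighted mean rate: {mean_rate:.5f}")
    print(f"sum of base frequencies: {base_total:.5f}")
```

Both checks come out within rounding error of 1 for these values, which suggests the reported gamma categories and base frequencies are internally consistent. It says nothing about whether the chosen model or clock setting is appropriate.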

Is my control file correct, and is there anything wrong with my output file? If the results are fine, how can I calculate the substitution rate from this information?

Thanks in advance.
