Question: Too Many Genes Under Positive Selection
Asked by Plantae, 7.6 years ago:

Our lab sequenced 7 genomes from the same species. The divergence time between these genomes is less than 0.5 Mya.
We constructed ~42,000 ortholog gene families from these genomes.
To find positively selected genes, I used codeml to compare M2a vs. M1a (likelihood-ratio test, df = 2).

Using an FDR cutoff of 0.01, I got ~16,000 (38%) genes under positive selection.
Is codeml suitable for our dataset (genes from different strains of the same species)?

The control file I used is:

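```
seqfile = input.seq      * sequence data file
treefile = input.tree    * tree file
outfile = mlc            * main result file

noisy = 3                * amount of screen output
verbose = 1              * 1: detailed output
runmode = 0              * 0: use the user tree in treefile

seqtype = 1              * 1: codon sequences
CodonFreq = 2            * 2: F3x4 codon frequencies
clock = 0                * 0: no molecular clock
model = 1 2              * site-model tests (M1a vs M2a) normally use model = 0

NSsites = 2              * 2: M2a; NSsites = 1 2 fits M1a and M2a in one run
icode = 0                * 0: universal genetic code
fix_omega = 0            * 0: estimate omega
omega = .9               * initial omega
fix_kappa = 0            * 0: estimate kappa
kappa = .3               * initial kappa
cleandata = 1            * remove sites with gaps or ambiguity characters
```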
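The M1a vs. M2a test described above comes down to a likelihood-ratio statistic per gene, 2ΔlnL = 2(lnL_M2a − lnL_M1a), compared against a χ² distribution with 2 degrees of freedom, followed by a multiple-testing correction across all genes. Below is a minimal Python sketch of that arithmetic, assuming the lnL values have already been pulled out of each `mlc` file. The gene names and numbers are made up, and Benjamini-Hochberg is an assumption, since the question does not say which FDR procedure was used.

```python
from __future__ import annotations

import math


def chi2_sf_df2(x: float) -> float:
    """P(X >= x) for a chi-square variable with 2 degrees of freedom.

    With df = 2 the chi-square distribution is exponential with mean 2,
    so the upper tail has the closed form exp(-x / 2).
    """
    # Tiny negative statistics from optimisation noise are treated as 0.
    return math.exp(-max(x, 0.0) / 2.0)


def lrt_pvalue(lnl_m1a: float, lnl_m2a: float) -> float:
    """p-value of the M1a (null) vs M2a (alternative) likelihood-ratio test."""
    return chi2_sf_df2(2.0 * (lnl_m2a - lnl_m1a))


def benjamini_hochberg(pvalues: dict[str, float]) -> dict[str, float]:
    """Benjamini-Hochberg adjusted p-values, keyed like the input."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ranked)
    adjusted: dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        gene, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[gene] = running_min
    return adjusted


# Made-up lnL values: gene -> (lnL under M1a, lnL under M2a).
lnl = {
    "geneA": (-2500.10, -2490.00),
    "geneB": (-1800.40, -1800.40),
    "geneC": (-3200.00, -3197.50),
}
pvals = {gene: lrt_pvalue(l1, l2) for gene, (l1, l2) in lnl.items()}
qvals = benjamini_hochberg(pvals)
selected = sorted(gene for gene, q in qvals.items() if q < 0.01)
print(selected)  # ['geneA']
```

For df = 2 the χ² tail has a closed form, so no statistics library is needed; with SciPy installed, `scipy.stats.chi2.sf(x, 2)` gives the same value.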
Tags: paml, codeml, selection

How did you obtain the individual CDS for 14000 orthologs after assembling the genomes? I'm trying to do something similar with codeml, but I only know how to obtain one gene at a time. I recently obtained genome data, so I would like to extract all CDS and run codeml on all of them. Would you mind sharing a bit about how you process such a large number? I use bwa mem for my assembly.

(reply by DNAngel, 12 months ago)
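For the "run codeml on all of them" part, one common approach is to generate one control file per ortholog family from a template and run codeml in a separate directory for each. A minimal sketch, assuming a hypothetical layout of one codon alignment per family named `<gene>.phy` plus a matching `<gene>.tree`; the template uses `model = 0` with `NSsites = 1 2`, which per the PAML manual fits M1a and M2a in a single run. It does not cover extracting the CDS from the assemblies.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path

# Site-model run; other settings follow the control file in the question.
CTL_TEMPLATE = """\
seqfile = {seqfile}
treefile = {treefile}
outfile = mlc

noisy = 0        * quiet screen output for batch runs
verbose = 1
runmode = 0

seqtype = 1
CodonFreq = 2
clock = 0
model = 0
NSsites = 1 2    * fit M1a and M2a in the same run
icode = 0
fix_omega = 0
omega = .9
fix_kappa = 0
kappa = .3
cleandata = 1
"""


def write_control_files(aln_dir: Path, tree_dir: Path, run_dir: Path) -> list[Path]:
    """Create run_dir/<gene>/codeml.ctl for every <gene>.phy in aln_dir.

    Each gene gets its own directory because codeml writes auxiliary
    files (rst, rub, ...) into the directory it is run from.
    """
    written = []
    for aln in sorted(aln_dir.glob("*.phy")):
        gene = aln.stem
        workdir = run_dir / gene
        workdir.mkdir(parents=True, exist_ok=True)
        ctl = workdir / "codeml.ctl"
        # Relative paths keep file names short and valid from workdir.
        ctl.write_text(CTL_TEMPLATE.format(
            seqfile=os.path.relpath(aln, workdir),
            treefile=os.path.relpath(tree_dir / f"{gene}.tree", workdir),
        ))
        written.append(ctl)
    return written


# Demo on a throwaway directory with two made-up family names.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "aln").mkdir()
    (base / "trees").mkdir()
    for gene in ("OG0001", "OG0002"):
        (base / "aln" / f"{gene}.phy").write_text("")
    ctls = write_control_files(base / "aln", base / "trees", base / "runs")
    demo = {p.parent.name: p.read_text() for p in ctls}
```

Each job can then be started from its own directory with `codeml codeml.ctl` (codeml takes the control-file name as its argument), which makes the runs easy to spread over a cluster.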
Answer by Giovanni M Dall'Olio (London, UK), 7.6 years ago:

Have a look at the alignments. Most of the time, weird dN/dS values are caused by errors in the alignment.
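With tens of thousands of families, checking every alignment by eye is impractical, so a crude automatic screen can help pick which ones to inspect first. A minimal sketch, assuming in-frame codon alignments stored as a dict of name to sequence ('-' for gaps) and the standard genetic code; the checks and the 50% gap threshold are illustrative choices, not a published method.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (icode = 0)


def check_codon_alignment(seqs: dict[str, str], max_gap_fraction: float = 0.5) -> list[str]:
    """Return warnings about common codon-alignment problems."""
    problems: list[str] = []
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        return ["sequences have different lengths (not an alignment?)"]
    length = lengths.pop()
    if length % 3:
        problems.append(f"alignment length {length} is not a multiple of 3")
    for name, seq in seqs.items():
        seq = seq.upper()
        codons = [seq[i:i + 3] for i in range(0, length - length % 3, 3)]
        # A stop codon before the last codon suggests a frameshift or pseudogene.
        stops = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
        if stops:
            problems.append(f"{name}: internal stop codon(s) at codon index {stops}")
        # Gaps that do not cover whole codons shift the reading frame.
        if any("-" in codon and codon != "---" for codon in codons):
            problems.append(f"{name}: gap(s) not in multiples of 3")
        gap_fraction = seq.count("-") / length if length else 0.0
        if gap_fraction > max_gap_fraction:
            problems.append(f"{name}: {gap_fraction:.0%} of positions are gaps")
    return problems


aln = {"s1": "ATGAAATAGCCC", "s2": "ATG-AATTTCCC", "s3": "ATG---TTTCCC"}
for problem in check_codon_alignment(aln):
    print(problem)
```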

Answer by Bioch'Ti (Avignon, France), 7.6 years ago:

Hi, I agree with Giovanni, but you should also check for paralogs, which will inflate the ratio. Good luck!
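A first, cheap check for paralogs is to keep only families with exactly one sequence from each of the 7 genomes. A minimal sketch, assuming a hypothetical `genome|gene` naming scheme for sequence IDs. It only catches families with extra copies in some genome; it cannot detect hidden paralogy caused by differential gene loss.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


def genome_from_id(seq_id: str) -> str:
    """Hypothetical ID convention 'genome|gene', e.g. 'strain3|g01234'."""
    return seq_id.split("|", 1)[0]


def single_copy_families(
    families: dict[str, list[str]],
    genomes: set[str],
    genome_of: Callable[[str], str] = genome_from_id,
) -> list[str]:
    """Return families with exactly one member from every genome."""
    kept = []
    for family, members in families.items():
        counts = Counter(genome_of(m) for m in members)
        # Every genome present, none contributing more than one copy.
        if set(counts) == genomes and all(n == 1 for n in counts.values()):
            kept.append(family)
    return kept


genomes = {f"strain{i}" for i in range(1, 8)}  # the 7 sequenced genomes
families = {
    "OG1": [f"strain{i}|a" for i in range(1, 8)],                       # one copy each
    "OG2": [f"strain{i}|b" for i in range(1, 8)] + ["strain2|b_copy"],  # extra copy
    "OG3": [f"strain{i}|c" for i in range(1, 7)],                       # strain7 missing
}
print(single_copy_families(families, genomes))  # ['OG1']
```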

Answer by aidan-budd (Germany), 7.6 years ago:

I think this article by Will and Ziheng does a good job of exploring and highlighting common (alignment-based) sources of error in analyses of this kind. It would probably be useful for you to look through it.

Answer by jprmachado, 7.6 years ago:

Hi,

I agree with both previous replies. Also consider filtering the alignment. Try Gblocks, for example, but there are other tools like Gblocks available.

Good luck
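Gblocks itself combines several criteria (conservation, block length, gap positions). The sketch below is a much simpler, gap-only stand-in, not Gblocks: it drops whole codon columns in which too many sequences contain a gap, so the reading frame is preserved. It assumes equal-length, in-frame codon alignments; `max_gap_fraction` is a hypothetical parameter to tune.

```python
from __future__ import annotations


def drop_gappy_codon_columns(
    seqs: dict[str, str], max_gap_fraction: float = 0.0
) -> dict[str, str]:
    """Remove codon columns in which too many sequences have a gap.

    A sequence counts as gapped in a codon column if any of the three
    bases is '-'. With the default of 0.0, every codon column containing
    a gap in any sequence is removed.
    """
    if not seqs:
        return {}
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1 or next(iter(lengths)) % 3:
        raise ValueError("expected equal-length, in-frame codon alignments")
    length = lengths.pop()
    n = len(seqs)
    keep = [
        start
        for start in range(0, length, 3)
        if sum("-" in s[start:start + 3] for s in seqs.values()) / n <= max_gap_fraction
    ]
    return {name: "".join(s[i:i + 3] for i in keep) for name, s in seqs.items()}


aln = {"a": "ATGAAA---CCC", "b": "ATG-AATTTCCC"}
print(drop_gappy_codon_columns(aln))  # {'a': 'ATGCCC', 'b': 'ATGCCC'}
```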

Answer by Rahul Sharma (Germany), 7.6 years ago:

Hi,

Finding positively selected genes is very tricky. Since you mentioned the site-model comparison (M2a vs. M1a), have you also tried M8 vs. M7? I would also try codeml's branch-site model (Test 2), followed by statistics with LRT, BC and FDR. Then I would keep only those genes that have at least one positively selected site with a BEB posterior probability > 95% or > 99%.

In my analysis:
1. I aligned the orthologs with PRANK in codon mode (prank-codon).
2. I ran the branch-site model, then LRT, BC and FDR (picked genes at 1% FDR).
3. I kept only the genes having at least one site with BEB > 99%.
This finally gave 6%, 4%, 3% and 2% positively selected genes in the four genomes.

I found this paper very interesting: http://petrov.stanford.edu/pdfs/77.pdf

Regards, Rahul
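The branch-site test mentioned above compares model A (in codeml, `model = 2`, `NSsites = 2`) against the same model with ω2 fixed at 1 (`fix_omega = 1`, `omega = 1`); 2ΔlnL is commonly compared against a χ² distribution with 1 df, which is conservative. A minimal sketch of that p-value plus the BEB filter, assuming the lnL values and per-site BEB posterior probabilities have already been parsed from the `mlc` output (the numbers below are made up). The BC/FDR correction step is left out here.

```python
from __future__ import annotations

import math


def chi2_sf_df1(x: float) -> float:
    """P(X >= x) for chi-square with 1 df, i.e. P(|Z| >= sqrt(x)) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def branch_site_pvalue(lnl_null: float, lnl_alt: float) -> float:
    """LRT p-value: null = model A with omega2 fixed at 1, alternative = omega2 free."""
    return chi2_sf_df1(2.0 * (lnl_alt - lnl_null))


def has_strong_beb_site(posteriors: list[float], threshold: float = 0.99) -> bool:
    """True if any site's BEB posterior probability exceeds the threshold."""
    return any(p > threshold for p in posteriors)


# Made-up results: gene -> (lnL null, lnL alternative, BEB posteriors of candidate sites).
results = {
    "geneA": (-5000.0, -4990.0, [0.42, 0.995, 0.61]),
    "geneB": (-4200.0, -4199.0, [0.97, 0.12]),
}
summary = {
    gene: (branch_site_pvalue(l0, l1), has_strong_beb_site(beb))
    for gene, (l0, l1, beb) in results.items()
}
for gene, (p, strong) in summary.items():
    print(f"{gene}: p = {p:.2g}, BEB site > 0.99: {strong}")
```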


Sorry, but what is BC?

Sincerely,
Kang
(reply by dukecomeback, 3.6 years ago)

I asked the users above too, but how do you obtain all the protein-coding sequences after assembling the genome? I want to try something similar to what you all did with codeml, but so far I only know how to assemble my genome data using a CDS reference sequence...

(reply by DNAngel, 12 months ago)
Powered by Biostar version 2.3.0