Hello everyone,
I am currently searching for signs of positiv selection among a subset of nuclear genes (n= 44), identified as interacting with plastid gene product, in order to search for potential compensatory mutation in those genes. I am working on RNAseq data and I used PAML to search for potentiel sites showing signs of positive selection. To do so, as input I generated multiple sequence alignement (MSA) - 1 per gene (so 44 MSA), containing the gene sequence of 22 individuals coming from 4 very closely related evolutionary lineages of Silene nutans and one outgroup (Silene paradoxa). Those input files are in PHYLIP Interleaved format. And I also used trees in Newick format as input. I ran the M1 & M2 models to calculate LTR etc and I identified 18 genes containing sites showing signs of positive selection and with a significant LTR. But, when I am comparing the sites identified as positively selected sites, 1) the identified sites do not match with the sites on the MSAs (e.g. if the sites showing signs of positive selection is 177 Y in the output file of PAML, on the MSA at the same position it is not a Y but a R) and 2) often, those residues (showing signs of positiv selection) are not even variable (i.e. containing fixed non-synonymous substitutions between lineages). So I don't understand why the PAML results don't match with my data, I don't know what I did wrong or what I should have done to avoid this... So I am asking for any ideas or suggestions explaining those results.
I just pasted down here one linked leading to the codeml control file, one of my MSA for a gene identified as containing sites with signs of positiv selection and the corresponding tree.
https://drive.google.com/open?id=1EZug3Lzd_gJNVfQGssbdmqXcSiX2Vmuv
Thank you for time and don't hesitate to ask more detailed.
Thank you again !
ZoƩ