Extracting only 4-fold degenerate sites from gene sequences/alignments?
Entering edit mode
16 days ago
J. ▴ 40

Hi all,

For background: I am working on building a phlyogenetic tree for ~200 taxa, using published WGS sequences with genes predicted/annotated using TOGA. So far I have been using the whole coding sequence for each gene and taxon, but for our purposes it would be better to use only the 4-fold degenerate sites to ensure neutral evolution.

I was wondering if anyone can suggest any straightforward way to pull out only the 4-fold degenerate ("4d") sites either from the individual genes (pre-allignment), or from the alignment files directly?

It seems like this is fairly standard practice for a lot of phylogenomic analysis, going by the literature, but I can't see any straightforward way to do it without writing my own script (for e.g. Extracting four fold degenerate sites from nucleotide alignment) or manually selecting bases in a GUI (e.g. https://www.megasoftware.net/web_help_10/Writing_only_4-fold_degenerate_sites_to_an_output_file.htm). I feel like I am surely missing something here, as it seems like a) a fairly common task and b) very computationally simple to do, given an in-frame nucleotide sequence (or any sequence with codon annotation). A colleague suggested it is possible to output just the 4d sites from a sequence by running the genes/alignments through codeml in PAML, but I don't see any reference to such a feature in the codeml manual, unless I'm missing something.

Do I just have to bite the bullet and write my own python script (or similar), or is there an existing solution somewhere I'm not seeing?

Thanks in advance,


alignments • 238 views
Entering edit mode
15 days ago
J. ▴ 40

FYI if you also have this problem, this seems to be a pretty good solution: https://github.com/harvardinformatics/degenotate


Login before adding your answer.

Traffic: 2654 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6