Extracting only 4-fold degenerate sites from gene sequences/alignments?
1
1
Entering edit mode
10 weeks ago
J. ▴ 40

Hi all,

For background: I am working on building a phlyogenetic tree for ~200 taxa, using published WGS sequences with genes predicted/annotated using TOGA. So far I have been using the whole coding sequence for each gene and taxon, but for our purposes it would be better to use only the 4-fold degenerate sites to ensure neutral evolution.

I was wondering if anyone can suggest any straightforward way to pull out only the 4-fold degenerate ("4d") sites either from the individual genes (pre-allignment), or from the alignment files directly?

It seems like this is fairly standard practice for a lot of phylogenomic analysis, going by the literature, but I can't see any straightforward way to do it without writing my own script (for e.g. Extracting four fold degenerate sites from nucleotide alignment) or manually selecting bases in a GUI (e.g. https://www.megasoftware.net/web_help_10/Writing_only_4-fold_degenerate_sites_to_an_output_file.htm). I feel like I am surely missing something here, as it seems like a) a fairly common task and b) very computationally simple to do, given an in-frame nucleotide sequence (or any sequence with codon annotation). A colleague suggested it is possible to output just the 4d sites from a sequence by running the genes/alignments through codeml in PAML, but I don't see any reference to such a feature in the codeml manual, unless I'm missing something.

Do I just have to bite the bullet and write my own python script (or similar), or is there an existing solution somewhere I'm not seeing?

Thanks in advance,

J.

alignments • 388 views
ADD COMMENT
3
Entering edit mode
10 weeks ago
J. ▴ 40

FYI if you also have this problem, this seems to be a pretty good solution: https://github.com/harvardinformatics/degenotate

ADD COMMENT

Login before adding your answer.

Traffic: 991 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6