[R] How to use the consistency index calculation (phangorn) on multi FASTA alignments?
6.6 years ago
Kame ▴ 20

Hello,

Sorry in advance if my problem is very simple, I am only just starting to use R! :) Thanks a lot in advance for your help.

I have a bunch of DNA sequence alignments in FASTA format (~150 sequences, aligned, sometimes with gaps) and a corresponding phylogeny that I would like to use to calculate consistency indices (CI), in order to get a rough idea of the amount of homoplasy in various sequences of interest.

For this, I saw that I can use the CI() function from the R package phangorn. Below is what I do, but there is a problem as the R session aborts (bomb) as soon as I start running the last line below, either on my local machine or a large cluster with lots of RAM:

library(ape)       # read.FASTA(), read.tree()
library(phangorn)  # phyDat(), CI()
seqdata = read.FASTA(file="multifasta.fasta")
seqdataPhy = phyDat(seqdata, type="DNA")
phyloTree = read.tree(file="tree.nwk")
CI(phyloTree, seqdataPhy)

I suspect something goes wrong when I try to convert the sequence alignment to phyDat format (maybe I am forgetting some options?), but I haven't found any documentation on what I would like to do, so I am asking here. Thanks a lot, I am grateful for any advice!

All the best,

KS

R homoplasy phangorn • 2.5k views

Could you share your files?

6.6 years ago
Kame ▴ 20

Hi. I eventually found out that there was an inconsistency in the leaf names of my tree: rapidnj, which I used to build it, adds single quotes around the leaf names, so they no longer matched the sequence names in the alignment, and that made CI() crash.
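
For anyone hitting the same problem, here is a minimal sketch of the check and the fix, not a definitive recipe. It uses a tiny made-up alignment and tree instead of the real files, and it adds the single quotes to the tip labels by hand to mimic the situation described above (whether quotes survive reading a Newick file may depend on the ape version, so that part is an assumption). The object names `aln`, `seqs` and `tree` are hypothetical.

```r
library(ape)
library(phangorn)

# Toy alignment: 4 taxa x 3 sites (rows are taxa), standing in for the real FASTA file
aln <- rbind(A = c("a", "a", "g"),
             B = c("a", "c", "g"),
             C = c("c", "a", "t"),
             D = c("c", "c", "t"))
seqs <- phyDat(aln, type = "DNA")

# Toy tree; quotes are added manually to mimic the quoted leaf names
tree <- read.tree(text = "((A,B),(C,D));")
tree$tip.label <- paste0("'", tree$tip.label, "'")

# Check 1: which tip labels have no matching sequence name?
missing_before <- setdiff(tree$tip.label, names(seqs))
print(missing_before)

# Fix: strip one leading and one trailing single quote from each tip label
tree$tip.label <- gsub("^'|'$", "", tree$tip.label)

# Check 2: labels and sequence names should now agree in both directions
missing_after <- c(setdiff(tree$tip.label, names(seqs)),
                   setdiff(names(seqs), tree$tip.label))
stopifnot(length(missing_after) == 0)

# Only compute the consistency index once the names match
ci <- CI(tree, seqs)
print(ci)
```

With real data, the same `setdiff()` checks can be run on the objects from the question (`phyloTree$tip.label` against `names(seqdataPhy)`) before calling `CI()`.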
