Question: NJ phylogenetic tree calculation method weighing gaps in the alignment
Kame (UK), 4.1 years ago, wrote:

Hello,

Is there any NJ phylogenetic tree calculation method that works on large genomic alignments and takes the gaps in the alignment into account, or weights them?

I have managed to do it with FastTree, which does it exactly as I would like (from their website: "When comparing two sequences, positions with gaps are ignored; when comparing two profiles, positions are weighted by their proportions of non-gaps."), but I was wondering if there was an NJ method to do it as well. I use RapidNJ, which doesn't seem to be able to do it.
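To make the quoted rule concrete, here is a minimal Python sketch of a pairwise distance in which any position with a gap in either sequence is ignored. It computes a plain uncorrected p-distance and is only an illustration of the "positions with gaps are ignored" idea; it is not FastTree's code, and the function name `p_distance_ignoring_gaps` is made up for this example.

```python
def p_distance_ignoring_gaps(seq_a, seq_b, gap_chars="-"):
    """Proportion of differing sites between two aligned sequences,
    counting only columns where neither sequence has a gap.

    Hypothetical helper for illustration; an uncorrected p-distance.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0    # columns with a residue in both sequences
    mismatches = 0  # compared columns where the residues differ
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # skip any column with a gap in either sequence
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        return float("nan")  # no shared ungapped columns to compare
    return mismatches / compared
```

A matrix of such distances over all sequence pairs could then be passed to any NJ implementation that accepts a precomputed distance matrix.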

There is (or was) an option in MEGA to do it by defining the gap threshold, i.e. the percentage of gaps required for each site to be considered, but I'd prefer a command-line method, as MEGA does not handle large alignment files well.

Thanks a lot for your help

K.

 

tree alignment nj gaps • 1.7k views
Brice Sarver (United States), 4.1 years ago, wrote:

One way I've tackled this issue is by reading large alignments into R with Bioconductor's Biostrings package, specifically as MultipleAlignment objects. You can mask alignment columns whose proportion of gaps exceeds a chosen threshold using the maskGaps() function. Since you already have the alignment, any regions with a large percentage of gaps would be masked. When you write the object back out, these sites are removed. You can then estimate your tree using whatever approach you used previously.

Obviously, the efficacy of this approach depends on how much data you actually have. readDNAStringSet() will read an entire mouse genome into memory in under 10 seconds. The alternative, which wouldn't be too difficult but is perhaps slower, would be to use Python (or Perl, or a language of your choice) to process a file site by site and remove sites that don't meet your criteria.
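As a rough sketch of the scripting alternative, the following pure-Python code drops every alignment column whose gap fraction is above a cutoff. The function names (`read_fasta`, `drop_gappy_columns`), the default cutoff of 0.5, and the use of "-" as the only gap character are assumptions made for this example, not part of any existing tool.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def drop_gappy_columns(records, max_gap_fraction=0.5, gap_chars="-"):
    """Keep only columns whose fraction of gaps is <= max_gap_fraction.

    Hypothetical helper: the threshold and gap characters are assumptions.
    """
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all aligned sequences must have equal length")

    n_seqs = len(records)
    keep = []
    for col in range(length):
        gaps = sum(1 for _, seq in records if seq[col] in gap_chars)
        if gaps / n_seqs <= max_gap_fraction:
            keep.append(col)

    # Rebuild each sequence from the retained columns only.
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]
```

For example, `drop_gappy_columns(read_fasta(open("aln.fasta")), 0.2)` would keep the sites with at most 20% gaps (the file name is a placeholder). This version holds the whole alignment in memory and loops over columns in pure Python, so it is likely to be slow on genome-scale alignments.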
