Calculating genetic distance when some sites are ambiguous (heterozygotes)
24 months ago
emacdoug • 0

I'm constructing a matrix of pairwise genetic distances based on Sanger sequence data. The samples are diploid with many variable sites, so there are a bunch of legitimate ambiguity codes (R, M, W, etc.) in the DNA sequences. I'd like to calculate distances that make use of this information, such that, for example, AAA and ARA have a pairwise distance greater than zero but less than the pairwise distance between AAA and AGA. That is, it makes sense to me that a heterozygote should have an intermediate genetic distance to each of the two homozygotes.

I've tried dist.alignment in seqinR and dist.dna in ape, but they both seem to be dropping the ambiguous characters as missing data. Ideas on how I can fix this, or other commands/packages to try, would be so welcome!!

heterozygous distance ape R genetic • 751 views
13 months ago

How about using the MATCHSTATES (or GENPOFAD) distance in the package pofadinr by Joly et al. (https://github.com/simjoly/pofadinr)? Both take ambiguity codes into account when computing the distance, instead of dropping them.

library(ape)
library(pofadinr)

alignment <- read.FASTA("input.fasta", type = "DNA")

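# convert unknown bases ("n") to "?"
# as.matrix() assumes the sequences are aligned (equal length); it yields a
# character matrix, so the element-wise replacement below works as intended
temp <- as.character(as.matrix(alignment))
temp[temp == "n"] <- "?"
alignment <- as.DNAbin(temp)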

distances <- dist.snp(alignment, model = "MATCHSTATES")
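
To get a feel for the kind of numbers to expect, here is a minimal base-R sketch of a simple allele-sharing distance. It is only an illustration of the "heterozygotes are intermediate" idea from the question, not the MATCHSTATES or GENPOFAD algorithm as implemented in pofadinr. The names iupac, site_dist and seq_dist are made up for this example, and it assumes aligned, unphased diploid sequences with one- or two-base IUPAC codes only.

```r
# Hypothetical helpers, not part of ape or pofadinr.
# Map each IUPAC code to the set of bases it stands for (diploid codes only).
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"),
  W = c("A", "T"), K = c("G", "T"), M = c("A", "C")
)

# Per-site distance: 1 - (shared bases / larger number of bases at the site).
site_dist <- function(x, y) {
  a <- iupac[[x]]
  b <- iupac[[y]]
  # Anything not in the table (N, ?, -, ...) is treated as missing.
  if (is.null(a) || is.null(b)) return(NA_real_)
  1 - length(intersect(a, b)) / max(length(a), length(b))
}

# Mean per-site distance between two aligned sequences, skipping missing sites.
seq_dist <- function(s1, s2) {
  x <- strsplit(toupper(s1), "")[[1]]
  y <- strsplit(toupper(s2), "")[[1]]
  stopifnot(length(x) == length(y))
  mean(mapply(site_dist, x, y), na.rm = TRUE)
}

seq_dist("AAA", "AAA")  # 0
seq_dist("AAA", "ARA")  # 1/6, about 0.167
seq_dist("AAA", "AGA")  # 1/3, about 0.333
```

With this toy definition the heterozygote ARA sits halfway between AAA and AGA, which is the behaviour asked for. The values returned by dist.snp may differ depending on the model chosen.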