I have sequences obtained by Sanger sequencing on metagenomic DNA that contain ambiguous bases (due to amplification of two or more highly homologous sequences).
I want to look up for best possible matches from database. BLAST finds matches but treats ambiguous bases as mismatch.
I plan to take the sequences found by BLAST (mySeq -> BLASTn -> all.BLAST.seqs) and align them back to my sequence to find the best possible match/es
taking into account the meaning of ambiguous positions.
I think it can be performed by multiple alignment in R with the
DECIPHER package to set of all possible sequences generated by
It would look something like:
library(DECIPHER) permutations <- Disambiguate(mySeq) #expands ambiguities into all permutations - quite a lot! all.BLAST.seqs <- OrientNucleotides(all.BLAST.seqs) #to put the seqs found by Blast in the same orientation aligned <- AlignSeqs(c(all.BLAST.seqs, permutations)) d <- DistanceMatrix(aligned) # calculate a maximum likelihood tree dend <- IdClusters(d, type="dendrogram", myXStringSet=aligned) dend <- dendrapply(dend) #to create phylogenetic tree to look for the closest matches to mySeq
However this I believe might not be the best way. Especially as
disambiguate may produce quite a lot of sequences.
Does anybody knows better, faster way to find best matching sequences taking into account information from ambiguous positions?