Clustering tools for sequence alignments?
7.7 years ago
caity • 0

Hello,

I'm wondering whether anyone knows of a tool for clustering sequences that have already been aligned. It should tolerate gap characters (hyphens), e.g. "----AGSM----", and take them into account. Like usearch, it would ideally let me specify a percent identity cutoff for the clustering as well as a minimum fraction of the alignment length covered (so it can cater for substrings). I have a concatenated alignment of marker genes that I want to reduce to a representative set (based on 95% AAI).

Thanks!

alignment • 4.1k views

Just a comment - it's helpful to others to specify "Amino Acid" or "Nucleotide" in the title, as well as "Multiple Sequence Alignments" rather than just "Alignments".

7.7 years ago
Erik Wright ▴ 420

Clustering an aligned set of sequences can easily be performed in R using the DECIPHER package:

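library(DECIPHER)
# Read the existing amino acid alignment (FASTA); gaps are kept as "-"
aa <- readAAStringSet("<<PATH TO ALIGNMENT>>")
# Pairwise distances between the aligned sequences
d <- DistanceMatrix(aa)
# Complete-linkage clustering; cutoff=0.05 corresponds to 95% identity
# (newer DECIPHER releases may require TreeLine in place of IdClusters)
clusters <- IdClusters(d, method="complete", cutoff=0.05)
head(clusters)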

This will give you cluster numbers at 95% identity on a complete-linkage tree. You can also specify other methods like "UPGMA".
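
For readers who want to see what the two steps above amount to, here is a minimal standalone sketch in Python (not DECIPHER, and not its implementation). It assumes the distance is 1 minus the fraction of identical residues over columns where both sequences have a residue, so gap columns are simply skipped; DECIPHER's own gap handling is configurable and may differ. The function names `identity_distance` and `complete_linkage_clusters` are made up for illustration, and the naive merge loop is only practical for small alignments.

```python
from itertools import combinations

GAP_CHARS = set("-.")  # characters treated as gaps (assumption)


def identity_distance(a, b):
    """1 - identity, counted only over columns where both rows have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    shared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip columns with a gap in either sequence
        shared += 1
        matches += (x == y)
    # Sequences with no overlapping residues are treated as maximally distant.
    return 1.0 if shared == 0 else 1.0 - matches / shared


def complete_linkage_clusters(seqs, cutoff=0.05):
    """Merge clusters while the largest pairwise distance between them is <= cutoff.

    seqs maps unique names to aligned sequences of equal length.
    """
    names = list(seqs)
    dist = {
        frozenset((a, b)): identity_distance(seqs[a], seqs[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Complete linkage: distance between the two most dissimilar members.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d > cutoff:
            break  # no remaining pair of clusters is within the cutoff
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    example = {
        "s1": "ACDEFGHIKL",
        "s2": "ACDEFGHIKL",
        "s3": "--DEFGHI--",  # fragment, identical where it overlaps
        "s4": "WWWWWWWWWW",
    }
    print(complete_linkage_clusters(example, cutoff=0.05))
```

With complete linkage, every pair of sequences inside a cluster is within the cutoff of each other, which is why a 0.05 cutoff can be read as "at least 95% identity" within each cluster.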

I hope that helps!

7.7 years ago

Clustal Omega can take a multiple sequence alignment as input and output clusters.

EDIT: You can also output the distance matrix or pairwise identity matrix and use them for clustering using different algorithms. Check the docs.
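
As one example of a "different algorithm", the sketch below picks representatives greedily with both an identity and a coverage threshold, which is roughly what the question asks for. It is a standalone Python illustration under stated assumptions, not Clustal Omega's or usearch's actual method: identity is counted over columns where both rows have a residue, coverage is the fraction of the query's residues that fall in such columns, and the names `identity_and_coverage` and `greedy_representatives` plus the default thresholds are hypothetical.

```python
GAP_CHARS = set("-.")  # characters treated as gaps (assumption)


def identity_and_coverage(query, target):
    """Compare two rows of the same alignment.

    identity: matching residues / columns where both rows have a residue
    coverage: those shared columns / number of residues in the query
    """
    shared = matches = query_len = 0
    for q, t in zip(query.upper(), target.upper()):
        if q in GAP_CHARS:
            continue
        query_len += 1
        if t in GAP_CHARS:
            continue
        shared += 1
        matches += (q == t)
    identity = matches / shared if shared else 0.0
    coverage = shared / query_len if query_len else 0.0
    return identity, coverage


def greedy_representatives(seqs, min_identity=0.95, min_coverage=0.8):
    """Assign each sequence to the first representative it matches, longest first."""

    def n_residues(name):
        return sum(ch not in GAP_CHARS for ch in seqs[name])

    # Longest sequences first, so fragments attach to fuller sequences.
    order = sorted(seqs, key=n_residues, reverse=True)
    clusters = {}  # representative name -> list of member names
    for name in order:
        for rep in clusters:
            ident, cov = identity_and_coverage(seqs[name], seqs[rep])
            if ident >= min_identity and cov >= min_coverage:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no match: start a new cluster
    return clusters


if __name__ == "__main__":
    example = {
        "full": "ACDEFGHIKL",
        "frag": "--DEFGHI--",
        "other": "WWWWWWWWWW",
        "short_other": "WW--------",
    }
    for rep, members in greedy_representatives(example).items():
        print(rep, members)
```

The keys of the returned dictionary are the representative set. Because assignment is greedy, the result depends on the processing order, unlike hierarchical clustering of the full matrix.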
