How can I find the maximum percent identity between two sets of sequences?
20 months ago
dlotzk • 0

I've got two sets of unaligned AA sequences (sizes: ~25k and ~3k), ranging in length from ~150 to ~900 residues. I'm trying to find the minimum distance between the sets in terms of percent identity; that is, I want to know the maximum identity between any sequence in the first set and any sequence in the second. I only need to do this a few times, so it doesn't need to be terribly efficient. What's the easiest way to check this?

I thought I might be able to do it with some bash-fu and local BLASTp, but I don't know which BLAST commands would give me the result I'm looking for. My original attempt involved using a double loop over alakazam::seqDist in R, but I realized this didn't give me the right distance measure (% identity) and got stuck from there.

sequences blast clustering • 438 views

Just do an all-vs-all BLAST as you planned (assuming local alignment is right for you), then filter the output file (e.g. BLAST tabular format) to find the best hit for each query. The highest percent identity among those best hits is the maximum identity between the two sets.
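
For what it's worth, here is a minimal sketch of the filtering step in Python. It assumes the hits were written in the default 12-column tabular format (`-outfmt 6`); the file names in the comments and the `min_length` cutoff are hypothetical, not something from the original question. Keep in mind that BLAST reports percent identity over the locally aligned region only, so a very short alignment can show a high identity; the optional length cutoff is one way to guard against that. Pairs with no reported hit simply do not appear in the output.

```python
import csv
import io

# Assumed way of producing the input (file names are placeholders):
#   makeblastdb -in set_b.fasta -dbtype prot -out set_b
#   blastp -query set_a.fasta -db set_b -outfmt 6 -out hits.tsv

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, min_length=0):
    """Return (best hit per query, best hit overall) by percent identity.

    Each hit is a tuple: (pident, qseqid, sseqid, alignment_length).
    min_length is a hypothetical filter that skips short alignments.
    """
    per_query = {}
    overall = None
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (present in -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        pident = float(hit["pident"])
        length = int(hit["length"])
        if length < min_length:
            continue
        record = (pident, hit["qseqid"], hit["sseqid"], length)
        query = hit["qseqid"]
        if query not in per_query or pident > per_query[query][0]:
            per_query[query] = record
        if overall is None or pident > overall[0]:
            overall = record
    return per_query, overall


# Tiny made-up example standing in for a real hits.tsv file.
sample = (
    "a1\tb1\t45.0\t200\t110\t0\t1\t200\t1\t200\t1e-50\t180\n"
    "a1\tb2\t98.5\t12\t0\t0\t5\t16\t9\t20\t0.5\t25\n"
    "a2\tb1\t62.3\t310\t117\t0\t1\t310\t1\t310\t1e-120\t400\n"
)
per_query, overall = best_hits(io.StringIO(sample), min_length=50)
print(overall)
```

For a real run, something like `with open("hits.tsv") as fh: best_hits(fh)` should work the same way. If you need identity over the full sequence length rather than the local alignment, a global aligner would be the more appropriate tool.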
