Interpreting inconsistency between BLAST hits and phylogenetic clustering i
11 days ago
triplee0305 ▴ 20

Hi all,

I’m working on a gene family analysis and encountered an interesting situation that I’d appreciate your thoughts on.

I started with 8 query protein sequences from species A (all annotated as members of the same gene family) and used them to perform BLAST searches against the proteomes of species A, B, and C. This resulted in 47 hits from species A, 42 from species B, and 34 from species C. I then built a phylogenetic tree based on these 123 sequences, and as expected, the 8 original query sequences clustered into a distinct subtree comprising 31 sequences from across the three species.

Later, I discovered an additional copy in species A that shares the same gene family annotation but was not among the 47 BLAST hits. When I added this 9th sequence to the query set and re-ran the BLAST search, I still retrieved the same number of hits from species B and C, but species A now returned 49 sequences — indicating that the 9th sequence retrieved one new match (besides itself).

After rebuilding the phylogenetic tree with these 125 sequences, the query sequences (now 9) again formed a distinct subtree, which now includes 33 sequences; the 2 additional ones are the newly added sequence and its unique BLAST hit.

Here’s the part I’m trying to understand:

  • If the 9th sequence is so divergent that it wasn’t picked up by BLAST when using the original 8 queries, why does it still cluster within the same gene family subtree in the phylogenetic tree?
  • Shouldn’t its divergence have caused it to fall outside that subtree?

It seems contradictory that the 9th sequence is both divergent enough to escape detection in a BLAST search, yet still similar enough to cluster tightly with the original family in a phylogenetic tree.

Thanks in advance!

tree phylogenetic blast • 470 views
11 days ago
Mensur Dlakic ★ 29k

It depends on at least two factors:

  • How was the alignment done, and was trimming applied before tree reconstruction? If the most divergent regions are trimmed, the conserved core could be similar enough to end up on the same tree branch.
  • How divergent are other species and their proteins? If they are very far from species A, it makes sense that sequence paralogs will cluster together no matter how divergent they are individually.
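
To make the trimming point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the sequences and names are made up, the 0.75 threshold is arbitrary, and the column filter is a crude stand-in that is not what trimAl does internally. It only shows that a sequence can look quite different over its full length yet be nearly identical over the columns that survive trimming.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def conserved_columns(rows, min_fraction=0.75):
    """Indices of columns whose most common residue covers at least
    min_fraction of all rows (gaps never count as a residue)."""
    keep = []
    for i in range(len(rows[0])):
        residues = [seq[i] for seq in rows if seq[i] != "-"]
        if not residues:
            continue
        top = max(residues.count(r) for r in set(residues))
        if top / len(rows) >= min_fraction:
            keep.append(i)
    return keep


# Hypothetical aligned sequences: variable flanks around a shared core.
alignment = {
    "query_1": "ACDEFGHIKLMNPQRS",
    "query_2": "TCNEFGHIKLMNPARS",
    "query_3": "AVDQFGHVKLMNYQKT",
    "seq_9":   "WYTVFGHIKLMNCCWW",  # stands in for the divergent 9th copy
}

keep = conserved_columns(list(alignment.values()))

# Identity of the divergent copy to one query, before and after "trimming".
full = percent_identity(alignment["seq_9"], alignment["query_1"])
trimmed = percent_identity(
    "".join(alignment["seq_9"][i] for i in keep),
    "".join(alignment["query_1"][i] for i in keep),
)

print(f"full-length identity:    {full:.1f}%")
print(f"conserved-core identity: {trimmed:.1f}%")
```

In this toy case the divergent copy differs from the query in every flanking column but matches it across the retained core, which is the kind of signal a tree built from a trimmed alignment would see.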

Also, take a look at branch lengths. These sequences may be in the same group but still far from each other.
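
A minimal way to compare branch lengths is to sum them from the root of the subtree down to each tip. The sketch below assumes a hypothetical Newick string with invented tip names and lengths, and the parser only handles plain Newick (no quoted labels or comments), so a proper tree library is the safer choice for real files.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts with
    'name', 'length', and 'children' keys."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
        # Read the optional "label:length" up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label, _, length = text[start:pos].partition(":")
        return {
            "name": label,
            "length": float(length) if length else 0.0,
            "children": children,
        }

    return parse_node()


def tip_depths(node, depth=0.0, out=None):
    """Map each tip name to the summed branch length from `node` to that tip."""
    if out is None:
        out = {}
    if not node["children"]:
        out[node["name"]] = depth
    for child in node["children"]:
        tip_depths(child, depth + child["length"], out)
    return out


# Hypothetical family subtree: two original queries plus the 9th copy and its hit.
subtree = parse_newick("((query_1:0.10,query_2:0.12):0.05,(seq_9:0.45,new_hit:0.40):0.30);")
depths = tip_depths(subtree)

# Longest root-to-tip paths first.
for name, depth in sorted(depths.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {depth:.2f}")
```

Tips that sit in the same clade but have much larger root-to-tip sums than their neighbours are the ones that are "in the group but still far away".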


Thank you for the response — your points make a lot of sense!

Yes, I did trim the alignment using trimAl with the -automated1 option before building the tree.

Regarding species divergence, species A diverged from species C and B approximately 35 and 45 million years ago, respectively.

I also checked the tree, and the branch leading to the two additional sequences is indeed longer than the others in the subclade (if I understand the tree correctly).

Thanks again for your insights — very helpful!
