Question

ggsearch: not returning all alignments for short sequences

4.6 years ago

felix.teufel • 0

Hi. I'm using ggsearch36 from the FASTA package to build a similarity matrix of protein sequences, using global identity as the similarity measure. I noticed that for some query sequences, ggsearch36 does not print alignments against every library sequence, and I can't work out which parameter controls this.

For query

>Query1
MCPRAARAPATLLLALGAVLWPAAGAWELTILHTNDVHSRLEQTSEDSSKCVNASRCMGGVARLFTKVQQ

the command

fasta36/bin/ggsearch36 -E 20758 testqry.tmp library.fasta

works fine and prints all alignments. The E-value threshold (-E 20758) is set to the size of the library.

Statistics:  Unscaled normal statistics: mu= -27.8916  var=246.9351 Ztrim: 0
statistics sampled from 1375 (1376) to 1375 sequences
Algorithm: Global/Global affine Needleman-Wunsch (SSE2, Michael Farrar 2010) (6.0 April 2007)
Parameters: BL50 matrix (15:-5), open/ext: -10/-2

Whereas for query

>Query2
MKVVIFIFALLATICAAFAYVPLPNVPQPGRRPFPTFPGQGPFNPKIKWPQGY

the same command returns only 12 alignments.

Statistics: (shuffled [100]) Unscaled normal statistics: mu= -29.4000  var=283.6970 Ztrim: 0
statistics sampled from 12 (12) to 100 sequences
Algorithm: Global/Global affine Needleman-Wunsch (SSE2, Michael Farrar 2010) (6.0 April 2007)
Parameters: BL50 matrix (15:-5), open/ext: -10/-2
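
To compare the two runs side by side, the "statistics sampled from" line can be pulled out of each report. This is only a minimal sketch: the pattern is based solely on the two lines quoted above, the function name `parse_sampled_line` is made up for illustration, and what each of the three numbers means is not something I can confirm here.

```python
import re

# Pattern derived only from the two "statistics sampled from" lines quoted above;
# other ggsearch36 versions or output modes may format this line differently.
SAMPLED_RE = re.compile(
    r"statistics sampled from (\d+) \((\d+)\) to (\d+) sequences"
)


def parse_sampled_line(text):
    """Return the three integers from a 'statistics sampled from' line, or None."""
    match = SAMPLED_RE.search(text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


# The two lines exactly as they appear in the reports above.
query1_stats = parse_sampled_line(
    "statistics sampled from 1375 (1376) to 1375 sequences"
)
query2_stats = parse_sampled_line(
    "statistics sampled from 12 (12) to 100 sequences"
)
print("Query1:", query1_stats)
print("Query2:", query2_stats)
```

Running this over the full report of every query would show quickly which queries had only a handful of library sequences sampled.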

I can't figure out what causes this behaviour. Increasing the E-value threshold did not increase the number of printed alignments. The only difference that is obvious to me is length: Query1 is 70 amino acids long, whereas Query2 is only 53. I couldn't find anything related to sequence length in the configuration, though.
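
One way to test whether length matters is to count how many library sequences fall within some length window around each query and compare that with the number of alignments printed. The sketch below is a rough check, not a description of what ggsearch36 does internally: the helper names and the window ratios are arbitrary placeholders, not documented defaults, and in practice the lines would be read from library.fasta rather than from the two queries.

```python
def read_fasta_lengths(lines):
    """Map each FASTA record ID to its total sequence length."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def count_in_window(lengths, query_len, low_ratio, high_ratio):
    """Count records whose length lies in [low_ratio, high_ratio] * query_len.

    The ratios are hypothetical knobs for exploration only.
    """
    low = low_ratio * query_len
    high = high_ratio * query_len
    return sum(1 for n in lengths.values() if low <= n <= high)


# Demo on the two queries from this post; replace with open("library.fasta")
# to run the same check against the real library.
demo = [
    ">Query1",
    "MCPRAARAPATLLLALGAVLWPAAGAWELTILHTNDVHSRLEQTSEDSSKCVNASRCMGGVARLFTKVQQ",
    ">Query2",
    "MKVVIFIFALLATICAAFAYVPLPNVPQPGRRPFPTFPGQGPFNPKIKWPQGY",
]
demo_lengths = read_fasta_lengths(demo)
print(demo_lengths)
print(count_in_window(demo_lengths, 70, 0.9, 1.1))
```

If the count for a given window tracks the number of printed alignments across several queries, that would point to a length filter rather than the E-value threshold.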

Any ideas what the problem might be? Thanks for your help.


updated 4.5 years ago by Biostar 20 • written 4.6 years ago by felix.teufel • 0