Question: Why Does A PSI-BLAST Search Produce Fewer Hits On Subsequent Iterations?
s.charonis wrote (6.2 years ago):

Hello all,

I have a sequence of a GPCR (PDB code 3NY8) which I am using to create a dataset of homologs (via a homology modelling pipeline) so I can perform electrostatic calculations on them. In order to create this GPCR dataset I need an alignment which I will feed to the pipeline so that it will create one homolog per alignment member.

My problem is sequence-based: I'm doing PSI/DELTA-BLASTs on the query and I am getting some odd results. When I do the 1st iteration, I get 196 hits, but on the second I get 155. It either stays the same or keeps decreasing on each successive iteration, and I have no idea how that's happening. Does anyone have any idea how that can happen? My understanding was that PSI-BLAST searches are supposed to increase the number of hits on each successive iteration since the algorithm is using a PSSM as opposed to a sequence to detect distant members.

PARAMETERS

The parameters used to construct the alignment were as follows:

Algorithm: PSI-BLAST

Database: Non-redundant protein sequence databases (includes GenBank CDS translations, PDB, SwissProt, PIR, PRF)

Organism: Homo sapiens

Exclude: Models/uncultured sample sequences (both excluded)

Maximum target sequences: 1000

Expect threshold: 10

Word size: 3

Maximum matches in a query range: 0

Matrix: BLOSUM62

Gap Costs: Existence: 12, Extension: 1

Compositional adjustments: Composition-based statistics

Filter: Low complexity regions

PSI-BLAST Threshold: 0.005

Pseudocount: 0

Any ideas would be appreciated!

Spyros

Tags: proteomics, sequence
terdon wrote (6.2 years ago):

Well, since PSI-BLAST is using a PSSM to score the hits, it is no longer depending on sequence similarity alone. In each iteration, specific sites are given more/less weight and conservation at those sites is considered more/less important in choosing the hits. This means it can find distant homologs, increasing your hits, yes. It also means, though, that in future iterations, sequences that were returned as matches the first time around will now be discarded because they lack conservation at the specific sites that the PSSM defines as important based on the previous iterations.

I have to admit that I have not actually observed this myself since I have not worked much with PSI-BLAST, but from my understanding of the algorithm it makes sense.


@terdon Thank you for the input! I understand your point, but aren't sequences returned as matches in the first time around used to build the profile matrix? If their information is incorporated in the form of probabilities of occurrence, isn't that information necessary to append sequences detected in subsequent searches? In other words, once the original PSSM is created, my understanding is that it would expand until no further members can be found? Thanks again.

Reply by s.charonis, 6.2 years ago

@s.charonis Spyro, think of a case where the PSSM assigns a very high score to a cysteine at position 3. Of the 100 sequences used to build the PSSM, all but one have a Cys at that position. The first time around, the one sequence with a different residue at that position is taken as a hit because it satisfies the BLASTP score/E-value thresholds. The second iteration, however, could discard that sequence because it lacks the Cys that the PSSM has shown to be important. Its absence could bring the score below the threshold used to match the PSSM to a hit, even though the sequence itself was used to build the PSSM. This is obviously a very simplistic example, but it illustrates the point.
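To make the Cys example concrete, here is a minimal sketch of a toy log-odds PSSM. Everything in it is an assumption for illustration: the three-residue sequences, the uniform background frequencies, the fixed pseudocount, and the `THRESHOLD` value are made up, and real PSI-BLAST additionally uses sequence weighting, data-dependent pseudocounts, and E-values rather than a raw score cutoff.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(ALPHABET)  # assumption: uniform background frequencies
THRESHOLD = 10.0                  # hypothetical score cutoff, not a PSI-BLAST default


def build_pssm(sequences, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    pssm = []
    for pos in range(len(sequences[0])):
        # Start every residue at the pseudocount so unseen residues get a finite score.
        counts = {aa: pseudocount for aa in ALPHABET}
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((counts[aa] / total) / BACKGROUND) for aa in ALPHABET}
        )
    return pssm


def score(pssm, seq):
    """Sum the per-column scores of an ungapped sequence against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Hypothetical profile members: 99 sequences with Cys at position 3, one with Ser.
members = ["AKC"] * 99 + ["AKS"]
pssm = build_pssm(members)

typical = score(pssm, "AKC")  # has the conserved Cys
outlier = score(pssm, "AKS")  # contributed to the PSSM but lacks the Cys

print(f"typical: {typical:.2f}  passes: {typical >= THRESHOLD}")
print(f"outlier: {outlier:.2f}  passes: {outlier >= THRESHOLD}")
```

In this toy setup the Ser-containing sequence is penalised at the third column strongly enough to fall below the cutoff, even though it was one of the sequences used to build the matrix.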

Reply by terdon, 6.2 years ago

@terdon Thank you very much, that clears up a lot! I can now justify my PSI-BLAST search findings as biologically plausible.

Reply by s.charonis, 6.2 years ago

@s.charonis Be well :)

Reply by terdon, 6.2 years ago

@terdon Very nice ;)

Reply by s.charonis, 6.2 years ago
Dan Gaston (Canada) wrote (6.2 years ago):

While you are enabling more distant matches with the use of the PSSM (for every iteration except the first), each iteration is adding sequences to your total number of cumulative hits (which you are then also adding to the PSSM). This is how PSI-BLAST should work. Once you stop running new iterations, it is all of the cumulative hits added together that count as your results.


@Dan Thank you, this is my understanding as well; doesn't the algorithm automatically append all newly detected members to the existing profile? In other words, shouldn't I be getting a few more hits every iteration until no more hits can be found, as opposed to going from e.g. 150 to 145 to 143?

Reply by s.charonis, 6.2 years ago

The number of new hits per iteration shouldn't be going up; the cumulative total should. Ultimately the PSSM converges and you will get no new hits at all.
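The distinction between hits reported in a round, new hits, and the cumulative total can be sketched with plain sets. This is only an illustration: the hit IDs in `rounds` are hypothetical, and `summarize_iterations` is a made-up helper, not part of any BLAST tool. It assumes you have already collected the subject IDs reported in each iteration.

```python
def summarize_iterations(hits_per_iteration):
    """Track reported, new, dropped, and cumulative hits across iterations."""
    cumulative = set()
    previous = set()
    summary = []
    for round_number, hits in enumerate(hits_per_iteration, start=1):
        hits = set(hits)
        new = hits - cumulative      # never seen in any earlier round
        dropped = previous - hits    # reported last round, missing this round
        cumulative |= hits
        summary.append(
            {
                "round": round_number,
                "reported": len(hits),
                "new": len(new),
                "dropped": len(dropped),
                "cumulative": len(cumulative),
            }
        )
        previous = hits
    return summary


# Hypothetical hit IDs per iteration (not real PSI-BLAST output).
rounds = [
    {"P1", "P2", "P3", "P4"},
    {"P1", "P2", "P3", "P5"},
    {"P1", "P2", "P3", "P5"},
]

summary = summarize_iterations(rounds)
for row in summary:
    print(row)

# Treat the search as converged once a round contributes no new hits.
converged = summary[-1]["new"] == 0
```

With these made-up sets, the per-round count stays at 4 while one hit is dropped and one is gained in round 2, so the cumulative total still grows from 4 to 5 before the last round adds nothing new.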

Reply by Dan Gaston, 6.2 years ago