Question: Find protein homologs with BLASTp
0
gravatar for biotech
5.9 years ago by
biotech540
United States
biotech540 wrote:

I'm trying to find homologs of a set of proteins using BLASTp. I'm working with custom databases.

I'm using evalue of 0.00001 as threshold. 

I would like to filter queries having hits with >90% identities. Because BLASTp output is based on HSPs, I cannot filter by %identities/query, only by HSP.

I would like to know how to do this and also if I'm following a reasonable strategy.

Thanks, Bernardo

# Here is an alignment example: qcovs=100 but qcovhsp lower.

qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovs qcovhsp
HPNK_01698      HAPS_0519       81.88   596     75      5       630     1225    615     1177    0.0      889    100     49
HPNK_01698      HAPS_0519       49.17   301     115     8       84      366     201     481     2e-56    214    100     23
HPNK_01698      HAPS_0519       53.64   261     61      6       436     684     616     828     6e-49    191    100     20
HPNK_01698      HAPS_0519       46.61   251     62      3       332     510     584     834     6e-46    181    100     15
HPNK_01698      HAPS_0519       53.27   214     79      4       1       194     1       213     1e-45    180    100     16
HPNK_01698      HAPS_0519       55.96   218     60      8       550     764     643     827     1e-40    164    100     18
HPNK_01698      HAPS_0519       51.56   225     61      7       516     731     642     827     1e-38    157    100     18
HPNK_01698      HAPS_0519       49.57   230     77      6       484     713     643     833     1e-37    154    100     19
HPNK_01698      HAPS_0519       57.89   76      26      1       364     433     760     835     1e-13   76.3    100     6

   

#code used
$makeblastdb -in $Hparasuisfastadatabase -out H_parasuis_strains_gb_ALL.fna_databaseBLAST -dbtype prot -parse_seqids # blast database
$blastp -db H_parasuis_strains_gb_ALL.fna_databaseBLAST -query 'out_2.fasta' -out HPNK_selected_vs_H_parasuis_strainss.tblastn -evalue 0.00001 -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovs qcovhsp" -max_target_seqs 50
blast homology • 2.7k views
ADD COMMENTlink modified 5.3 years ago by Biostar ♦♦ 20 • written 5.9 years ago by biotech540
1

Hello Bernardo Bello !

It appears that your post has been cross-posted to another site: http://biology.stackexchange.com/questions/23953/find-protein-homologs-with-blastp

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLYlink written 5.9 years ago by RamRS30k
1
gravatar for Siva
5.9 years ago by
Siva1.7k
United States
Siva1.7k wrote:

As 5heikki pointed out correctly, you cannot get what you are looking for directly from the BLAST output. If you really want to sort the BLAST hits by "global similarity" and select only top scoring hits, there is a work around using BLAST score which corresponds to similarity.

1. If there is only one HSP for a subject/hit, it is the score for that subject sequence. 

2. If there are more more than one HSP for a subject:

2a. For non-overlapping HSPs: Add the scores of all non-overlapping HSPs

2b. For overlapping HSPs: Take the sum or the maximum score of the overlapping HSPs

3. Sort the subjects based on score and select high scoring subjects with whatever threshold you set.

 

ADD COMMENTlink modified 5.9 years ago • written 5.9 years ago by Siva1.7k
0
gravatar for 5heikki
5.9 years ago by
5heikki9.0k
Finland
5heikki9.0k wrote:

You mean you want percent identities of global alignments? If yes, Basic Local Alignment Search Tool is not the tool you're looking for. Also, beware that e-value depends on database size..

ADD COMMENTlink modified 5.9 years ago • written 5.9 years ago by 5heikki9.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1692 users visited in the last hour