Question: How To Find 50 Homologous Sequences That Are Not Too Closely Related?
onpelikan wrote, 6.2 years ago:

Hi, I'm searching for e.g. 50 sequences in the non-redundant (nr) BLAST database. I want to test a protein mutation prediction program, which tries to estimate whether a mutation is deleterious or neutral.

One example of an analyzed sequence is the well-known LacI repressor. BLAST finds a lot of sequences, but they are too similar: the first 50 hits are almost identical, so the prediction program has no heterogeneity for its prediction model.

How can I find homologous sequences that are not (nearly) identical? I want orthologs, e.g. sequences from other species that differ somewhat from the E. coli LacI protein.

I tried classic blastp. I also tried a second approach: first run blastp to retrieve 2000 sequences, then align these sequences and pass the alignment to psiblast, which builds its PSSM from it (the -in_msa parameter). Is there another automatic approach, or a parameter setting in the BLAST+ package, that finds more distant sequences?

EDIT: Constraint: the search process has to be automatic, because it is one component of a bigger tool.

I would guess you need to define some constraints, e.g. (1) bitscore thresholds, (2) a species subset (or a taxonomic distance), and (3) conserved domain(s), and then see which BLAST hits satisfy them.

(reply by Pavel Senin, 6.2 years ago)
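
The first two constraints can be turned into an automatic filter. A minimal sketch, assuming BLAST was run with a custom tabular format such as -outfmt "6 qseqid sseqid pident bitscore staxids" against a database that carries taxonomy IDs; the bitscore cutoff is an arbitrary example value. It keeps only the best-scoring hit per species (taxid), so the selected set is spread across species instead of repeating near-identical sequences.

```python
def best_hit_per_species(tsv_text: str, min_bitscore: float) -> dict[str, str]:
    """Return {taxid: sseqid}, keeping the highest-bitscore hit for each species.

    Assumes rows in the custom format "qseqid sseqid pident bitscore staxids".
    """
    best: dict[str, tuple[float, str]] = {}
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and -outfmt 7 style comment lines
        _qseqid, sseqid, _pident, bitscore, staxids = line.split("\t")[:5]
        score = float(bitscore)
        if score < min_bitscore:
            continue  # too weak to trust as a homolog
        # staxids can list several taxids separated by ';' (identical proteins
        # merged into one nr entry); this sketch simply uses the first one.
        taxid = staxids.split(";")[0]
        if taxid not in best or score > best[taxid][0]:
            best[taxid] = (score, sseqid)
    return {taxid: sseqid for taxid, (_score, sseqid) in best.items()}


if __name__ == "__main__":
    # Made-up rows, only to show the behavior.
    example = (
        "lacI\thitA\t98.0\t700\t1001\n"
        "lacI\thitB\t97.5\t690\t1001\n"
        "lacI\thitC\t60.2\t350\t1002\n"
        "lacI\thitD\t25.0\t40\t1003\n"
    )
    print(best_hit_per_species(example, min_bitscore=50))
    # {'1001': 'hitA', '1002': 'hitC'}
```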

5heikki (Finland) wrote, 6.2 years ago:

You could filter the tabular BLAST output (-outfmt 6) with, e.g., awk to keep only hits within a chosen percent-identity range. In the default tabular format, column 3 is percent identity, not similarity:

awk '$3 >= 85 && $3 <= 95' tabularBlastOutputFile > hitsBetween85And95PercentIdentity
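
The same filter could be done in Python when the selection has to happen inside a larger tool. This sketch assumes the default 12-column -outfmt 6 layout (pident in column 3, bitscore in column 12); the 85-95% window and the limit of 50 hits are only example values. Each subject is kept once, scored by its best HSP inside the window, and the result is ranked by bitscore.

```python
import csv
import io


def hits_in_identity_window(tsv_text: str, low: float = 85.0, high: float = 95.0,
                            max_hits: int = 50) -> list[str]:
    """Return up to max_hits subject IDs whose percent identity lies in [low, high].

    Assumes the default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: dict[str, float] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip empty lines and comment lines
        subject, pident, bitscore = row[1], float(row[2]), float(row[11])
        if low <= pident <= high:
            # a subject can have several HSPs; keep its best in-window bitscore
            best[subject] = max(bitscore, best.get(subject, float("-inf")))
    ranked = sorted(best, key=best.get, reverse=True)  # highest bitscore first
    return ranked[:max_hits]
```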

Manu Prestat (Lyon, France) wrote, 6.2 years ago:

You're looking for a more sensitive search. Try a profile-based search, e.g. HMMER with Pfam models.

HMMER returns a lot of sequences, so I clustered them with cd-hit; this process gave the best results for mutation analysis with the MAPP program.

(reply by onpelikan, 6.1 years ago)
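
cd-hit's output FASTA already holds only the cluster representatives; the .clstr file written next to it records which sequences each representative stands for. A small parser for it, sketched under the assumption of the usual layout: each cluster starts with a ">Cluster N" line, each member line contains ">ID...", and the representative's line ends with "*". cd-hit shortens IDs in this file by default; its -d 0 option is meant to keep the name up to the first space (worth checking in the cd-hit user guide for your version).

```python
import re
from typing import Optional

MEMBER_ID = re.compile(r">(.+?)\.\.\.")  # the ID sits between '>' and '...'


def parse_clstr(clstr_text: str) -> dict[str, list[str]]:
    """Map each cluster's representative ID to the IDs of all its members."""
    clusters: dict[str, list[str]] = {}
    members: list[str] = []
    rep: Optional[str] = None
    for line in clstr_text.splitlines():
        if line.startswith(">Cluster"):
            if rep is not None:
                clusters[rep] = members  # close the previous cluster
            members, rep = [], None
            continue
        match = MEMBER_ID.search(line)
        if match is None:
            continue
        seq_id = match.group(1)
        members.append(seq_id)
        if line.rstrip().endswith("*"):
            rep = seq_id  # cd-hit marks the representative with '*'
    if rep is not None:
        clusters[rep] = members  # close the last cluster
    return clusters
```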

jackuser1979 (US) wrote, 6.2 years ago:

You could use BLASTO, a BLAST variant designed for ortholog search. You could also try searching the eggNOG database or the DRSC tool.

Thank you. These are really interesting projects/tools, but I need a command-line program (such as the BLAST+ programs).

(reply by onpelikan, 6.1 years ago)

Is there any way to download all the sequences in FASTA format? I can't see an option for it.

(reply by onpelikan, 6.1 years ago)

Asaf (Israel) wrote, 6.2 years ago:

You can run PSI-BLAST and choose the proteins you get in the second or third iteration.
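
Picking the later-iteration proteins can be automated. A sketch, assuming psiblast was run with something like psiblast -query lacI.fasta -db nr -num_iterations 3 -outfmt 7 -out psi.tsv (file names are placeholders) and that the commented tabular output marks each round with a "# Iteration: N" line; that layout is an assumption to check against your BLAST+ version. The function reports subjects that first appear in iteration 2 or later, i.e. the more distant homologs the profile pulled in.

```python
def first_iteration_per_subject(outfmt7_text: str) -> dict[str, int]:
    """Map each subject ID to the first PSI-BLAST iteration that reported it."""
    first_seen: dict[str, int] = {}
    iteration = 1
    for line in outfmt7_text.splitlines():
        if line.startswith("# Iteration:"):
            iteration = int(line.split(":", 1)[1])
            continue
        if not line.strip() or line.startswith("#"):
            continue  # other comment lines (query, database, fields, ...)
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a tabular hit line
        first_seen.setdefault(fields[1], iteration)  # column 2 is the subject ID
    return first_seen


def new_in_later_iterations(outfmt7_text: str, min_iteration: int = 2) -> list[str]:
    """Subjects that PSI-BLAST only found from iteration `min_iteration` onwards."""
    first_seen = first_iteration_per_subject(outfmt7_text)
    return sorted(s for s, it in first_seen.items() if it >= min_iteration)
```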

By the way, your question reminds me of how the BLOSUM matrices were constructed; you may find interesting insights in the original paper (Henikoff and Henikoff, 1992, PNAS).

(reply by Asaf, 6.2 years ago)
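
For context, BLOSUM was built by clustering aligned segments that were at least X% identical (62% for BLOSUM62) and letting each cluster count as a single sequence, so near-duplicates did not dominate the statistics. A sketch of that clustering-and-weighting idea applied to an alignment of BLAST hits: single-linkage clustering at an identity cutoff, then a weight of 1/(cluster size) per sequence. The gap handling and the function names are choices made for this illustration, not the original BLOSUM procedure.

```python
from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Identity of two aligned, equal-length sequences over columns without gaps."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def cluster_weights(aligned: dict[str, str], cutoff: float = 62.0) -> dict[str, float]:
    """Single-linkage clustering at `cutoff` % identity; weight = 1 / cluster size."""
    names = list(aligned)
    parent = {n: n for n in names}  # union-find forest, one root per cluster

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(aligned[a], aligned[b]) >= cutoff:
                parent[find(a)] = find(b)  # merge the two clusters
    sizes = Counter(find(n) for n in names)
    return {n: 1.0 / sizes[find(n)] for n in names}
```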

This is another good piece of advice.

(reply by Manu Prestat, 6.2 years ago)

1) I need the BLAST search to be an automatic process, without manual work.

2) I will check the original paper. Thank you.

(reply by onpelikan, 6.1 years ago)

Spitshine (Esch-sur-Alzette, Luxembourg) wrote, 6.1 years ago:

If you do not want to rely on an orthologous-groups database, reduce the redundancy of your input set with cd-hit (http://weizhong-lab.ucsd.edu/cd-hit/) so that it contains diverse sequences rather than near-duplicates.

This is how protein families were built in the olden days of biocomputing.
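
The idea behind cd-hit's greedy incremental clustering can be sketched in a few lines: visit sequences from longest to shortest, and let a sequence become a new representative only if it is less similar than the cutoff to every representative chosen so far. This is an illustration, not a replacement for cd-hit: the similarity here is difflib.SequenceMatcher.ratio(), a crude stand-in for the alignment-based identity cd-hit computes (cd-hit also uses a short-word filter to skip most comparisons), and it would be far too slow for thousands of sequences.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude 0..1 similarity; a stand-in for alignment identity, NOT cd-hit's measure."""
    return SequenceMatcher(None, a, b).ratio()


def greedy_representatives(seqs: dict[str, str], cutoff: float) -> list[str]:
    """Greedy incremental clustering in the spirit of cd-hit.

    Longest sequences are visited first (ties keep input order); a sequence
    becomes a new representative only if its similarity to every existing
    representative is below `cutoff`.
    """
    reps: list[str] = []
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        if all(similarity(seqs[name], seqs[r]) < cutoff for r in reps):
            reps.append(name)
    return reps
```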

This is probably one of the best solutions. One possibility is to let blastp retrieve e.g. 3000 sequences and then obtain 50 representative sequences by cd-hit clustering.

(reply by onpelikan, 6.1 years ago)
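
The "retrieve ~3000 hits, then cd-hit down to 50 representatives" step could be automated by lowering cd-hit's identity cutoff (-c) until the output contains at most 50 sequences. A sketch, assuming cd-hit is on the PATH and the BLAST hits are already in a FASTA file; the word sizes (-n) follow the ranges recommended in cd-hit's user guide (5 for 0.7-1.0, 4 for 0.6-0.7, 3 for 0.5-0.6, 2 for 0.4-0.5), and the list of cutoffs tried is arbitrary. The search loop takes the counting function as a parameter, so it can be tested without running cd-hit.

```python
import subprocess
from typing import Callable, Iterable, Optional, Tuple


def cdhit_word_size(cutoff: float) -> int:
    """Word size (-n) recommended by cd-hit's user guide for a protein cutoff (-c)."""
    if cutoff >= 0.7:
        return 5
    if cutoff >= 0.6:
        return 4
    if cutoff >= 0.5:
        return 3
    if cutoff >= 0.4:
        return 2
    raise ValueError("cd-hit protein clustering does not go below -c 0.4")


def count_cdhit_representatives(fasta_in: str, fasta_out: str, cutoff: float) -> int:
    """Run cd-hit at one cutoff and count the representative sequences it writes."""
    subprocess.run(
        ["cd-hit", "-i", fasta_in, "-o", fasta_out,
         "-c", str(cutoff), "-n", str(cdhit_word_size(cutoff))],
        check=True, stdout=subprocess.DEVNULL,
    )
    with open(fasta_out) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def strictest_cutoff_within(target: int, count_at: Callable[[float], int],
                            cutoffs: Iterable[float]) -> Optional[Tuple[float, int]]:
    """Try cutoffs from strict (high identity) to loose.

    Returns the first (cutoff, count) whose representative count is <= target.
    """
    for cutoff in sorted(cutoffs, reverse=True):
        n = count_at(cutoff)
        if n <= target:
            return cutoff, n
    return None  # even the loosest cutoff leaves too many clusters


# Hypothetical usage (file names are placeholders):
# result = strictest_cutoff_within(
#     50,
#     lambda c: count_cdhit_representatives("blast_hits.fasta", f"reps_{c}.fasta", c),
#     [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
# )
```

Going from strict to loose cutoffs returns the most detailed clustering that still fits the 50-sequence budget.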