Question: Blastclust Output Problem.
Pawan_K wrote (7.1 years ago):

Hi,

I installed the latest standalone BLAST, version 2.2.25, following the installation guide at http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/pc_setup.html. I ran BLASTCLUST on a 14 MB FASTA file with the parameters -S 1.5 -L 0.9, but the clustering does not look right: the output file contains only 3 clusters, and the error.log file left after the run is empty. The program does report the number of sequences as it reads them during the run. When I submitted some of my sequences online, I got a few clusters. The other BLAST programs in the package seem to work correctly; only BLASTCLUST gives this result. Please help me get correct clusters with BLASTCLUST as soon as possible.

With best regards, K. Pawankumar

Suren wrote (7.0 years ago):

If you can describe the problem in a little more detail, it will be easier to suggest a solution.

When BLASTCLUST runs on sequences whose headers are too long, it renames those headers in the result file with text starting with "Temp....", so make sure your headers are not too long.
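One way to spot headers that might be affected is to scan the FASTA file before clustering. The sketch below is only an illustration: the function name `long_headers` and the cutoff `max_len=50` are assumptions chosen for the example, not a documented BLASTCLUST limit, so adjust the cutoff to whatever length starts producing "Temp...." names in your own runs.

```python
def long_headers(fasta_lines, max_len=50):
    """Return (header, length) pairs for FASTA headers longer than max_len.

    max_len is an arbitrary illustrative cutoff, not a BLASTCLUST constant.
    """
    flagged = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()  # drop the leading '>' and the newline
            if len(header) > max_len:
                flagged.append((header, len(header)))
    return flagged


# Small in-memory example; for a real file, pass open("your_file.fasta").
example = [">seq1 short description", "MKT", ">" + "x" * 60, "MKV"]
for header, length in long_headers(example):
    print(length, header)
```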

I hope you know that BLASTCLUST does not produce a ready-to-use FASTA file, and that you understand the output format. For reference: each line of the output is one cluster, and all sequences in that cluster are listed on that line, so the number of lines equals the number of clusters. Clusters are listed in decreasing order of size, with the largest at the top; clusters of equal size are ordered alphabetically. So before running BLASTCLUST, give your sequences short, distinctive header names so you can tell which sequences fall in each cluster.
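To check whether a result with only 3 lines really accounts for all input sequences, the output file can be summarised with a few lines of Python. This is a minimal sketch that assumes the format described above (one cluster per line, identifiers separated by whitespace); the function names and the sample identifiers are made up for illustration.

```python
def parse_clusters(lines):
    """Turn BLASTCLUST-style output lines into a list of clusters (lists of IDs)."""
    clusters = []
    for line in lines:
        ids = line.split()  # identifiers are whitespace-separated
        if ids:             # skip blank lines
            clusters.append(ids)
    return clusters


def summarize(clusters):
    """Count clusters, total sequences, largest cluster size, and singletons."""
    sizes = [len(c) for c in clusters]
    return {
        "clusters": len(clusters),
        "sequences": sum(sizes),
        "largest": max(sizes, default=0),
        "singletons": sum(1 for s in sizes if s == 1),
    }


# In-memory sample; for a real run, pass open("your_cluster_file") instead.
sample = ["seqA seqB seqC", "seqD seqE", "seqF"]
clusters = parse_clusters(sample)
print(summarize(clusters))
```

If the "sequences" total is far below the number of records in the input FASTA file, the run probably stopped early or dropped sequences; if it matches, the thresholds are simply grouping almost everything together.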

The parameters you are running are "-S 1.5 -L 0.9".

-S parameter

if < 3 then the threshold is set as a BLAST score density
(0.0 to 3.0; default = 1.75)
if >=3 then the threshold is set as a percent of identical
residues (3 to 100)
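The rule quoted above can be written out as a small check, which makes it easy to see how a given -S value will be interpreted. This is only a restatement of the help text in Python; the function name is hypothetical and nothing here calls BLASTCLUST itself.

```python
def describe_s_threshold(s):
    """Say how a BLASTCLUST -S value is interpreted, per the help text above."""
    if s < 0 or s > 100:
        raise ValueError("-S is expected to lie between 0 and 100")
    if s < 3:
        return "score density"
    return "percent identity"


for value in (1.5, 1.75, 3, 95):
    print(value, "->", describe_s_threshold(value))
```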

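Since your -S value of 1.5 is below 3, it is treated as a score density, not a percent identity. Try stricter or more relaxed percent identity or score density values and see how the number of clusters returned changes.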
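A quick way to compare thresholds is to prepare one command per -S value and run them in turn. The sketch below only builds and prints the command lines; it assumes the legacy blastclust options -i (input), -o (output), -p (T for protein, F for nucleotide), -L and -S, and the file names are placeholders. Check the options against the help output of your installed version before running anything.

```python
def blastclust_commands(fasta, s_values, length_cov=0.9, protein=True):
    """Build one blastclust command (as an argument list) per -S value.

    Output file names are placeholders invented for this example.
    """
    commands = []
    for s in s_values:
        out_file = f"clusters_S{s}.txt"
        commands.append([
            "blastclust",
            "-i", fasta,
            "-o", out_file,
            "-p", "T" if protein else "F",
            "-L", str(length_cov),
            "-S", str(s),
        ])
    return commands


# Two score-density values and two percent-identity values.
for cmd in blastclust_commands("input.fasta", [1.0, 1.75, 50, 90]):
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` once the options have been confirmed, and counting the lines in each output file then shows how the number of clusters changes with the threshold.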

You can also use CD-HIT or USEARCH to cluster sequences. Both are much faster than BLASTCLUST.
