Question: Understanding blastclust results
3.6 years ago by
juan.crescente40 wrote:

I have to group DNA sequences according to similarity and create a non-redundant (NR) database from them.

In my first attempt I started the NR database with the first sequence and also created a database of the sequences already added (redundant). Before adding each subsequent sequence, I ran BLAST against it to check whether the new sequence already exists in the database. This gave me 52 results from a total of 85.

blastn -db dna.fasta.db -query temp.fasta -evalue 1e-3 -max_target_seqs 1 -outfmt '6 qseqid sseqid sstart send evalue'

If this returns a hit, the sequence is ignored.
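The loop described above could be sketched roughly as follows. This is only an illustration under assumptions, not the exact script used: the function names (`has_hit`, `build_db`, `greedy_nr`) and file names are hypothetical, the database is rebuilt with `makeblastdb` after every accepted sequence, and each query is checked only against the sequences kept so far.

```shell
# Hypothetical sketch of the greedy NR procedure; names and paths are illustrative.

# has_hit DB QUERY: succeed if blastn reports at least one hit for QUERY in DB.
has_hit() {
    [ -n "$(blastn -db "$1" -query "$2" -evalue 1e-3 -max_target_seqs 1 \
              -outfmt '6 qseqid sseqid sstart send evalue')" ]
}

# build_db FASTA DB: (re)build a nucleotide BLAST database from FASTA.
build_db() {
    makeblastdb -in "$1" -dbtype nucl -out "$2" >/dev/null
}

# greedy_nr INPUT_FASTA NR_FASTA: keep each record that has no hit
# among the records kept so far (assumption: only kept records are searched).
greedy_nr() {
    input=$1; nr=$2
    workdir=$(mktemp -d)
    : > "$nr"
    # Split the input into one file per FASTA record, numbered in input order.
    awk -v dir="$workdir" '
        /^>/ { if (n) close(out); n++; out = sprintf("%s/rec%06d.fa", dir, n) }
        n    { print > out }' "$input"
    for rec in "$workdir"/rec*.fa; do
        [ -e "$rec" ] || continue
        # The first record is always kept; later ones are skipped on a hit.
        if [ -s "$nr" ] && has_hit "$workdir/nr.db" "$rec"; then
            continue
        fi
        cat "$rec" >> "$nr"
        build_db "$nr" "$workdir/nr.db"
    done
    rm -rf "$workdir"
}
```

It would be called as `greedy_nr known.numbered.fasta nr.fasta`. Note that the outcome of a greedy pass like this can depend on the order of the input sequences.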

For the second attempt I used blastclust. From what I've read, I should get the same result. I set the same e-value in the config file:

-e 1e-3

I ran the command below, but I obtained 71 clusters (I expected 52) from a total of 85 sequences.

blastclust -i known.numbered.fasta -o known.numbered.fasta.cluster -p F -c config

Am I missing anything about blastclust? The documentation is very vague.

blast blastclust • 949 views
modified 3.5 years ago by Biostar ♦♦ 20 • written 3.6 years ago by juan.crescente40

I think blastclust has a length coverage threshold (default = 0.9).

written 3.6 years ago by fishgolden450

I tried that, but the result was the same.

written 3.6 years ago by juan.crescente40

It is strange that the result is the same. I expected that, with the length coverage threshold removed, the number of clusters might not be 52 but would at least be less than 71. Beyond that, I have no ideas other than trying the other blastclust options.

written 3.6 years ago by fishgolden450
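
To illustrate the length coverage point raised in this thread: an e-value cut-off alone keeps short local matches, whereas a coverage threshold discards them, so fewer pairs are linked and more clusters remain. Below is a rough sketch, assuming tabular BLAST output with the columns `qseqid sseqid length qlen slen`. The function name `filter_by_coverage` is hypothetical. Coverage is approximated here as alignment length divided by sequence length and is required on both sequences, which is an assumption and not necessarily how blastclust computes it.

```shell
# filter_by_coverage MIN_COV: read hits (qseqid sseqid length qlen slen) on stdin
# and print only those whose alignment spans at least MIN_COV of BOTH sequences.
# Assumption: coverage = alignment length / sequence length (an approximation).
filter_by_coverage() {
    awk -v min="$1" '$4 > 0 && $5 > 0 && $3 / $4 >= min && $3 / $5 >= min'
}
```

For example, `blastn -db dna.fasta.db -query temp.fasta -evalue 1e-3 -outfmt '6 qseqid sseqid length qlen slen' | filter_by_coverage 0.9` would show which of the e-value hits survive a 0.9 coverage requirement. Comparing that with the unfiltered hit list may help explain a gap such as 52 versus 71.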