Question

Using blastn to find very short similarities

7.8 years ago

Ali • 0

Hi

I am comparing one query sequence (~1800 nt) against a set of target sequences with blastn (BLAST+), looking for regions of similarity.

I want to find every similarity that exists, even ones as short as 5 nucleotides. Because blast normally doesn't return similarities that short, I set the e-value very high and raised the maximum number of target sequences. I have also set the word size to a small number (5).

Is there anything else I can adjust to make sure I have found ALL of the similarities as short as, say, 5 nucleotides?

Thank you very much.

blast sequence alignment • 2.0k views

In genomes, a sequence usually only starts to be unique at around 17 bases. If you still want 5-base similarities, you should probably use a script that matches patterns from a file. A basic grep would also do that.
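The script idea can be sketched in a few lines of Python. This is only a minimal illustration, assuming the sequences are already loaded as plain strings; the names `index_kmers` and `shared_kmers` and the toy sequences are made up for the example. It reports every exact 5-mer shared between the query and each target on the forward strand only, so reverse-complement matches would need a second pass.

```python
from collections import defaultdict


def index_kmers(seq, k):
    """Map each k-mer in seq to the 0-based positions where it starts."""
    index = defaultdict(list)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_kmers(query, targets, k=5):
    """Yield (target_name, kmer, query_pos, target_pos) for every exact match."""
    q_index = index_kmers(query, k)
    for name, seq in targets.items():
        seq = seq.upper()
        for j in range(len(seq) - k + 1):
            kmer = seq[j:j + k]
            # .get() avoids adding empty entries to the defaultdict
            for i in q_index.get(kmer, ()):
                yield name, kmer, i, j


# Toy data (hypothetical); real input would be read from FASTA files.
query = "ACGTACGTTAGC"
targets = {"t1": "GGGTACGTTCC", "t2": "AAAAAAAA"}

hits = list(shared_kmers(query, targets, k=5))
for hit in hits:
    print(*hit, sep="\t")
```

Unlike a heuristic search, this enumerates every exact match of length k, at the cost of a very long hit list: with k=5 nearly every target of reasonable length will share some 5-mers with an 1800 nt query.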

Reply • 7.8 years ago by Rohit ★ 1.5k

Thanks Rohit. I am not specifically interested in uniqueness, but thanks for your advice. I think you're right, I should try grep too.

Reply • 7.8 years ago by Ali • 0

score 1 · Answer 1 · 2016-06-27

7.8 years ago

natasha.sernova ★ 4.0k

See these posts:

Need a help re Blastn against short (~50 nt seq)

Blast Settings For Short Sequences

Beginner Blasting Short Sequences

and some parts of this post:

A: Blast Against Sra Dataset

I'm afraid the smallest word size is 7...

Thanks Natasha. The links are very helpful. Maybe I should try other tools as well (as suggested in one of the links).

Note that BLAST+ does allow a word size as small as 4 for blastn (-word_size >= 4).
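For reference, the settings discussed in this thread (small word size, very large e-value, high target limit) could be put together roughly as below. This is a sketch, not a tested recipe: the file and database names are placeholders, the numeric defaults are arbitrary, and the choices of `-task blastn-short` and `-dust no` are assumptions that were not part of the original question. The helper name `build_blastn_command` is made up for the example.

```python
import shlex


def build_blastn_command(query_fasta, db, out_path,
                         word_size=5, evalue=1000, max_target_seqs=5000):
    """Return a blastn argument list tuned for very short matches."""
    if word_size < 4:
        # blastn rejects word sizes below 4
        raise ValueError("blastn requires -word_size >= 4")
    return [
        "blastn",
        "-task", "blastn-short",        # assumption: short-match preset
        "-query", query_fasta,
        "-db", db,                      # database built beforehand with makeblastdb
        "-word_size", str(word_size),
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
        "-dust", "no",                  # assumption: turn off low-complexity filtering
        "-outfmt", "6",                 # tabular output
        "-out", out_path,
    ]


# Placeholder names; replace with real paths.
cmd = build_blastn_command("query.fasta", "targets_db", "hits.tsv")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once blastn is installed and the inputs exist. Even with these settings, BLAST remains a heuristic that reports scored local alignments, so it may still miss some 5 nt matches; an exact k-mer scan is the safer way to be sure nothing is left out.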

Reply • 7.8 years ago by Ali • 0