Using blastn to find very short similarities
7.8 years ago
Ali • 0

Hi

I am comparing one sequence (the query, ~1800 nt) against a set of target sequences with BLAST+ (blastn), looking for similarities between them.

I want to find all of the similarities that exist, even ones as short as 5 nucleotides. BLAST normally doesn't report matches that short, so I set the e-value threshold very high and raised the maximum number of target sequences (max_target_seqs) to a large value. I have also set the word size to a small value (5).
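Here is a minimal sketch of these settings, driven from Python. The file names (query.fasta, targets_db, hits.tsv) and the max_target_seqs value are placeholders. targets_db assumes a nucleotide database built beforehand with makeblastdb. The options -task blastn-short, -dust no and -soft_masking false go beyond what is listed above. The first selects the preset for short queries. The other two switch off the low-complexity filtering and masking that blastn applies by default, which can otherwise hide short matches.

```python
import shlex


def build_blastn_cmd(query_fasta, target_db, out_path,
                     word_size=5, evalue=1000.0, max_target_seqs=100000):
    """Return a blastn argument list tuned toward reporting very short matches.

    File/database names are placeholders; max_target_seqs is an arbitrary large value.
    """
    return [
        "blastn",
        "-task", "blastn-short",        # preset intended for short sequences
        "-query", query_fasta,
        "-db", target_db,               # assumed: database created with makeblastdb
        "-word_size", str(word_size),   # blastn in BLAST+ accepts values >= 4
        "-evalue", str(evalue),         # very permissive e-value threshold
        "-max_target_seqs", str(max_target_seqs),
        "-dust", "no",                  # no low-complexity (DUST) filtering
        "-soft_masking", "false",       # do not mask filtered regions during seeding
        "-outfmt", "6",                 # tabular output, one line per HSP
        "-out", out_path,
    ]


if __name__ == "__main__":
    cmd = build_blastn_cmd("query.fasta", "targets_db", "hits.tsv")
    # Print the shell-equivalent command; to execute it, pass cmd to
    # subprocess.run(cmd, check=True) on a machine with BLAST+ installed.
    print(shlex.join(cmd))
```

Even with these options BLAST is still a heuristic. Every hit must pass its seeding, extension and scoring cutoffs, so this cannot guarantee that every 5-nt match is reported.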

Is there anything else I can adjust to make sure I have found ALL of the similarities as short as, say, 5 nucleotides?

Thank you very much.

In genomes, a sequence usually becomes unique only once it is about 17 bases or longer. If you still want 5-base similarities, you should probably use a script that matches patterns from a file. A basic grep should also do that.
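In the same spirit, here is a minimal Python sketch (a script rather than a grep command) that lists every exact shared 5-mer between the query and one target, on both strands. It assumes both sequences are already loaded as plain uppercase strings. The function names are hypothetical. It finds exact matches only, with no mismatches or gaps.

```python
from collections import defaultdict

# Translation table for complementing uppercase DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_index(seq, k):
    """Map each ACGT-only k-mer in seq to the list of its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other codes
            index[kmer].append(i)
    return index


def shared_kmers(query, target, k=5):
    """Yield (query_pos, target_pos, strand, kmer) for every exact shared k-mer.

    Positions are 0-based on the forward strand of each sequence. For strand
    "-", kmer matches the reverse complement of target[target_pos:target_pos + k].
    """
    index = kmer_index(query, k)
    rc_target = revcomp(target)
    for strand, seq in (("+", target), ("-", rc_target)):
        for j in range(len(seq) - k + 1):
            kmer = seq[j:j + k]
            for i in index.get(kmer, ()):
                # convert a reverse-complement position back to forward coordinates
                t_pos = j if strand == "+" else len(target) - j - k
                yield i, t_pos, strand, kmer


if __name__ == "__main__":
    for hit in shared_kmers("ACGTACGTTT", "GGACGTAC", k=5):
        print(hit)
```

For scale, a ~1800-nt query contains about 1796 overlapping 5-mers, but only 4^5 = 1024 distinct 5-mers exist. Most 5-mers will therefore occur in the query by chance, and the hit list for each target will be long.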

Thanks Rohit. I am not specifically looking at uniqueness, but thanks for the advice. I think you're right; I should try grep too.

7.8 years ago
natasha.sernova ★ 4.0k

See these posts:

Need a help re Blastn against short (~50 nt seq)

Blast Settings For Short Sequences

Beginner Blasting Short Sequences

and some parts of this post:

A: Blast Against Sra Dataset

I'm afraid the smallest word size is 7...

Thanks Natasha. The links are very helpful. Maybe I should try other tools as well (as suggested in one of the links).

However, BLAST+ allows a word size of 4 or more with blastn.
