Question

Need help with blastn against a short (~50 nt) sequence

10.0 years ago

ritujoshi18 • 0

Hi Everyone,

In stand-alone BLAST, I'm blasting several thousand ~450 nt sequences against one short ~50 nt motif sequence to find the sequences that contain this motif. In the blastn output I get a couple of hits with >90% identity (and alignment length = 50). So my questions are:

Why am I not getting hits below 90% identity (e.g. sequences that have 85% or 70% identity to the motif sequence)?
Is there another efficient way or tool to find the sequences in my data that contain my sequence of interest?

I'm new to BLAST and bioinformatics in general, so any help or input would be highly appreciated!

Thanks very much!

For some weird reason, I've seen legacy BLAST's blastall work better on shorter nucleotide sequences than BLAST+'s blastn. Have you tried blastall?

Also, I think 70% of 50 nt (35 nt) might be too short a match for BLAST to consider, and you might want to tweak the parameters a bit.
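To make the point about short matches concrete, here is a small Python sketch (not BLAST itself, just an illustration). BLAST only starts an alignment from a stretch of exact matches at least as long as the word size, so what matters is the longest run of identical bases between the motif and a diverged copy. The motif below is a made-up placeholder, and `mutate_evenly` is a hypothetical helper that spreads the mismatches evenly, which is the worst case for seeding. The word sizes in the comments are the BLAST+ defaults as I understand them (28 for the default megablast task, 11 for `-task blastn`, 7 for `-task blastn-short`).

```python
def longest_exact_run(a: str, b: str) -> int:
    """Longest stretch of consecutive identical positions between two
    equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def mutate_evenly(seq: str, n_mismatches: int) -> str:
    """Hypothetical helper: substitute n_mismatches bases, evenly spaced.

    Even spacing is the worst case for BLAST seeding because it keeps
    every exact run as short as possible.
    """
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}  # always changes the base
    out = list(seq)
    for i in range(n_mismatches):
        # Integer arithmetic places each mismatch mid-way through its segment.
        pos = (2 * i + 1) * len(seq) // (2 * n_mismatches)
        out[pos] = swap[out[pos]]
    return "".join(out)


# Placeholder 50 nt motif, NOT the sequence from the question.
motif = ("ACGT" * 13)[:50]

# Assumed BLAST+ default word sizes per task.
word_sizes = {"megablast": 28, "blastn": 11, "blastn-short": 7}

for n in (5, 8, 15):
    variant = mutate_evenly(motif, n)
    run = longest_exact_run(motif, variant)
    identity = 100 * (len(motif) - n) / len(motif)
    seedable = [task for task, w in word_sizes.items() if run >= w]
    print(f"{identity:.0f}% identity: longest exact run = {run}, "
          f"seedable by: {seedable or 'none'}")
```

With mismatches spread like this, a 70% identical copy has no exact run longer than 3 nt, so no word size in that table would seed it. Real sequences usually have their mismatches clustered, which is why some diverged hits still show up. On the command line, the relevant BLAST+ options are `-task`, `-word_size` and `-evalue`.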

written 10.0 years ago by Ram

Thanks, Ram, for your reply. I haven't tried blastall yet but will try it now. Which parameters exactly would I have to change?

A few more questions:

By the way, do you think it's a correct approach to BLAST long sequences (in my case ~400-600 bp) against a short read (40-60 bp) to pull the matching sequences out of NGS data?
Would I miss sequences that have mismatches in the stretch where my short sequence aligns?

Again, thanks.

R

written 10.0 years ago by ritujoshi18

score 2 · Answer 1 · 2014-04-24

1. You should probably search with a higher E-value threshold, which will return more hits. You can (or should?) also use a small word size (try 7): this is the minimum length of exact match that BLAST requires between the two sequences before it will attempt an alignment.

2. I think this is a classic global alignment problem rather than a local alignment one (local alignment is what BLAST does). You can use needle from the EMBOSS package to align the short sequence to the long ones; you should probably use a high gap penalty. The only issue is that there are no statistics: you have to decide for yourself whether the alignment returned is satisfactory or not.
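As a rough illustration of the second suggestion, here is a minimal pure-Python sketch of aligning the whole motif against the best-matching stretch of a longer sequence, with gaps at the ends of the long sequence left unpenalised. This is not needle and does not reproduce its scoring: the function name `fit_motif`, the match/mismatch/gap values and the example sequences are all assumptions chosen for illustration, and the gap penalty is a simple linear one set high relative to the match score.

```python
def fit_motif(motif, seq, match=1, mismatch=-1, gap=-3):
    """Align all of `motif` to the best stretch of `seq`.

    Leading and trailing parts of `seq` are free (no end-gap penalty).
    Returns (score, start, end, percent_identity), where seq[start:end]
    is the stretch of `seq` covered by the alignment.
    """
    m, n = len(motif), len(seq)
    # score[i][j]: best score for motif[:i] aligned to a stretch of seq ending at j.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    move = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap  # motif bases with nothing in seq to pair with
        move[i][0] = "up"
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if motif[i - 1] == seq[j - 1] else mismatch
            options = (
                (score[i - 1][j - 1] + s, "diag"),  # pair the two bases
                (score[i - 1][j] + gap, "up"),      # motif base opposite a gap
                (score[i][j - 1] + gap, "left"),    # seq base opposite a gap
            )
            # On ties max() keeps the first option, so pairing bases is preferred.
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])

    # The motif must be fully consumed; its end may fall anywhere in seq.
    end = max(range(n + 1), key=lambda j: score[m][j])

    # Trace back to find where the alignment starts and count identities.
    i, j, matches, columns = m, end, 0, 0
    while i > 0:
        step = move[i][j]
        columns += 1
        if step == "diag":
            matches += motif[i - 1] == seq[j - 1]
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    return score[m][end], j, end, 100.0 * matches / columns


# Made-up example: a 14 nt motif embedded in a longer read with one mismatch.
motif = "ACGTTGCAAGGCTA"
read = "TTTTT" + "ACGTTGTAAGGCTA" + "GGGGG"
score, start, end, identity = fit_motif(motif, read)
print(score, start, end, round(identity, 1))
```

Because there are no statistics attached, the identity threshold is yours to pick: for example, keep every read whose best stretch is at least 70% identical to the motif. The table is O(motif length x read length), which is fine for a 50 nt motif against reads of a few hundred bases.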