Short Blast with Gaps
1
1
Entering edit mode
7.2 years ago
corvusvordan ▴ 10

Hello,

I am trying to do blastn using NCBI-BLAST BLAST 2.2.31+ using the following 2 following sequences in a file:

>PositiveControl
GGTTGAGGCTAAGCCAG
>PositiveControl_NNNNN
GGTTGANNNNNAGCCAG

against the following sequence:

>SubjectPositiveControl
ACCTCTGCATTAGGACTCTGCATCGACCGTAGCCAGTTCTTAGGCGGTTGAGGCTAAGCCAGGTAGGCCAAGTCTCACTGGAGCCGGTTGCGAGGGATTCCTTCGGGCTGAGGGAGAGTTTATGTGGAGTGGTCTGAAGGTGGTT

My >Positive Control is just copy of a part of >SubjectPositiveControl, and >PositiveControl_NNNNN is same as >PositiveControl but with 5 letters changed to Ns. My blast search finds >PositiveControl but aligns >PositiveControl_NNNNN only partially, where there are exact sequences (eg GGTTGA). Here is the command I used:

 blastn -query PositiveControl.txt -subject SubjectPositiveControl.txt -task "blastn-short" -max_target_seqs 1 -word_size 4 -outfmt "6 qseqid sseqid qlen length qseq sseq pident sstrand" > PositiveControlResults.txt

Is there any way to perform this search with BLAST? Or is there any other tool/software/website that can do that? Thank you.

sequence alignment blast short • 3.7k views
ADD COMMENT
0
Entering edit mode

What is your expected alignment ? Can you also post contents of PositiveControlResults.txt ?

ADD REPLY
0
Entering edit mode

BLAST+ is currently in v.2.6. Please upgrade if possible.

ADD REPLY
0
Entering edit mode

seqkit locate -d(http://bioinf.shenwei.me/seqkit/usage/#locate) can locate motif containing N, but it's exact not similar search.

ADD REPLY
2
Entering edit mode
7.2 years ago

The reason that BLAST aligns your sequence GGTTGANNNNNAGCCAG partially is that it uses (by default) high penalty for nucleotide mismatch (penalty argument). You can tweak the input paramaters (e.g., gapopen and penalty) to get better-looking alignments.

For example, you can try to execute this command:

 blastn -query PositiveControl.txt -subject SubjectPositiveControl.txt -task "blastn-short" -max_target_seqs 1 -word_size 4 -outfmt "6 qseqid sseqid qlen length qseq sseq pident sstrand" -gapopen 10 -penalty -1 -dust no -soft_masking false

It should give you what you are looking for:

PositiveControl SubjectPositiveControl  17  17  GGTTGAGGCTAAGCCAG   GGTTGAGGCTAAGCCAG   100.00  plus
PositiveControl_NNNNN   SubjectPositiveControl  17  17  GGTTGANNNNNAGCCAG   GGTTGAGGCTAAGCCAG   70.59   plus

Note, there are other tools that were specifically created for the task of aligning short nucleotide sequences to the longer reference sequences (e.g. BWA, Bowtie2).

ADD COMMENT

Login before adding your answer.

Traffic: 1540 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6